2024 Summer of Reproducibility – Another Great Year Promoting Practical Reproducibility

The Summer of Reproducibility (SoR) program successfully matched 22 students with over 30 mentors to work on projects that aim to make practical reproducibility a mainstream part of the scientific process and expose more students to reproducibility best practices. The SoR is managed under the UCSC Open Source Research Experience (OSRE) program, which follows the model set forth by successful mentorship, outreach, and sponsorship programs such as the Google Summer of Code (GSoC).

The SoR mentorship program is open to anyone interested in reproducibility. The process begins with mentors – typically university faculty or researchers – submitting “project ideas” to the OSRE website. Students from around the world then review the project ideas, reach out to the mentors, discuss project requirements, and formulate proposals for a summer project. Selected students receive a stipend to work with their mentors over the course of the summer.

2023 was the SoR’s inaugural year and was received positively by both mentors and students. Building on that success, the 2024 program brought in even more mentors and students, and the feedback from both groups was again overwhelmingly positive. Mentors rated their mentees an average of 4.7 out of a possible 5, and all projects were completed successfully. As one mentor noted: “This program is excellent for providing the opportunity to work with smart, creative, and productive young scientists and engineers…. I also learned a lot from my [mentee]. This project has served as a catalyst for generating several other follow-up projects by finding multiple interesting research problems.” Another mentor noted that “finding funds for remote collaborators is hard and [the SoR] actually is really helpful for my research because it gives incentive to students who work harder while at the same time able to contribute to the community.”

The 2024 SoR students likewise described the program as collaborative and productive. One student noted that their mentor was “incredibly supportive and always available to help … [the mentor] has been particularly helpful in guiding me through challenges, offering clear instructions, and encouraging me when I’ve felt stuck.” Another student – who said this was their first experience with reproducibility – commented: “Reproducibility is not a value-free topic. I really appreciated that my mentor valued my opinions and made me feel welcome to contribute my ideas, both at a high level and at the level of my experience. For example, what were the pain points, and how could we translate that experience into publishable results.” These comments were representative of the feedback received from all the SoR students.

Keep an eye out for announcements about SoR 2025, coming soon!

Mentor recruitment for the 2025 Summer Program begins in December 2024, and student applications will be accepted in late winter 2025. More details on the timeline for next year’s program are coming soon.

2024 SoR Projects

Blogs and final reports for each student’s project are available via their bio links in the table below.

| Contributor Name | Affiliation | Project Title | Mentor(s) |
| --- | --- | --- | --- |
| Acheme Christopher Acheme | Clemson University | Optimizing Scientific Data Streaming: Developing Reproducible Benchmarks for High-Speed Memory-to-Memory Data Transfer over SciStream | Joaquin Chung, Flavio Castro |
| Alicia Esquivel Morel | University of Missouri – Columbia | Enhancing User Experience Reproducibility through TROVI Redesign | Kate Keahey, Mark Powers |
| Archit Dabral | Indian Institute of Technology (BHU) | Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon | Raül Sirvent |
| Arya Sarkar | University of Engineering & Management, Kolkata | Reproducibility in Data Visualization: Investigate Solutions for Capturing Visualizations | David Koop |
| Jiajun Mao | University of Chicago | OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS | Anjus George, Meng Wang |
| Joanna Cheng | Johns Hopkins University | LAST: ML in Detecting and Addressing System Drift | Ray Andrew Sinurat, Sandeep Madireddy |
| Kilian Warmuth | Technical University of Munich | SLICES/pos: Reproducible Experiment Workflows | Sebastian Gallenmueller, Kate Keahey, Georg Carle |
| Klaus Krassnitzer | Technical University of Vienna | AutoAppendix: Enhancing Artifact Evaluation via the Chameleon Cloud | Sascha Hunold |
| Kyrillos Ishak | Alexandria University | Data leakage in applied ML: Examples from Tractography and Medicine | Fraida Fund, Mohamed Saeed |
| Lihaowen (Jayce) Zhu | University of Chicago | Benchmarking for Enhanced Feature Engineering and Preprocessing in ML | Roy Huang, Swami Sundararaman |
| Martin Putra | University of Chicago | Scalable and Reproducible Performance Benchmarking of Genomics Workflows | In-kee Kim |
| Mrigank Pawagi | Indian Institute of Science | Deriving Performance Benchmarks from Python Applications | Ben Greenman |
| Nicole Brewer | Arizona State University | ReproNB: Reproducibility of Interactive Notebook Systems | Tanu Malik |
| Peiran Qin | University of Chicago | FetchPipe: Data Science Pipeline for ML-based Prefetching | Haryadi Gunawi |
| Qianru Zhang | University of Waterloo | BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking | Ziheng Duan |
| Rafael Sinjunatha Wulangsih | Institut Teknologi Bandung (ITB) | Environment and VAA Preparation and/or Reproducibility for Dynamic Bandwidth Allocation (CONCIERGE) | Yuyang (Roy) Huang, Junchen Jiang |
| Shaivi Malik | Guru Gobind Singh Indraprastha University | Data leakage in applied ML: reproducing examples from genomics, medicine and radiology | Fraida Fund, Mohamed Saeed |
| Shuang Liang | Ohio State University | Understanding and Addressing Scalability Bugs in Large-Scale Distributed Systems | Yang Wang, Bogdan “Bo” Stoica |
| Syed Qasim | Boston University | ML-Powered Problem Detection in Chameleon | Ayse Coskun, Michael Sherman |
| Triveni Gurram | Northern Illinois University | Reproducibility in Data Visualization | David Koop |
| William Nixon | Bandung Institute of Technology | Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches | Ray Andrew Sinurat, Sandeep Madireddy |
| Xikang Song | University of Chicago | Fail-Slow Algorithm Analysis: Language Model Based Detection of the RAID Slowdown Conditions | Kexin Pei, Ruidan Li |
| Zahra Nabila Maharani | Universitas Dian Nuswantoro | ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems | Bogdan “Bo” Stoica, Yang Wang |
