2024 Summer of Reproducibility – Another Great Year Promoting Practical Reproducibility

The Summer of Reproducibility (SoR) program successfully matched 22 students with over 30 mentors to work on projects that aim to make practical reproducibility a mainstream part of the scientific process and expose more students to reproducibility best practices. The SoR is managed under the UCSC Open Source Research Experience (OSRE) program, which follows the model set forth by successful mentorship, outreach, and sponsorship programs such as the Google Summer of Code (GSoC).

The SoR mentorship program is open to anyone interested in reproducibility. The process begins with mentors – typically university faculty or researchers – submitting “project ideas” to the OSRE website. Students from around the world then review the project ideas, reach out to the mentors, discuss project requirements, and formulate proposals for a summer project. Selected students receive a stipend to work with their mentors over the course of the summer.

2023 was the SoR’s inaugural year and was received positively by both mentors and students. Building on that success, the 2024 program brought in even more mentors and students, and the feedback from both groups was again overwhelmingly positive. Mentors rated their mentees an average of 4.7 out of a possible 5, and all projects were completed successfully. As one mentor noted: “This program is excellent for providing the opportunity to work with smart, creative, and productive young scientists and engineers…. I also learned a lot from my [mentee]. This project has served as a catalyst for generating several other follow-up projects by finding multiple interesting research problems.” Another mentor noted that “finding funds for remote collaborators is hard and [the SoR] actually is really helpful for my research because it gives incentive to students who work harder while at the same time able to contribute to the community.”

The 2024 SoR students likewise described the program as collaborative and productive. One student noted that their mentor was “incredibly supportive and always available to help … [the mentor] has been particularly helpful in guiding me through challenges, offering clear instructions, and encouraging me when I’ve felt stuck.” Another student – who said this was their first experience with reproducibility – commented: “Reproducibility is not a value-free topic. I really appreciated that my mentor valued my opinions and made me feel welcome to contribute my ideas, both at a high level and at the level of my experience. For example, what were the pain points, and how could we translate that experience into publishable results.” These comments were representative of the feedback received from all the SoR students.

Keep an eye out for announcements about SoR 2025, coming soon!

Mentor recruitment for the 2025 Summer Program begins in December 2024, and student applications will be accepted in late winter 2025. More details on the timeline for next year’s program are coming soon.

2024 SoR Projects

Blogs and final reports for each student’s project are available via their bio links in the table below.

| Contributor Name | Affiliation | Project Title | Mentor(s) |
| --- | --- | --- | --- |
| Acheme Christopher Acheme | Clemson University | Optimizing Scientific Data Streaming: Developing Reproducible Benchmarks for High-Speed Memory-to-Memory Data Transfer over SciStream | Joaquin Chung, Flavio Castro |
| Alicia Esquivel Morel | University of Missouri – Columbia | Enhancing User Experience Reproducibility through TROVI Redesign | Kate Keahey, Mark Powers |
| Archit Dabral | Indian Institute of Technology (BHU) | Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon | Raül Sirvent |
| Arya Sarkar | University of Engineering & Management, Kolkata | Reproducibility in Data Visualization: Investigate Solutions for Capturing Visualizations | David Koop |
| Jiajun Mao | University of Chicago | OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS | Anjus George, Meng Wang |
| Joanna Cheng | Johns Hopkins University | LAST: ML in Detecting and Addressing System Drift | Ray Andrew Sinurat, Sandeep Madireddy |
| Kilian Warmuth | Technical University of Munich | SLICES/pos: Reproducible Experiment Workflows | Sebastian Gallenmueller, Kate Keahey, Georg Carle |
| Klaus Krassnitzer | Technical University of Vienna | AutoAppendix: Enhancing Artifact Evaluation via the Chameleon Cloud | Sascha Hunold |
| Kyrillos Ishak | Alexandria University | Data leakage in applied ML: Examples from Tractography and Medicine | Fraida Fund, Mohamed Saeed |
| Lihaowen (Jayce) Zhu | University of Chicago | Benchmarking for Enhanced Feature Engineering and Preprocessing in ML | Roy Huang, Swami Sundararaman |
| Martin Putra | University of Chicago | Scalable and Reproducible Performance Benchmarking of Genomics Workflows | In-kee Kim |
| Mrigank Pawagi | Indian Institute of Science | Deriving Performance Benchmarks from Python Applications | Ben Greenman |
| Nicole Brewer | Arizona State University | ReproNB: Reproducibility of Interactive Notebook Systems | Tanu Malik |
| Peiran Qin | University of Chicago | FetchPipe: Data Science Pipeline for ML-based Prefetching | Haryadi Gunawi |
| Qianru Zhang | University of Waterloo | BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking | Ziheng Duan |
| Rafael Sinjunatha Wulangsih | Institut Teknologi Bandung (ITB) | Environment and VAA Preparation and/or Reproducibility for Dynamic Bandwidth Allocation (CONCIERGE) | Yuyang (Roy) Huang, Junchen Jiang |
| Shaivi Malik | Guru Gobind Singh Indraprastha University | Data leakage in applied ML: reproducing examples from genomics, medicine and radiology | Fraida Fund, Mohamed Saeed |
| Shuang Liang | Ohio State University | Understanding and Addressing Scalability Bugs in Large-Scale Distributed Systems | Yang Wang, Bogdan “Bo” Stoica |
| Syed Qasim | Boston University | ML-Powered Problem Detection in Chameleon | Ayse Coskun, Michael Sherman |
| Triveni Gurram | Northern Illinois University | Reproducibility in Data Visualization | David Koop |
| William Nixon | Bandung Institute of Technology | Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches | Ray Andrew Sinurat, Sandeep Madireddy |
| Xikang Song | University of Chicago | Fail-Slow Algorithm Analysis: Language Model Based Detection of the RAID Slowdown Conditions | Kexin Pei, Ruidan Li |
| Zahra Nabila Maharani | Universitas Dian Nuswantoro | ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems | Bogdan “Bo” Stoica, Yang Wang |
