The Summer of Reproducibility (SoR) program successfully matched 22 students with over 30 mentors to work on projects that aim to make practical reproducibility a mainstream part of the scientific process and expose more students to reproducibility best practices. The SoR is managed under the UCSC Open Source Research Experience (OSRE) program, which follows the model set forth by successful mentorship, outreach, and sponsorship programs such as the Google Summer of Code (GSoC).
The SoR mentorship program is open to anyone interested in reproducibility. The process begins with mentors – typically university faculty or researchers – submitting “project ideas” to the OSRE website. Then students from around the world review the project ideas, reach out to the project mentors, discuss the project requirements, and formulate a proposal for a summer project. The chosen students are provided a stipend to work with their mentors over the course of the summer.
2023 was the SoR’s inaugural year, and the program was received positively by both mentors and students. Building on that success, the 2024 program brought in even more mentors and students, and feedback from both groups was again overwhelmingly positive. Mentors rated their mentees 4.7 out of a possible 5 on average, and all projects were completed successfully. As noted by one mentor: “This program is excellent for providing the opportunity to work with smart, creative, and productive young scientists and engineers…. I also learned a lot from my [mentee]. This project has served as a catalyst for generating several other follow-up projects by finding multiple interesting research problems.” Another mentor noted that “finding funds for remote collaborators is hard and [the SoR] actually is really helpful for my research because it gives incentive to students who work harder while at the same time able to contribute to the community.”
2024 SoR students also found the program to be a collaborative and productive experience. One student noted that their mentor was “incredibly supportive and always available to help … [the mentor] has been particularly helpful in guiding me through challenges, offering clear instructions, and encouraging me when I’ve felt stuck.” Another student – who said this was their first experience with reproducibility – commented: “Reproducibility is not a value-free topic. I really appreciated that my mentor valued my opinions and made me feel welcome to contribute my ideas, both at a high level and at the level of my experience. For example, what were the pain points, and how could we translate that experience into publishable results.” These comments were representative of the feedback received from all the SoR students.
Keep an eye out for announcements about SoR 2025 soon!
Mentor recruitment for the 2025 Summer Program begins in December 2024, and student applications will be accepted in late Winter 2025. More details on the timeline for next year’s program are coming soon.
2024 SoR Projects
Blogs and final reports for each student’s project are available via the bio links in the table below.
Contributor Name | Affiliation | Project Title | Mentor(s) |
---|---|---|---|
Acheme Christopher Acheme | Clemson University | Optimizing Scientific Data Streaming: Developing Reproducible Benchmarks for High-Speed Memory-to-Memory Data Transfer over SciStream | Joaquin Chung, Flavio Castro |
Alicia Esquivel Morel | University of Missouri – Columbia | Enhancing User Experience Reproducibility through TROVI Redesign | Kate Keahey, Mark Powers |
Archit Dabral | Indian Institute of Technology (BHU) | Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon | Raül Sirvent |
Arya Sarkar | University of Engineering & Management, Kolkata | Reproducibility in Data Visualization: Investigate Solutions for Capturing Visualizations | David Koop |
Jiajun Mao | University of Chicago | OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS | Anjus George, Meng Wang |
Joanna Cheng | Johns Hopkins University | LAST: ML in Detecting and Addressing System Drift | Ray Andrew Sinurat, Sandeep Madireddy |
Kilian Warmuth | Technical University of Munich | SLICES/pos: Reproducible Experiment Workflows | Sebastian Gallenmueller, Kate Keahey, Georg Carle |
Klaus Krassnitzer | Technical University of Vienna | AutoAppendix: Enhancing Artifact Evaluation via the Chameleon Cloud | Sascha Hunold |
Kyrillos Ishak | Alexandria University | Data leakage in applied ML: Examples from Tractography and Medicine | Fraida Fund, Mohamed Saeed |
Lihaowen (Jayce) Zhu | University of Chicago | Benchmarking for Enhanced Feature Engineering and Preprocessing in ML | Roy Huang, Swami Sundararaman |
Martin Putra | University of Chicago | Scalable and Reproducible Performance Benchmarking of Genomics Workflows | In-kee Kim |
Mrigank Pawagi | Indian Institute of Science | Deriving Performance Benchmarks from Python Applications | Ben Greenman |
Nicole Brewer | Arizona State University | ReproNB: Reproducibility of Interactive Notebook Systems | Tanu Malik |
Peiran Qin | University of Chicago | FetchPipe: Data Science Pipeline for ML-based Prefetching | Haryadi Gunawi |
Qianru Zhang | University of Waterloo | BenchmarkST: Cross-Platform, Multi-Species Spatial Transcriptomics Gene Imputation Benchmarking | Ziheng Duan |
Rafael Sinjunatha Wulangsih | Institut Teknologi Bandung (ITB) | Environment and VAA Preparation and/or Reproducibility for Dynamic Bandwidth Allocation (CONCIERGE) | Yuyang (Roy) Huang, Junchen Jiang |
Shaivi Malik | Guru Gobind Singh Indraprastha University | Data leakage in applied ML: reproducing examples from genomics, medicine and radiology | Fraida Fund, Mohamed Saeed |
Shuang Liang | Ohio State University | Understanding and Addressing Scalability Bugs in Large-Scale Distributed Systems | Yang Wang, Bogdan “Bo” Stoica |
Syed Qasim | Boston University | ML-Powered Problem Detection in Chameleon | Ayse Coskun, Michael Sherman |
Triveni Gurram | Northern Illinois University | Reproducibility in Data Visualization | David Koop |
William Nixon | Bandung Institute of Technology | Developing A Comprehensive Pipeline to Benchmark Drift Management Approaches | Ray Andrew Sinurat, Sandeep Madireddy |
Xikang Song | University of Chicago | Fail-Slow Algorithm Analysis: Language Model Based Detection of the RAID Slowdown Conditions | Kexin Pei, Ruidan Li |
Zahra Nabila Maharani | Universitas Dian Nuswantoro | ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems | Bogdan “Bo” Stoica, Yang Wang |