Advancing Reproducibility in Computer Science: Insights from the Summer of Reproducibility Program

Over the past two years, the Summer of Reproducibility (SoR) program has funded more than 40 diverse student projects addressing key challenges in computational reproducibility. These projects not only advance the state of reproducibility in computer science research but also provide excellent learning opportunities for students working alongside experienced mentors.

For those interested in proposing future projects or learning about successful initiatives, we’ve categorized recent SoR projects by their primary impact areas:

1. Artifact Evaluation Support

Projects in this category enhance methodologies for evaluating and verifying research artifacts, addressing the significant challenge of standardizing the artifact review processes used at CS conferences such as SC and FAST [1].

Example: Kraßnitzer’s AutoAppendix project (2024) developed evidence-based guidelines for high-performance computing artifact reproducibility through systematic evaluation of SC24 conference submissions [2]. The project produced concrete deliverables including documentation templates and Chameleon Trovi artifacts that enable “one-click reproducibility.” These templates standardize critical reproducibility elements including environment configuration, execution validation, and expected outputs. The resulting framework has been published as a report and uploaded to Chameleon’s public Trovi artifact repository to improve artifact evaluation processes.
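To make the idea concrete, here is a minimal sketch of how the required elements of such a template might be checked automatically before upload. The file names and fields (environment.yml, run.sh, expected_output.json) are illustrative assumptions, not the project’s actual schema:

```python
# Hypothetical sketch: verify that an artifact directory contains the
# standardized reproducibility elements. File names are illustrative.
import json
import pathlib
import sys

REQUIRED_FILES = [
    "environment.yml",       # environment configuration (e.g., a conda spec)
    "run.sh",                # single entry point for "one-click" execution
    "expected_output.json",  # reference results for output validation
]

def check_artifact(artifact_dir: str) -> bool:
    """Report any standardized elements missing from the artifact."""
    root = pathlib.Path(artifact_dir)
    missing = [f for f in REQUIRED_FILES if not (root / f).exists()]
    for f in missing:
        print(f"missing required artifact element: {f}")
    return not missing

def validate_outputs(artifact_dir: str, results_file: str) -> bool:
    """Compare a fresh run's results against the packaged expected outputs."""
    root = pathlib.Path(artifact_dir)
    expected = json.loads((root / "expected_output.json").read_text())
    actual = json.loads(pathlib.Path(results_file).read_text())
    return expected == actual

if __name__ == "__main__":
    sys.exit(0 if check_artifact(sys.argv[1]) else 1)
```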

2. Reproducibility in Education

These projects integrate reproducibility principles into computer science curricula, responding to the recognized gap in reproducibility education identified by Fund [3].

Example: Saeed’s “Using Reproducibility in Machine Learning Education” (2023) [4] developed open educational resources that guide students through reproducing published machine learning research. The project created interactive notebooks focusing on significant papers like “On Warm Starting Neural Network Training” and “An Image is Worth 16×16 Words,” teaching essential reproducibility skills including claim verification, computational complexity assessment, and experimental replication. These materials were subsequently presented at the UC Open Source Symposium 2023 and are available for public use (e.g., for integration into machine learning courses at other institutions).
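To give a flavor of the claim-verification exercises such notebooks contain, here is a minimal sketch (not the project’s actual notebook code) comparing warm-started and freshly initialized training on synthetic data, echoing the central question of “On Warm Starting Neural Network Training”:

```python
# Hedged sketch of a claim-verification exercise: does continuing
# training from an earlier model ("warm start") match training from
# scratch on the full data? Dataset and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
half = len(X_train) // 2

# Warm start: fit on half the data, then continue training on all of it.
warm = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200,
                     warm_start=True, random_state=0)
warm.fit(X_train[:half], y_train[:half])
warm.fit(X_train, y_train)  # continues from the previous weights

# Cold start: a fresh model trained on the full dataset from scratch.
cold = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0)
cold.fit(X_train, y_train)

print(f"warm-started test accuracy: {warm.score(X_test, y_test):.3f}")
print(f"cold-started test accuracy: {cold.score(X_test, y_test):.3f}")
```

The paper reports that warm-started deep networks tend to generalize worse than freshly initialized ones; an exercise of this shape asks students to check whether that claim holds in their own runs.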

3. Reproducibility Methodology

Projects focusing on methodology examine and formalize approaches to improve reproducibility, particularly addressing the challenge of experiment packaging identified in reproducibility studies [5].

Example: Ren’s research on “Measuring Open-source Database Systems under TPC-C Benchmark with Unreported Settings” (2023) [6] empirically demonstrated how omitted configuration parameters significantly impact the reproducibility of database performance results. Through systematic variation of PostgreSQL settings including shared_buffers, effective_cache_size, and WAL parameters, the project quantified the performance variation attributable solely to unreported settings. This work provides empirical evidence supporting more comprehensive reporting requirements for experimental configurations in systems research.
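The shape of such an experiment is easy to sketch. In the hedged example below, the connection string and benchmark query are placeholders, and only session-settable parameters are swept (restart-scoped settings such as shared_buffers require ALTER SYSTEM followed by a server restart):

```python
# Sketch: time the same workload under different, often-unreported
# PostgreSQL settings. The DSN and query are placeholders.
import itertools
import time

import psycopg2

SETTINGS = {
    "work_mem": ["4MB", "64MB"],
    "effective_cache_size": ["1GB", "8GB"],
}
BENCH_QUERY = "SELECT count(*) FROM order_line;"  # stand-in for a TPC-C mix

conn = psycopg2.connect("dbname=tpcc")  # placeholder DSN
conn.autocommit = True

for combo in itertools.product(*SETTINGS.values()):
    with conn.cursor() as cur:
        # Apply one combination of settings for this session.
        for name, value in zip(SETTINGS, combo):
            cur.execute("SELECT set_config(%s, %s, false);", (name, value))
        start = time.perf_counter()
        cur.execute(BENCH_QUERY)
        cur.fetchall()
        elapsed = time.perf_counter() - start
    print(dict(zip(SETTINGS, combo)), f"{elapsed:.3f}s")
```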

4. Reproducibility Tools

This category encompasses software tools that automate or simplify reproducibility tasks, addressing the need for specialized tooling identified by both ACM and the National Academies [7].

Example: Dabral’s “Automatic Reproducibility of COMPSs Experiments” (2024) [8] developed a service that programmatically analyzes RO-Crate artifact metadata to reconstruct experimental environments on the Chameleon testbed. The tool implements path mapping, runtime adaptation, and remote dataset handling, significantly reducing the technical barriers to reproducing computational experiments. This service has been integrated into the official COMPSs distribution (version 3.3.1) and is now available as a Chameleon appliance, enabling wider adoption of reproducibility practices in distributed computing research.
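While the service’s internals are COMPSs-specific, its first step, reading an RO-Crate’s JSON-LD metadata to discover an experiment’s dependencies, can be sketched in a few lines. The crate path and the staging logic below are illustrative assumptions, not the tool’s actual implementation:

```python
# Simplified sketch: index an RO-Crate's entity graph and list the
# remote datasets a reconstruction service would need to stage.
import json
import pathlib

def load_crate_entities(crate_dir: str) -> dict:
    """Index the JSON-LD entity graph of an RO-Crate by @id."""
    meta = pathlib.Path(crate_dir) / "ro-crate-metadata.json"
    graph = json.loads(meta.read_text())["@graph"]
    return {entity["@id"]: entity for entity in graph}

def remote_datasets(entities: dict) -> list:
    """Collect File/Dataset entities whose @id is a remote URL."""
    out = []
    for eid, entity in entities.items():
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if ("File" in types or "Dataset" in types) and eid.startswith("http"):
            out.append(eid)
    return out

entities = load_crate_entities("experiment-crate")  # placeholder path
for url in remote_datasets(entities):
    # A reconstruction service would fetch these and rewrite paths to
    # match the target testbed's filesystem layout.
    print("would stage:", url)
```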

5. Artifact Packaging

Projects in this category advance techniques for comprehensive packaging of research artifacts, addressing challenges in environment contextualization and experimental setup [9].

Example: The FlashNet project (2023) [10] implemented reproducible machine learning pipelines for storage systems, with specific focus on I/O latency prediction models. The team developed both model training procedures and drift detection algorithms, demonstrating the integration of ML reproducibility practices in systems research. Their artifacts, packaged using Chameleon Trovi, enable deployment on clustered storage systems and include comprehensive documentation of hardware dependencies and software environments necessary for reproduction.
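Drift detection can take many forms; one common statistical approach, sketched below under the assumption of a single numeric feature, is a two-sample Kolmogorov-Smirnov test comparing training-time and serving-time distributions. The data here is synthetic and the threshold is illustrative, not FlashNet’s actual algorithm:

```python
# Hedged sketch of distribution-shift detection for a latency feature,
# using a two-sample Kolmogorov-Smirnov test on synthetic data.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_col: np.ndarray, live_col: np.ndarray,
                 alpha: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly
    from the training distribution."""
    stat, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

rng = np.random.default_rng(0)
train_latency = rng.lognormal(mean=1.0, sigma=0.5, size=5000)  # synthetic
live_latency = rng.lognormal(mean=1.3, sigma=0.5, size=5000)   # shifted

if detect_drift(train_latency, live_latency):
    print("drift detected: retraining the latency model may be needed")
```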

These categories and exemplars from the SoR program demonstrate how targeted student projects can systematically address key challenges in computational reproducibility. By producing concrete artifacts, methodologies, and educational resources, these initiatives contribute to the broader scientific goal of improving reproducibility in computer science research while training the next generation of researchers in reproducible practices.

We hope these examples inspire members of the community seeking new projects to continue the tremendous progress towards practical reproducibility in computer science.
