ACM RecSys ’25, Prague, Czech Republic

Issues with reproducibility have been identified as a major factor hampering progress in recommender systems research. In response, researchers increasingly share the code of their models. However, sharing only the code of the proposed model is usually not sufficient to ensure reproducibility. In many works, the central claim is that a new model advances the state of the art. Thus, it is crucial that the entire experiment is reproducible, including the configuration and the results of the considered baselines. With this work, our goal is to gauge the level of reproducibility in algorithms research in recommender systems. We systematically analyzed the reproducibility level of 65 papers published at a top-ranked conference during the last three years. Our results are sobering. While the model code is shared in about two thirds of the papers, the code of the baselines is provided in only eight cases. The hyperparameters of the baselines are reported even less frequently, and how exactly these were determined is not explained in any paper. As a result, it is often impossible to reproduce the full result tables reported in the papers, and it is also unclear whether the claimed improvements over the state of the art were actually achieved. Overall, we conclude that the research community has not yet reached the required level of reproducibility. We therefore call for more rigorous reproducibility standards to ensure progress in this field.

Note: To avoid singling out individual authors, we refrain from sharing the detailed analysis of the considered papers on GitHub. However, interested readers may request access to the full analysis via the Zenodo platform.

GitHub: Python script to reproduce the figures in the paper

"We share our code online": Why this is not enough to ensure reproducibility and progress in recommender systems research

Additional material: Source code and Datasets.