Manuscript red-team reference · companion to Paper Review

Cherry-picked seeds

Single-run reporting that hides variance.

What it is

Many ML results are reported as a single number for the proposed method versus a single number for each baseline. Run-to-run variance from random seeds, data shuffling, and parallel-training nondeterminism can be larger than the reported difference between methods. When it is, single-seed reporting presents within-method noise as a between-method 'improvement'.
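To make the size of the problem concrete, here is a minimal sketch of how often one run per method names the wrong winner. It assumes independent Gaussian seed noise with the same standard deviation for both methods. The function name prob_wrong_winner and the numbers (a 0.3-point true gap, a 0.5-point seed standard deviation) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_wrong_winner(true_gap, seed_std):
    """Probability that the truly worse method scores higher when each
    method is run once.

    Assumption: seed noise is independent and Gaussian with the same
    standard deviation (seed_std > 0) for both methods. Real training
    noise need not be Gaussian.
    """
    # The difference of two independent runs has mean true_gap and
    # standard deviation seed_std * sqrt(2).
    diff = NormalDist(mu=true_gap, sigma=seed_std * sqrt(2))
    # The worse method "wins" whenever the observed difference is negative.
    return diff.cdf(0.0)

# Hypothetical numbers: 0.3-point true gap, 0.5-point seed std.
print(f"P(wrong winner) = {prob_wrong_winner(0.3, 0.5):.2f}")
```

Under these assumptions the truly worse method comes out ahead in roughly one single-run comparison in three, so a lone bolded number says little about which method is better.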

Why a reviewer cares

Reviewers ask: how many seeds? Were results averaged? Where are the error bars? A bolded winner in a table cell that beats a baseline by 0.3 percentage points cannot be interpreted without a measure of seed-to-seed variance.

How to fix it

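Run at least 3 seeds (ideally 5 to 10) for every method, baselines included, and report mean ± standard deviation along with the number of seeds. Test whether the difference between methods is statistically significant, for example with a t-test or a permutation test over the per-seed scores. Where seed runs are expensive (very large models), at least disclose that the result is single-run and discuss the reproducibility risk.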
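The reporting steps above can be sketched with the Python standard library alone. This is one possible approach, not a prescribed procedure: it summarizes per-seed scores as mean ± sample standard deviation and runs an exact two-sided permutation test on the difference in means. The helper names summarize and permutation_p_value are hypothetical, and the accuracy values are made up for illustration.

```python
from itertools import combinations
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return mean(scores), stdev(scores)

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way to split the pooled scores into groups of the
    original sizes and counts splits whose absolute mean difference is
    at least as large as the observed one. Feasible only for small
    seed counts (e.g. 5 vs 5 gives 252 splits).
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits

# Hypothetical accuracies (%) from 5 seeds per method, made up for illustration.
proposed = [84.1, 83.6, 84.4, 83.9, 84.0]
baseline = [83.8, 83.5, 84.2, 83.7, 83.9]

for name, scores in [("proposed", proposed), ("baseline", baseline)]:
    m, s = summarize(scores)
    print(f"{name}: {m:.2f} ± {s:.2f} (n={len(scores)} seeds)")

print(f"permutation p-value: {permutation_p_value(proposed, baseline):.3f}")
```

A permutation test is used here because it needs no distributional assumptions and no third-party packages; a t-test over the same per-seed scores would be a reasonable alternative. With only a handful of seeds any such test has limited power, so reporting the spread matters as much as the p-value.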

This is one of ~15 canonical methodology explainers Paper Review's red-team report links to.