Why it matters
When Paper Review flags an issue, it tags the failure mode in canonical terms. The pages below explain each tag in 200-400 words — what it is, why a journal reviewer cares, and how to fix it. These aren't comprehensive treatments; they're the briefing you'd give a smart co-author who hadn't heard the term.
Statistics & design
- p-hacking — selectively reporting whichever analysis returned p < 0.05.
- multiple comparisons — run 20 tests at α = 0.05 and you should expect about one "significant" result by chance alone, even when no real effect exists.
- underpowered sample — too few subjects to detect the effect you claim.
- HARK-ing — Hypothesizing After the Results are Known.
- garden of forking paths — many defensible analysis choices, made after seeing the data, with only one path reported.
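As a rough illustration of the multiple-comparisons entry above, the arithmetic can be sketched in a few lines of Python. This assumes the 20 tests are independent and that every null hypothesis is true; the function names are illustrative, not part of Paper Review. The Bonferroni correction shown is one common fix among several, not necessarily the one a given reviewer will ask for.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 20

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = alpha * n_tests

fwer = familywise_error_rate(alpha, n_tests)
per_test_alpha = bonferroni_threshold(alpha, n_tests)

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"chance of at least one false positive: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {per_test_alpha:.4f}")
```

Under these assumptions the chance of at least one spurious "significant" result is roughly 64%, which is why an uncorrected batch of tests draws reviewer attention.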
Machine learning
- test-train contamination — test data leaking into the training set.
- hyperparameter asymmetry — tuning the proposed method but not baselines.
- cherry-picked seeds — reporting a single run, often the best one, which hides seed-to-seed variance.
- missing ablations — claims without ablating each architectural choice.
- LLM self-judge bias — using a model from the same family to judge its own outputs.
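For the cherry-picked seeds entry above, one possible fix is to report the mean and spread over several seeds rather than a single run. The sketch below uses only the Python standard library. The scores are made-up placeholder numbers and `summarize_runs` is a hypothetical helper, not a Paper Review function.

```python
import statistics


def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical accuracies from five seeds; illustrative values only.
scores = [0.81, 0.79, 0.84, 0.80, 0.78]

mean, std = summarize_runs(scores)

# Reporting only the best seed overstates typical performance.
print(f"best single seed: {max(scores):.3f}")
print(f"mean and std over {len(scores)} seeds: {mean:.3f} +/- {std:.3f}")
```

Comparing the best-seed number with the mean makes the gap visible; applying the same reporting to baselines also helps with the hyperparameter asymmetry entry.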
Sampling & generalisability
- WEIRD samples — generalising from Western, Educated, Industrialised, Rich, Democratic samples to all humans.
- blinding not evidenced — claiming blinding without methods support.
Citations & integrity
- hallucinated citation — references that look real but aren't.
- figure presentation — broken axes, missing error bars, color-only encoding.
- trial not registered — clinical claim without a registry entry.