Manuscript red-team reference · companion to Paper Review

multiple comparisons

Run 20 tests where no real effect exists and, on average, 1 will come out 'significant' by chance.

What it is

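With α = 0.05, every test of a true null hypothesis has a 5% chance of returning a false-positive result. Run 20 such tests and the expected false-positive count is 1. Run 100 and you should expect about 5 'significant' results from random noise alone. The chance of at least one false positive climbs even faster: across 20 independent tests it is 1 − 0.95^20, roughly 64%.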
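The arithmetic above can be checked with a short simulation. This is a minimal sketch, assuming every null hypothesis is true, so each p-value is uniform on [0, 1] (the standard result for an exact test of a continuous statistic). The function names are hypothetical. The expected count m·α holds regardless of dependence between tests; the "at least one" formula 1 − (1 − α)^m assumes the tests are independent.

```python
import random


def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of false positives when all m nulls are true.
    By linearity of expectation this holds even if the tests are correlated."""
    return m * alpha


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive, assuming m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def simulate(m: int, alpha: float = 0.05, trials: int = 10_000, seed: int = 0):
    """Monte Carlo check. Assumption: every null is true, so each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    total_hits = 0
    trials_with_any_hit = 0
    for _ in range(trials):
        hits = sum(rng.random() < alpha for _ in range(m))  # 'significant' results in this run
        total_hits += hits
        trials_with_any_hit += int(hits > 0)
    return total_hits / trials, trials_with_any_hit / trials


for m in (20, 100):
    mean_hits, any_hit = simulate(m)
    print(f"{m} tests: expected {expected_false_positives(m):.1f} false positives "
          f"(simulated {mean_hits:.2f}); P(at least one) = {familywise_error_rate(m):.2f} "
          f"(simulated {any_hit:.2f})")
```

The simulated averages should land close to the analytic values; the seed only makes the run repeatable.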

Why a reviewer cares

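A paper that runs many tests and reports only the significant ones may be presenting noise as findings, and readers cannot judge this without knowing how many tests were run. Common failure modes: testing every cell in a correlation matrix; checking every demographic subgroup for an effect; comparing every pair of conditions in a factorial design without correction; running a separate test per outcome in a battery and headlining whichever one popped.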
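The failure modes above multiply tests faster than they appear to. This sketch counts the tests each pattern implies and the false positives expected if every null is true. The scenario sizes (10 variables, a 2 × 3 design, 8 subgroups × 3 outcomes) and the helper names are hypothetical, chosen only for illustration.

```python
from math import comb

ALPHA = 0.05


def correlation_matrix_tests(k: int) -> int:
    """Distinct off-diagonal cells in a k x k correlation matrix: k choose 2."""
    return comb(k, 2)


def pairwise_condition_tests(c: int) -> int:
    """All pairwise comparisons among c conditions (e.g. the cells of a factorial design)."""
    return comb(c, 2)


def subgroup_outcome_tests(subgroups: int, outcomes: int) -> int:
    """One test per subgroup per outcome."""
    return subgroups * outcomes


# Hypothetical study sizes, for illustration only.
scenarios = {
    "10-variable correlation matrix": correlation_matrix_tests(10),
    "2 x 3 factorial, all cell pairs": pairwise_condition_tests(6),
    "8 subgroups x 3 outcomes": subgroup_outcome_tests(8, 3),
}

for name, m in scenarios.items():
    print(f"{name}: {m} tests, ~{m * ALPHA:.2f} false positives expected if every null is true")
```

These expected counts hold even though, say, the cells of one correlation matrix are not independent: correlation changes the chance of at least one false positive, not the average number.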

How to fix it

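Apply a correction. Bonferroni (divide α by the number of tests) controls the chance of any false positive across the whole family of tests; it is conservative and rarely controversial. Benjamini-Hochberg controls the false discovery rate (FDR), the expected share of 'significant' results that are false, and is more powerful than Bonferroni when many true positives are expected, as in exploratory screens. Either way, name the correction in the methods and apply it consistently. Where possible, pre-specify which contrasts are primary so you only need to correct over those.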
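Both corrections are short enough to write out. This is a minimal plain-Python sketch; the function names and the ten p-values are hypothetical, chosen so the two procedures disagree. Benjamini-Hochberg is shown as the standard step-up procedure, whose FDR guarantee holds for independent or positively dependent tests.

```python
def bonferroni(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H_i when p_i <= alpha / m (controls the chance of any false positive)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate at q).

    Sort p-values ascending, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # keep the largest passing rank, not the first failure
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from ten tests, in the order they were run.
pvals = [0.041, 0.001, 0.20, 0.018, 0.62, 0.004, 0.35, 0.012, 0.91, 0.77]

print("uncorrected:", sum(p <= 0.05 for p in pvals))    # 5
print("Bonferroni: ", sum(bonferroni(pvals)))           # 2
print("BH (FDR):   ", sum(benjamini_hochberg(pvals)))   # 4
```

The 0.041 result, significant on its own, survives neither correction. In practice, statsmodels' `multipletests` function (`method='bonferroni'` or `method='fdr_bh'`) implements both procedures. Pre-specifying primary contrasts matters because the Bonferroni threshold depends on the family size: correcting over 3 primary contrasts gives 0.05 / 3 ≈ 0.0167, versus 0.05 / 20 = 0.0025 if all 20 tests are in the family.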

This is one of ~15 canonical methodology explainers Paper Review's red-team report links to.