Multiple comparisons
When you run 20 tests at α = 0.05, expect about 1 to be 'significant' by chance alone.
What it is
With α = 0.05, every statistical test of a true null hypothesis has a 5% chance of returning a false-positive result. Run 20 such tests and the expected false-positive count is 1; if the tests are independent, the chance of at least one false positive is 1 − 0.95^20 ≈ 64%. Run 100 tests and you should expect roughly 5 'significant' results from random noise alone.
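The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes every null hypothesis is true and, for the second function, that the tests are independent; the function names are illustrative and not taken from any library.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests
    and that every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Print the expected count and the family-wise error rate for a few family sizes.
for m in (1, 20, 100):
    print(m, expected_false_positives(m), round(familywise_error_rate(m), 3))
```

The expected count grows linearly with the number of tests, while the chance of at least one false positive climbs quickly toward 1.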
Why a reviewer cares
Common failure modes: testing every cell in a correlation matrix; checking every demographic subgroup for an effect; comparing every pair of conditions in a factorial design without correction; running a separate test per outcome in a battery and headlining whichever popped.
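These designs hide how many tests are actually being run. As a rough sketch, the number of pairwise tests among k variables or conditions is k(k − 1)/2; the helper name and the example sizes below are hypothetical, chosen only for illustration.

```python
from math import comb


def pairwise_test_count(k):
    """Number of distinct pairs among k variables or conditions."""
    return comb(k, 2)


# Hypothetical example: a correlation matrix over 10 variables.
n_pairs = pairwise_test_count(10)

# Expected false positives at alpha = 0.05 if no true correlations exist.
expected_by_chance = n_pairs * 0.05

print(n_pairs, expected_by_chance)
```

A matrix that looks like one analysis is, in this example, 45 separate tests, so a couple of 'significant' cells are expected even when nothing is there.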
How to fix it
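Apply a correction. Bonferroni (test each hypothesis at α divided by the number of tests) controls the chance of any false positive across the whole family; it is conservative and rarely controversial. Benjamini-Hochberg controls the false discovery rate (FDR), the expected share of false positives among the results you call significant, and is appropriate when many true positives are expected. Either way, name the correction in the methods and apply it consistently. Where possible, pre-specify which contrasts are primary so you only need to correct over those.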
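Both corrections are short enough to sketch in plain Python. The version below is illustrative only: the function names and the example p-values are made up for this explainer, and in practice an established statistics package is a safer choice than hand-rolled code.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from a family of 10 tests (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print(sum(bonferroni(example_p)), "rejected under Bonferroni")
print(sum(benjamini_hochberg(example_p)), "rejected under Benjamini-Hochberg")
```

On these made-up values Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, which reflects the trade-off described above: stricter control of any false positive versus control of the share of false positives among discoveries.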
This is one of ~15 canonical methodology explainers Paper Review's red-team report links to.