In 1886, Francis Galton measured the heights of parents and their adult children. He noticed something puzzling: unusually tall parents tended to have children shorter than themselves, and unusually short parents tended to have children taller than themselves. Galton called this 'regression to mediocrity'—now known as regression to the mean. This statistical phenomenon is responsible for countless false conclusions about interventions, treatments, and causes.

Why It Happens

Any measured outcome combines true underlying ability or value with random variation. An exceptional score, unusually high or low, reflects both genuine extremeness and favorable or unfavorable random variation. When the measurement is repeated, the random component is drawn fresh and independently while the true component stays roughly the same, so the second measurement is likely to be less extreme than the first. A person who scored 95 out of 100 likely combines high ability (say 80) with lucky random factors (+15); on the next measurement, ability is still 80 but the luck term is expected to be 0, so the score tends back toward 80.
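This decomposition is easy to check with a quick simulation. Every number here is an illustrative assumption (abilities drawn from a normal distribution with mean 65, per-test noise with standard deviation 10, a 90-point cutoff for "exceptional"); only the qualitative result matters: people selected for extreme first scores score closer to their true ability the second time.

```python
import random

random.seed(0)

# Illustrative model: score = true ability + fresh random noise each sitting.
N = 10_000
abilities = [random.gauss(65, 10) for _ in range(N)]

def take_test(ability):
    return ability + random.gauss(0, 10)  # luck is redrawn every test

first = [take_test(a) for a in abilities]
second = [take_test(a) for a in abilities]

# Select people on the basis of an extreme FIRST score.
top = [i for i in range(N) if first[i] >= 90]

mean_first = sum(first[i] for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)
print(f"first-test mean of top scorers:  {mean_first:.1f}")
print(f"second-test mean of same people: {mean_second:.1f}")  # noticeably lower
```

No one's ability changed between the two tests; the drop in the selected group's mean is regression to the mean alone.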

The Galton Board

Galton illustrated regression to the mean with his famous quincunx (Galton board), a pegboard in which balls fall through rows of pins and spread into a normal distribution at the bottom. Balls at the extreme right aren't pushed back toward center; each subsequent bounce is still a fair 50/50, no more likely to go left than right. Extreme positions arise from long runs of same-direction bounces, which are far rarer than mixed sequences. A ball's remaining path is therefore most likely to be roughly balanced: not because of any corrective force, but because balanced sequences vastly outnumber streaks, so that is where most trajectories lead by pure probability.
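Both halves of that claim can be sketched in a few lines: condition on the streaky balls (here, arbitrarily, those whose first five bounces all went right) and check that their very next bounce is still a coin flip. The ten-row board and the five-bounce streak are assumptions chosen just for the demo.

```python
import random

random.seed(1)

ROWS = 10          # pin rows on the board
TRIALS = 100_000   # balls dropped

streaky = 0        # balls whose first 5 bounces all went right
then_right = 0     # of those, how many bounce right again on row 6

for _ in range(TRIALS):
    # +1 = bounce right, -1 = bounce left; each pin is a fair coin
    bounces = [random.choice((-1, 1)) for _ in range(ROWS)]
    if all(b == 1 for b in bounces[:5]):
        streaky += 1
        if bounces[5] == 1:
            then_right += 1

p = then_right / streaky
print(f"P(next bounce right | five rightward bounces) = {p:.3f}")  # near 0.5
```

The streaky balls' next bounce has no memory of the streak; their extremeness simply stops accumulating, which is regression to the mean with no restoring force anywhere in the code.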

The Danger of False Causation

Regression to the mean produces apparent treatment effects where none exist. Students performing poorly on a test receive tutoring and improve on the next test. Did tutoring help? Maybe—but some improvement would happen anyway through regression to the mean. The worst performers are worst partly due to bad luck; next time, luck is more average. Without a control group that didn't receive tutoring, you cannot separate regression effects from genuine improvement. This confound affects medical intervention studies, business coaching programs, and educational research constantly.
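The confound is easy to reproduce in simulation: run two test sittings with no tutoring at all, select the bottom 20% on the pretest, and watch the group "improve" anyway. All parameters (ability mean 70, noise standard deviation 8, the 20% cutoff) are invented for illustration.

```python
import random

random.seed(2)

N = 5_000
abilities = [random.gauss(70, 8) for _ in range(N)]

def take_test(ability):
    return ability + random.gauss(0, 8)  # fresh luck each sitting

pre = [take_test(a) for a in abilities]
post = [take_test(a) for a in abilities]  # NOBODY received tutoring

# Select the worst pretest performers, as a tutoring program would.
cutoff = sorted(pre)[N // 5]  # bottom 20%
worst = [i for i in range(N) if pre[i] <= cutoff]

pre_mean = sum(pre[i] for i in worst) / len(worst)
post_mean = sum(post[i] for i in worst) / len(worst)
print(f"pretest mean of bottom group:    {pre_mean:.1f}")
print(f"posttest mean, no intervention:  {post_mean:.1f}")  # higher
```

Any real tutoring program would look at this gain and claim success, which is exactly why the text insists on a control group.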

Sports and the Sophomore Slump

Sports provide vivid examples. A rookie baseball player with an extraordinary first season often has a worse second season—the 'sophomore slump.' Exceptional first seasons reflect both genuine talent and statistical good luck. The player's true ability didn't change; the luck component reverted toward normal. Similarly, the Sports Illustrated cover jinx—athletes featured on the cover often perform worse afterward—is substantially regression to the mean. They were featured precisely because of recent exceptional performance that included lucky variation which won't persist.

Correct Interpretation

Recognizing regression to the mean requires always asking: was this group or individual selected based on extreme performance? If yes, expect regression regardless of any intervention. The cure for misinterpretation is randomized controlled trials: randomly assign people to treatment and control groups before measuring outcomes. Both groups regress equally, so any remaining difference after treatment represents the genuine treatment effect. This is why randomization is the gold standard in medical research—it controls for regression to the mean along with all other confounders simultaneously.
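A minimal sketch of why randomization isolates the genuine effect, under an assumed true tutoring benefit of +3 points (a made-up figure): recruit only poor pretest performers, randomize them into treatment and control after selection, and the difference in posttest means recovers roughly the true effect even though both arms regress.

```python
import random

random.seed(3)

N = 8_000
TRUE_EFFECT = 3.0  # assumed genuine tutoring benefit, for illustration
abilities = [random.gauss(70, 8) for _ in range(N)]

def take_test(ability):
    return ability + random.gauss(0, 8)

pre = [take_test(a) for a in abilities]

# Recruit the bottom 25% of pretest scorers into the study.
cutoff = sorted(pre)[N // 4]
pool = [i for i in range(N) if pre[i] <= cutoff]

# Randomize AFTER selection: both arms share the same regression effect.
random.shuffle(pool)
half = len(pool) // 2
treated, control = pool[:half], pool[half:]

post_t = [take_test(abilities[i]) + TRUE_EFFECT for i in treated]
post_c = [take_test(abilities[i]) for i in control]

def mean(xs):
    return sum(xs) / len(xs)

est = mean(post_t) - mean(post_c)
print(f"treated posttest mean: {mean(post_t):.1f}")
print(f"control posttest mean: {mean(post_c):.1f}")
print(f"estimated effect:      {est:.1f}")  # close to TRUE_EFFECT
```

Both arms improve from their pretest means, but the regression component cancels in the between-group comparison, leaving approximately the built-in +3.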

Conclusion

Regression to the mean is not a force pushing outcomes toward average—it's a mathematical consequence of measurement variation combined with selection of extremes. Galton's observation about heights launched statistics as a scientific discipline, giving us correlation, regression analysis, and much of inferential statistics. Understanding this phenomenon prevents false conclusions about interventions, explains why exceptional outcomes tend not to last, and illustrates why the randomized controlled trial is the only reliable way to distinguish genuine causal effects from statistical artifacts.