Confidence Intervals: What Polls Really Tell Us

A poll says 52% of voters prefer Candidate A, with a margin of error of ±3 percentage points. What does that margin of error actually mean? Most people interpret it as 'the true value is definitely between 49% and 55%', but that's not quite right. A confidence interval is a statement about the procedure used to create it, not a probability statement about any fixed unknown value. Understanding this distinction separates careful statistical thinking from the oversimplified versions reported in headlines.

Why We Need Intervals

Point estimates (single numbers like '52% support') ignore sampling uncertainty. Every sample is different; a different random sample of the same population would yield a different estimate. Confidence intervals quantify this variability, communicating not just what we estimate but how precise that estimate is. A 95% confidence interval means: if we repeated this sampling procedure many times, about 95% of the constructed intervals would contain the true population value. The interval reflects the reliability of our method, not the location of the unknown truth.
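
The repeated-sampling idea can be checked with a small simulation. The sketch below is illustrative only: the true support of 0.52, the sample size of 1000, the number of simulated polls, and the function names are all assumptions chosen for the example, and it uses the normal-approximation interval described in the next section.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each simulated respondent supports the candidate with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


# Assumed values: true support 0.52, polls of 1000 people, 2000 repeated polls.
print(coverage(true_p=0.52, n=1000, trials=2000))
```

The printed fraction should land close to 0.95. Each simulated interval either contains 0.52 or it does not; the 95% figure only emerges across the whole collection of polls.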

Constructing a Confidence Interval

For a sample proportion p̂ based on n observations, the standard error is √(p̂(1−p̂)/n). A 95% confidence interval uses the fact that sample proportions are approximately normally distributed (by the Central Limit Theorem): the interval extends 1.96 standard errors on either side of the estimate. For p̂ = 0.52 and n = 1000: SE = √(0.52 × 0.48 / 1000) ≈ 0.0158, so the margin of error is 1.96 × 0.0158 ≈ 0.031, giving the interval (0.52 − 0.031, 0.52 + 0.031) = (0.489, 0.551). The '3% margin of error' in headlines is typically this ±1.96 × SE, usually rounded to the nearest percentage point.

$$\text{95\% CI: } \hat{p} \pm 1.96 \times \sqrt{\hat{p}(1-\hat{p})/n}$$
$$\text{Margin of error} = 1.96 \times \mathrm{SE}$$
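
The worked example above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name proportion_ci is an assumption for illustration, and the critical value is taken from the standard normal distribution rather than hard-coded as 1.96.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Return (low, high, margin) for a normal-approximation interval."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_ci(0.52, 1000)
print(f"{low:.3f} {high:.3f} {margin:.3f}")  # prints "0.489 0.551 0.031"
```

Passing confidence=0.99 instead widens the interval, because the critical value rises from about 1.96 to about 2.58.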

The Correct Interpretation

The 95% confidence level refers to the long-run performance of the procedure: 95% of intervals constructed this way will contain the true value. For any specific interval, the true value either is or isn't inside it—there's no probability involved for a fixed but unknown constant. This subtlety trips up even experienced researchers. A Bayesian credible interval, by contrast, directly states the probability that the parameter lies within a range, by treating the unknown parameter as a random variable with a prior distribution.
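
For contrast, here is a rough sketch of a Bayesian credible interval for the same poll. It rests on assumptions not made in the text: a uniform Beta(1, 1) prior on the true proportion, which gives a Beta(1 + successes, 1 + failures) posterior, and a Monte Carlo approximation of the posterior quantiles. The function name and the number of draws are also illustrative.

```python
import random


def credible_interval(successes, n, draws=20000, seed=0):
    """Approximate central 95% credible interval under a uniform prior."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    alpha = 1 + successes          # posterior Beta parameters, assuming Beta(1, 1) prior
    beta = 1 + (n - successes)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    # Empirical 2.5th and 97.5th percentiles of the posterior draws.
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]


# 520 of 1000 respondents in favour, matching p-hat = 0.52.
low, high = credible_interval(520, 1000)
print(f"{low:.3f} {high:.3f}")
```

With this much data and a flat prior, the numbers come out very close to the confidence interval. The difference lies in the interpretation: here the 95% is a probability statement about the parameter, conditional on the chosen prior.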

Factors Affecting Width

Confidence interval width is determined by three factors. Sample size n: larger samples give narrower intervals (width ∝ 1/√n). Confidence level: a 99% interval is wider than a 95% interval, because more certainty requires a wider net. Variability: more variable populations require wider intervals to capture the true value reliably; for a proportion, the variability term p̂(1−p̂) is largest at p̂ = 0.5. Doubling the sample size doesn't halve the margin of error; it reduces it by a factor of √2 ≈ 1.41. To halve the margin of error, you must quadruple the sample size, which explains why polling precision is expensive to improve.

$$\text{Width} \propto z^* / \sqrt{n}$$
To halve the width: multiply n by 4.
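
The square-root relationship is easy to see numerically. The sketch below assumes the worst case p = 0.5 (where the variability of a proportion is largest) and a 95% level with z = 1.96; the function names are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion at sample size n."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(target_margin, p=0.5, z=1.96):
    """Smallest n whose margin of error does not exceed target_margin."""
    return math.ceil(z**2 * p * (1 - p) / target_margin**2)


# Doubling n shrinks the margin by a factor of about 1.41; quadrupling halves it.
for n in (1000, 2000, 4000):
    print(n, round(margin_of_error(n), 4))

print(required_sample_size(0.03))   # sample size needed for a ±3 point margin
print(required_sample_size(0.015))  # roughly four times as many for ±1.5 points
```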

Beyond Proportions

The same framework applies to means, differences, regression coefficients, and virtually any estimated quantity. The t-distribution replaces the normal distribution when the population standard deviation is unknown and estimated from the sample—this is the standard case in practice. For small samples, t-intervals are wider than normal-based intervals, reflecting additional uncertainty about the population spread. Modern statistical software computes confidence intervals automatically for most procedures, but understanding the underlying logic is essential for correct interpretation.
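
To illustrate the small-sample case, the sketch below compares a normal-based and a t-based 95% interval for a mean. The ten measurements are made-up data for illustration, and because the Python standard library has no t-distribution quantile function, the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded from a standard t-table.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements (illustrative data only).
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.4, 4.6]

n = len(sample)
x_bar = mean(sample)
se = stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 denominator

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
t_crit = 2.262  # t-table value for 95% confidence and n - 1 = 9 degrees of freedom

z_interval = (x_bar - z_crit * se, x_bar + z_crit * se)
t_interval = (x_bar - t_crit * se, x_bar + t_crit * se)

print("normal-based:", z_interval)
print("t-based:     ", t_interval)
```

The t-based interval is wider by the ratio 2.262 / 1.96, about 15%. As the sample grows, the t critical value approaches 1.96 and the two intervals converge.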

Conclusion

Confidence intervals communicate statistical uncertainty in a single, interpretable range. They are fundamental to evidence-based practice across medicine, social science, engineering, and business—any field where decisions rest on data from samples. The critical habit is to remember that the confidence level describes the procedure's reliability over many repetitions, not the probability that any particular interval contains the truth. With that understanding, 'margin of error ±3%' transforms from a vague disclaimer into a precise statement about sampling precision.