Insensitivity to Sample Size

Category: Probability & Belief

You judge how likely a result is by how it looks, and ignore how many data points produced it. A coin that comes up heads 60% of the time feels equally surprising whether that came from 10 flips or 10,000, even though the second is basically a miracle and the first is a Tuesday.

How it works

Statistically, the variance of a sample mean shrinks as the sample gets bigger (the standard error scales with 1 over the square root of N). So small samples produce extreme-looking results constantly, and large samples cluster tightly around the truth. Your intuition does not carry this correction. Instead, per Kahneman and Tversky, you assess probability by representativeness: you ask whether a sample looks like the population, and a 70% result looks equally "off" regardless of whether it came from 5 trials or 5,000. Because size drops out of the calculation, you overreact to flukes in small data and underrate the reliability of big data, and you expect small samples to mirror the population far more faithfully than chance allows.
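
The square-root scaling is easy to check by simulation. The sketch below is illustrative only: the fair coin, the 2,000 repetitions, the seed, and the function name are assumptions chosen for the example, not part of the original research. It draws many samples of each size and compares the observed spread of the heads share with 0.5 divided by the square root of N.

```python
import random
import statistics

def spread_of_sample_share(n_flips, n_samples=2000, seed=0):
    """Empirical standard deviation of the heads share across many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(n_samples):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        shares.append(heads / n_flips)
    return statistics.pstdev(shares)

for n in (10, 100, 1000):
    observed = spread_of_sample_share(n)
    predicted = 0.5 / n ** 0.5  # standard error of a fair-coin proportion
    print(f"N={n:>5}: observed spread {observed:.3f}, predicted {predicted:.3f}")
```

Each hundredfold increase in N should cut the spread by about a factor of ten, which is the correction intuition leaves out.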

Where you'll see it

  • The hospital problem (Tversky and Kahneman, 1974): the small hospital delivering 15 babies a day records roughly twice as many days with over 60% boys as the large one delivering 45, because small samples swing harder. Only 22% of subjects picked the small hospital; the majority said the two were equally likely.
  • A/B tests called early: your new checkout button shows a 40% conversion lift after 30 visitors, so you ship it. With 30 people the confidence interval is enormous and that 'lift' is mostly noise. Run the same test on 30,000 visitors and the effect can quietly collapse to 1%.
  • The 'best small schools' trap (Wainer and Zwerling, popularized by Kahneman): studies found top-performing schools were disproportionately small, and funders poured money into shrinking schools. But the *worst*-performing schools are also disproportionately small, for the same reason. Small samples, not superior teaching, produce the extreme outcomes on both ends.
  • Mutual fund and doctor rankings: a fund that beats the market three years running or a surgeon with a perfect 12-for-12 record looks elite, but 3 years and 12 cases are tiny samples where luck dominates. The 'star' regularly reverts to average once the sample grows.
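
The hospital problem can be worked out exactly rather than guessed. The sketch below assumes each birth is an independent 50/50 event, a simplification that is not quite true of real births, and the function name is made up for this example. It computes the chance that a single day has more than 60% boys at each hospital.

```python
from math import comb

def prob_more_than_60pct_boys(births):
    """Exact P(boys > 60% of births), assuming independent 50/50 births."""
    # 5 * k > 3 * births is the integer form of k / births > 0.6
    favorable = sum(
        comb(births, k) for k in range(births + 1) if 5 * k > 3 * births
    )
    return favorable / 2 ** births

small = prob_more_than_60pct_boys(15)
large = prob_more_than_60pct_boys(45)
print(f"15 births/day: {small:.3f} of days, about {365 * small:.0f} days a year")
print(f"45 births/day: {large:.3f} of days, about {365 * large:.0f} days a year")
```

Under these assumptions the small hospital should log such days roughly twice as often as the large one, so "equally likely" is the wrong answer.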

Where it comes from

Amos Tversky and Daniel Kahneman introduced the effect in "Belief in the Law of Small Numbers" (Psychological Bulletin, 1971), which showed that even professional psychologists wrongly expected tiny samples to mirror the population. They formalized it in "Subjective Probability: A Judgment of Representativeness" (Cognitive Psychology, 1972), where the hospital problem first appeared and where subjects gave nearly identical probability estimates for samples of N=10, 100, and 1,000. They then packaged it for a wide audience as one of the representativeness biases in the landmark "Judgment under Uncertainty: Heuristics and Biases" (Science, 1974), which restates the hospital problem. A preregistered replication by Mayiwar, Wan, Løhre, and Feldman (first published online 2024) reproduced eight of the nine original 1972 problems, so the effect is not a 1970s artifact.

How to counter it

Always ask "N = what?" before you react. Attach the sample size to every rate, average, or streak before you feel anything about it. "Converts at 60%" is meaningless until you know whether it is 6 of 10 or 6,000 of 10,000, and only one of those deserves a decision.

Pre-commit to a sample size, then wait for it. Decide how many observations you need before you peek, and refuse to act until you hit it. This kills the reflex to stop an A/B test the moment it looks good, which is exactly when small-sample noise is loudest.
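
One way to pick that number in advance is the standard normal-approximation formula for comparing two proportions. The sketch below is a rough planning aid, not a full power analysis: the 5% baseline conversion rate, the 5% significance level, the 80% power, and the function name are all assumptions made for illustration, and the 40% and 1% relative lifts simply echo the checkout-button example.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_arm(baseline, lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm to detect a relative lift
    with a two-sided z-test on two proportions."""
    p1 = baseline
    p2 = baseline * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical 5% baseline conversion rate
print("40% lift:", visitors_per_arm(baseline=0.05, lift=0.40))
print(" 1% lift:", visitors_per_arm(baseline=0.05, lift=0.01))
```

Even the dramatic lift needs thousands of visitors per arm under these assumptions, and the small one needs millions, which is why 30 visitors settle nothing.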

Expect extremes from small groups on both ends. When you see a "best" performer (school, surgeon, fund, region), check whether the "worst" performers are also small. If both tails are dominated by small samples, you are looking at variance, not talent, and the ranking will revert.
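
A quick simulation shows why both tails fill up with small groups. In the sketch below every student's score comes from the same distribution, so no school is genuinely better than another. The school sizes (25 and 400), the score scale (mean 500, standard deviation 100), the seed, and the function name are hypothetical values chosen for illustration.

```python
import random

def simulate_school_ranking(n_small=500, n_large=500, seed=1):
    """Rank schools whose students all share one score distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    schools = []
    for size, count in ((25, n_small), (400, n_large)):
        for _ in range(count):
            mean = sum(rng.gauss(500, 100) for _ in range(size)) / size
            schools.append((mean, size))
    schools.sort()  # ascending by mean score
    return schools

schools = simulate_school_ranking()
bottom, top = schools[:50], schools[-50:]
small_in_top = sum(size == 25 for _, size in top)
small_in_bottom = sum(size == 25 for _, size in bottom)
print(f"small schools in top 50: {small_in_top}, in bottom 50: {small_in_bottom}")
```

Small schools should crowd both the top 50 and the bottom 50 even though talent is identical everywhere by construction.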

Compute a rough standard error instead of trusting the number. For a proportion, the wobble is about 0.5 divided by the square root of N: that is plus or minus 16 points at N=10 but plus or minus 1.6 points at N=1,000. Doing this once per claim retrains your gut faster than any warning label.
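
The rule of thumb fits in a few lines. This is a minimal sketch: the function name is made up for the example, p=0.5 is the worst case behind the "0.5 divided by the square root of N" shortcut, and doubling the standard error to get an approximate 95% range relies on a normal approximation that is crude at very small N.

```python
from math import sqrt

def rough_standard_error(n, p=0.5):
    """Standard error of a proportion; p=0.5 gives the worst-case wobble."""
    return sqrt(p * (1 - p) / n)

for n in (10, 30, 1000, 30000):
    se = rough_standard_error(n)
    print(
        f"N={n:>6}: +/- {100 * se:.1f} points (1 SE), "
        f"+/- {200 * se:.1f} points (about 95%)"
    )
```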

The tell

You get excited (or alarmed) by a percentage, a win streak, or a "clear trend" and you have not once asked how many observations produced it. If your reaction to "62% preferred it" would be identical whether the base was 8 people or 80,000, the bias is driving.

References

  1. Amos Tversky, Daniel Kahneman (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110
  2. Daniel Kahneman, Amos Tversky (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3), 430-454
  3. Amos Tversky, Daniel Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131
  4. Lewend Mayiwar, Kai Hin Wan, Erik Løhre, Gilad Feldman (2025). Revisiting representativeness heuristic classic paradigms: Replication and extensions of nine experiments in Kahneman and Tversky (1972). Quarterly Journal of Experimental Psychology, 78(4), 707-730
  5. Howard Wainer, Harris L. Zwerling (2006). Evidence that smaller schools do not improve student achievement. Phi Delta Kappan, 88(4), 300-303