Simpson’s Paradox is a counterintuitive phenomenon in statistics where a
trend that appears in several groups of data reverses or disappears when
those groups are combined. In other words, the data can tell one story
when you look at each group separately and a completely different story
when you pool the groups together. It’s a bit like an optical illusion.
This happens because the combined figures are weighted by how many
observations fall into each group, and when that mix differs between the
things being compared, the weighting can mask or even flip the
relationships seen within the individual groups.
Let’s break it down using an example. Take two hospitals, “Hospital A”
and “Hospital B”, each treating patients with minor and serious
illnesses. Here’s the data:
Hospital A:
- Minor illnesses: 90% recovery rate
- Serious illnesses: 30% recovery rate
Hospital B:
- Minor illnesses: 85% recovery rate
- Serious illnesses: 40% recovery rate
If we look at the recovery rates separately, Hospital A does better with
minor illnesses, while Hospital B does better with serious ones. But
here’s where Simpson’s Paradox comes in: when you combine the data, you
might find that Hospital B has a much better overall recovery rate than
Hospital A.
This can happen when most of Hospital A’s patients have serious
conditions, which drags its overall recovery rate down, while most of
Hospital B’s patients have minor conditions, which boosts its overall
rate. So the combined data hides what’s actually happening in each
category.
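To make the arithmetic concrete, here is a minimal Python sketch of the
hospital example. The recovery rates are the ones listed above, but the
patient counts are hypothetical: the example does not give them, so they
are assumed here purely to show how the patient mix drives the overall
rate.

```python
# Recovery rates come from the hospital example above.
# Patient counts are HYPOTHETICAL, chosen only to illustrate the effect.
# Each entry is (number of patients treated, recovery rate).
hospitals = {
    "Hospital A": {"minor": (100, 0.90), "serious": (400, 0.30)},
    "Hospital B": {"minor": (400, 0.85), "serious": (100, 0.40)},
}


def overall_recovery_rate(groups):
    """Pool all patients: total recovered divided by total treated."""
    treated = sum(count for count, _ in groups.values())
    recovered = sum(count * rate for count, rate in groups.values())
    return recovered / treated


overall = {
    name: overall_recovery_rate(groups) for name, groups in hospitals.items()
}

for name, rate in overall.items():
    print(f"{name}: {rate:.0%} overall recovery rate")
```

With these assumed counts, Hospital A ends up at 42% overall and Hospital
B at 76%, even though Hospital A is ahead on minor illnesses. Swapping
the patient counts between the two hospitals reverses the ranking, which
shows that the overall figure says as much about who was treated as about
how well they were treated.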
Another well-known example comes from graduate admissions at the
University of California, Berkeley, in 1973. The university was suspected
of being unfair to women because men had a higher overall acceptance
rate: the data showed that 44% of men were accepted, compared to 35% of
women. However, when the data was broken down by department, it turned
out that most departments were actually admitting a higher percentage of
women than
men. The confusion happened because more women applied to