Missing Data: One-Page Overview
1. What Is Missing Data and Why Does It Matter?
Missing data occurs when certain observations or features in a dataset are absent or
unrecorded. It is a common challenge in statistical modeling and data analysis, and
inappropriate handling can bias estimates, reduce statistical power, distort relationships,
or invalidate modeling assumptions. Understanding the mechanism behind missingness is
essential for selecting appropriate treatment methods.
2. Types of Missing Data (Mechanisms)
1. Missing Completely at Random (MCAR)
Probability of missingness is unrelated to any observed or unobserved data.
Example: a sensor randomly fails for a few readings.
Least problematic; complete-case analysis remains unbiased, though it loses statistical power because incomplete rows are discarded.
2. Missing at Random (MAR)
Missingness depends on observed data but not the missing value itself.
Example: income missing more frequently for younger respondents (age observed).
The standard working assumption behind most imputation methods (e.g., MICE, Multiple Imputation by Chained Equations).
3. Missing Not at Random (MNAR)
Missingness depends on the missing value itself (or on other unobserved factors), even after accounting for the observed data.
Example: individuals with higher income are less likely to report it.
Most challenging; often requires special modeling (e.g., selection models).
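The three mechanisms above can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the age and income distributions, the missingness probabilities, and the names (sigmoid, masks, bias_cc, bias_adj) are all hypothetical and not taken from any real dataset. It compares the complete-case mean of income with the true mean, then repeats the comparison after adjusting for the observed variable (age) with a simple linear fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: income rises linearly with age, plus noise.
age = rng.uniform(20, 70, size=n)
income = 20 + 0.8 * age + rng.normal(0, 10, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# True where income is missing, one mask per mechanism (assumed probabilities).
masks = {
    # MCAR: every row has the same 30% chance of being missing.
    "MCAR": rng.random(n) < 0.30,
    # MAR: younger respondents are more likely to be missing (depends on observed age only).
    "MAR": rng.random(n) < sigmoid(-(age - 45) / 8),
    # MNAR: higher incomes are more likely to be missing (depends on the value itself).
    "MNAR": rng.random(n) < sigmoid((income - income.mean()) / 10),
}

true_mean = income.mean()
bias_cc = {}   # bias of the complete-case mean
bias_adj = {}  # bias after adjusting for age with a linear fit on complete cases
for name, miss in masks.items():
    obs = ~miss
    bias_cc[name] = income[obs].mean() - true_mean
    # Fit income ~ age on observed rows, then average predictions over all rows.
    slope, intercept = np.polyfit(age[obs], income[obs], 1)
    bias_adj[name] = (intercept + slope * age).mean() - true_mean
    print(f"{name}: complete-case bias {bias_cc[name]:+.2f}, "
          f"age-adjusted bias {bias_adj[name]:+.2f}")
```

In this setup the complete-case mean should be close to the truth only under MCAR. Under MAR, conditioning on age should remove most of the bias, which is the idea that imputation methods rely on. Under MNAR, adjusting for age alone is not expected to be enough.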
3. Diagnosing Missingness
Missingness patterns: monotone, intermittent, blockwise.
Missingness maps: visualize missing entries (heatmaps, patterns).
Statistical tests: Little’s MCAR test assesses the MCAR assumption; a significant result is evidence against MCAR.
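A few of these diagnostics can be sketched with pandas. The example below is a rough sketch on synthetic data: the column names (age, income, score) and the missingness rates are assumptions made for illustration. It computes per-column missing rates, counts the distinct missingness patterns, and runs an informal check that compares an observed variable between rows with and without a missing value. This informal comparison is not Little's MCAR test; it is only a quick first look.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical dataset with three numeric columns.
df = pd.DataFrame({
    "age": rng.uniform(20, 70, n),
    "income": rng.normal(50, 15, n),
    "score": rng.normal(0, 1, n),
})

# Inject missingness: income is missing more often for younger rows,
# score is missing at a flat 5% rate (both rates are assumptions).
income_missing = rng.random(n) < np.where(df["age"] < 35, 0.5, 0.1)
df.loc[income_missing, "income"] = np.nan
df.loc[rng.random(n) < 0.05, "score"] = np.nan

# Boolean mask: True where a value is missing. Converted to integers,
# this is the matrix a missingness heatmap would display.
mask = df.isna()

# Fraction of missing values per column.
missing_rate = mask.mean()

# Number of rows sharing each distinct missingness pattern.
pattern_counts = mask.value_counts()

# Informal check: mean age for rows with observed vs. missing income.
# A clear gap suggests missingness is related to age, i.e. not MCAR.
age_by_missing = df.groupby(mask["income"])["age"].mean()

print(missing_rate)
print(pattern_counts)
print(age_by_missing)
```

If the mean of an observed variable differs clearly between the two groups, that is a hint that missingness depends on that variable, and a formal test or a model of the missingness indicator would be a reasonable next step.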