This tutorial provides a foundational overview of statistics and data analysis, covering descriptive and
inferential methods, hypothesis testing, correlation, regression, and cluster analysis. It is designed for
beginners and builds from basic concepts to more complex techniques.
Part 1: What is Statistics?
Statistics deals with the collection, analysis, and presentation of data. It is divided into two main
branches:
Descriptive Statistics: Summarizes the characteristics of a sample (think mean, median, mode,
standard deviation, and frequency tables). This doesn't generalize beyond the data collected. An
example is surveying employees about their commute and simply describing the results (14 drive, 6 bike).
Inferential Statistics: Uses sample data to make inferences about a population. This is about drawing
conclusions and predictions. The core question: can we generalize what we see in our sample to the
larger group?
Key Descriptive Statistics:
Measures of Central Tendency:
Mean: The average (sum of values / number of values). Sensitive to outliers.
Median: Middle value when data is ordered. Robust to outliers: one very tall person won't drastically change it.
Mode: Most frequent value.
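To make the outlier point concrete, here is a minimal sketch using Python's standard statistics module. The height values are invented for illustration and are not from any real survey.

```python
import statistics

# Hypothetical heights in cm (illustrative values only).
heights_cm = [160, 165, 165, 170, 175]

mean_before = statistics.mean(heights_cm)      # sum of values / number of values
median_before = statistics.median(heights_cm)  # middle value of the ordered data
mode_before = statistics.mode(heights_cm)      # most frequent value

# Add one very tall person and recompute.
with_outlier = heights_cm + [230]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before)
```

In this made-up sample the single outlier moves the mean far more than it moves the median, which is the sense in which the median is called robust.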
Measures of Dispersion:
Standard Deviation: Measures the typical distance of data points from the mean (formally, the square
root of the average squared deviation). A larger standard deviation means more spread.
Variance: The square of the standard deviation.
Range: Maximum value minus minimum value.
Interquartile Range: The spread of the middle 50% of the data (third quartile minus first quartile).
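The dispersion measures above can be sketched with the standard statistics module. The data values are invented, and the example assumes the population versions of standard deviation and variance (pstdev, pvariance). The tutorial does not say whether population or sample formulas are intended; the sample versions are statistics.stdev and statistics.variance. Quartile conventions also differ between tools, so the interquartile range may not match other software exactly.

```python
import math
import statistics

# Hypothetical data (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation and variance (an assumption; see note above).
pop_sd = statistics.pstdev(data)
pop_var = statistics.pvariance(data)

# Range: maximum value minus minimum value.
data_range = max(data) - min(data)

# Quartiles: n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print("standard deviation:", pop_sd)
print("variance:          ", pop_var)
print("range:             ", data_range)
print("interquartile range:", iqr)

# The variance is the square of the standard deviation.
assert math.isclose(pop_sd ** 2, pop_var)
```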
Tables:
Frequency Tables: Summarize how often each value appears in a dataset.
Contingency Tables (Cross-Tabs): Show the relationship between two categorical variables (e.g.,
gender and preferred newspaper).
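Both kinds of table can be built with a simple counter. In the sketch below, the commute counts reuse the 14 drive / 6 bike example from Part 1. The gender and newspaper responses, including the names "Paper A" and "Paper B", are hypothetical placeholders.

```python
from collections import Counter

# Frequency table: how often each value appears.
commutes = ["drive"] * 14 + ["bike"] * 6
frequency = Counter(commutes)
for value, count in frequency.most_common():
    print(f"{value:<6} {count}")

# Contingency table: counts for each combination of two categorical variables.
# Hypothetical (gender, preferred newspaper) responses.
responses = [
    ("female", "Paper A"), ("female", "Paper B"), ("female", "Paper A"),
    ("male", "Paper B"), ("male", "Paper B"), ("male", "Paper A"),
]
cell_counts = Counter(responses)
genders = sorted({gender for gender, _ in responses})
papers = sorted({paper for _, paper in responses})

# Nested dict: crosstab[gender][paper] -> count (0 if the pair never occurs).
crosstab = {g: {p: cell_counts[(g, p)] for p in papers} for g in genders}

print(f"{'':<8}" + "".join(f"{p:<10}" for p in papers))
for g in genders:
    print(f"{g:<8}" + "".join(f"{crosstab[g][p]:<10}" for p in papers))
```

For larger datasets a library such as pandas offers the same operations (value_counts and crosstab), but the plain-Python version shows what is actually being counted.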