Statistics
The body of methods for obtaining and analyzing data. (A statistic, in the singular, is a numerical summary of the sample data.)
Statistics provides methods for:
• Design: planning how to gather data for a research study to investigate questions of
interest to us
• Description: summarizing the data obtained in the study
• Inference: making predictions based on the data, to help us deal with uncertainty in an
objective manner
Observational Studies
Collect data on a characteristic of interest by merely observing outcomes, without
assigning treatments to subjects [though be wary of OVB (omitted variable bias), which
can arise when subjects select themselves into the groups being compared]
Data
the collection of observations that interest us
Population
total set of subjects of interest in a study
Parameter
A numerical summary of the population
Sample
The subset of the population on which the study collects data.
Descriptive statistics
Summarize the information in a collection of data.
Two main characteristics for numeric data are:
• Central tendency (describing typical observations)
• Dispersion (describing variation across observations)
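A minimal sketch of the two characteristics above, using Python's standard `statistics` module. The `siblings` data is made up for illustration and does not come from any real study.

```python
import statistics

# Hypothetical sample: number of siblings reported by nine students.
siblings = [0, 1, 1, 2, 2, 2, 3, 4, 6]

# Central tendency: where a typical observation lies.
mean_value = statistics.mean(siblings)
median_value = statistics.median(siblings)
mode_value = statistics.mode(siblings)

# Dispersion: how much the observations vary.
value_range = max(siblings) - min(siblings)
sample_sd = statistics.stdev(siblings)  # sample standard deviation (divides by n - 1)

print(mean_value, median_value, mode_value, value_range, round(sample_sd, 2))
```

The mean, median, and mode describe a typical observation, while the range and standard deviation describe how spread out the observations are.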
Inferential statistics
Provide predictions about a population, based on data from a sample of that population
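A rough sketch of how a sample statistic estimates a population parameter. The population here is simulated (heights in cm drawn from a normal distribution with mean 170 and standard deviation 10); these numbers, the sample size of 100, and the variable names are all assumptions for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm for 10,000 subjects.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.mean(population)

# Sample: a random subset of the population on which data is collected.
sample = random.sample(population, 100)

# Statistic: a numerical summary of the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In a real study the parameter is usually unknown, so only the sample statistic can be computed; the simulation simply makes both visible side by side.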
Variable
A characteristic that can vary in value among subjects in a sample or population. The
values the variable can take form the measurement scale.
Interval variable
has meaningful numeric distance between levels.
• A discrete variable: its possible values form a set of separate numbers [number of
siblings = 0, 1, 2, 3...]
• A continuous variable: can take an infinite continuum of possible real number values
[height in cm = 183.2192...]
Categorical variable
does not have a natural or meaningful numeric distance between levels.
• An ordinal variable has a natural order [how do you feel about this class = excited,
neutral, terrified]
• A nominal variable has no clear high or low [type of transportation = bus, car, bike,
walk]
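A small sketch of why the ordinal/nominal distinction matters when summarizing categorical data. The responses below are invented, and listing the feeling levels from low to high as terrified, neutral, excited is an assumption about the intended order.

```python
import statistics

# Ordinal: categories with a natural order, listed here from low to high.
feeling_levels = ["terrified", "neutral", "excited"]
feelings = ["excited", "neutral", "terrified", "neutral", "excited"]

# Order is meaningful, so sorting by rank and taking the middle response is valid.
ranked = sorted(feelings, key=feeling_levels.index)
median_feeling = ranked[len(ranked) // 2]

# Nominal: categories with no order, so only counts and the mode make sense.
transport = ["bus", "car", "bike", "bus", "walk"]
most_common_transport = statistics.mode(transport)

print(median_feeling, most_common_transport)
```

Neither variable supports a mean, because the distance between levels is not meaningful.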