1/24/2024
Statistics: The science of collecting and studying data, and the study of variation
- Collect
- Organize
- Summarize
- Analyze
- Draw conclusions from data
Subject: Single entity on which we take measurements (person, animal, thing, etc.)
Variable: Characteristic of the subjects in a study. Anything that can change from observation to
observation.
Categorical/Qualitative Variable: Data consists of names or labels (if numbers are used, they are
just labels and do not represent counts or measurements).
Ex. country of residence, occupation, ranking in a competition, grade of student in a class
Quantitative Variable: Data consists of numbers representing counts or measurements.
Quantitative Discrete Variable: Data results when the number of possible values is finite or
“countable”
- Ex. number of children in a family, number of cars
Quantitative Continuous Variable: Data results from infinitely many possible quantitative values,
where the collection of values is “not countable”
- Ex. weight (pounds), time (seconds), elevation over Earth’s surface (miles)
Descriptive Statistics: Observe data with graphical and numerical summaries
- Seeks to take data and reduce them to a few key things that encapsulate the important
aspects of the data
Inferential Statistics: Involves using a sample to say something about a population.
Quantitative Small (under 30): stem and leaf plot or dot plot
Quantitative Large (over 30): histogram or boxplot
Categorical: Pie chart or bar chart
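The graph choices above can be written as a small lookup. This is a minimal sketch: the function name `suggest_graph` and its argument labels are hypothetical, and the notes do not say where a sample of exactly 30 belongs, so grouping it with the large samples is an assumption.

```python
def suggest_graph(variable_type: str, n: int) -> str:
    """Return the graph type the notes suggest for this kind of data."""
    if variable_type == "categorical":
        return "pie chart or bar chart"
    if variable_type == "quantitative":
        # Assumption: a sample of exactly 30 is treated as "large".
        if n < 30:
            return "stem and leaf plot or dot plot"
        return "histogram or boxplot"
    raise ValueError("variable_type must be 'categorical' or 'quantitative'")


print(suggest_graph("quantitative", 12))
print(suggest_graph("quantitative", 250))
print(suggest_graph("categorical", 250))
```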
1/29/2024
Controlled Experiments
- Consider the case of polio vaccine. Researchers want to know if it is effective in
preventing polio.
- Question: is a medicine effective?
- Basic idea: comparison
- Give some patients the vaccine (treatment), do not give to others (control).
- Treatment and control group should be as similar as possible except for the
administration of treatment
- Randomized controlled experiment: subjects assigned randomly to treatment and control
group.
- Placebo: an inactive treatment. The patient believes they are being treated, but the
placebo itself does nothing.
- Double-Blind Study: Neither the patient nor the researcher knows who receives the placebo.
- Need all of these (a randomized, double-blind, controlled trial) to establish causality
for certain
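Random assignment to treatment and control can be sketched in a few lines. This is only an illustration: the subject labels, the `randomize` helper, and the even split are assumptions, not part of the polio trial design.

```python
import random


def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)      # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels
subjects = [f"patient_{i}" for i in range(1, 11)]
treatment, control = randomize(subjects, seed=0)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, the two groups should be similar on average in everything except the treatment.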
Observational Studies
- A study in which researchers don’t choose who is in the treatment and control groups
- Method of comparison still used, but subjects assign themselves to treatment and control
(ex. researchers cannot force people to smoke)
- Isolate treatment effects by trying to control confounding variables
- Cannot establish causality for certain
Histogram
- Graph created using set of touching blocks
- Summarize individual values into blocks containing similar values
- Describe based on central tendency, spread, and shape
Types:
- Frequency
- Relative frequency
- Density (what we will focus on)
- X axis has units of data
- Y axis has Density units (%/x unit)
- Bases of the blocks are made of class intervals dictated by the data
- Area of a block represents the percentage of cases in that interval
- Height of each block represents the density (percentage of cases per x unit) in that interval
- Total area of the histogram will always add up to 100%
- Appearance highly affected by choice of bins
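A short calculation shows how block heights on the density scale follow from the counts. This is a sketch with made-up class intervals and counts: height is the percentage of cases divided by the interval width (matching the %/x-unit axis), so each area comes back to the percentage and the areas sum to 100%.

```python
# Hypothetical class intervals (in x units) and the number of cases in each
edges = [0, 10, 20, 40, 80]
counts = [10, 20, 50, 20]

total = sum(counts)
widths = [right - left for left, right in zip(edges, edges[1:])]
percents = [100 * c / total for c in counts]            # % of cases per interval
heights = [p / w for p, w in zip(percents, widths)]     # density: % per x unit
areas = [h * w for h, w in zip(heights, widths)]        # area = % of cases

for left, right, p, h in zip(edges, edges[1:], percents, heights):
    print(f"[{left}, {right}): {p:.0f}% of cases, height {h:.2f} % per unit")

print("total area:", sum(areas))
```

In this example the intervals 10 to 20 and 40 to 80 hold the same percentage of cases, but the wider interval gets a much shorter block because its cases are spread over more units.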
1/31/2024
Reading Shape from Quantitative Graphs
- Data is skewed in a certain direction if it has more extreme values in that direction.
- Data without a skew is called symmetric.
Bar Chart
● A bar graph displays a vertical bar (or horizontal) for each category.
● Bars for each category are separated by gaps, not touching.
● Height of the bar is the frequency of observations in the category.
● If we use percentage, it shows the relative distribution of categorical data so that it is
easier to compare the different categories
● Not great visual data representation
To draw:
- Create horizontal axis with equally spaced categories
- Plot bars using percentages
- Add a title to top or bottom of the graph
- Categories don’t have to be in any particular order
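The percentages used for the bars can be computed from raw category labels. This is a minimal sketch with invented survey responses; the text rendering is only a rough stand-in for a drawn chart.

```python
from collections import Counter

# Hypothetical responses to "How do you get to class?"
responses = ["bus", "car", "car", "bike", "car", "bus", "walk", "car"]

counts = Counter(responses)
n = len(responses)
percents = {category: 100 * c / n for category, c in counts.items()}

# Rough text bar chart: one '#' per 5 percentage points (rounded)
for category, pct in sorted(percents.items()):
    bar = "#" * round(pct / 5)
    print(f"{category:<5} {bar} {pct:.1f}%")
```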
Pie Chart
● Not great visual data representation
● Can be nearly impossible to tell which of two similar-sized categories is larger
Mean: Balancing point of the data set
- Histogram balances when supported at the average
- Small area far away from the average can balance a large area close to the average,
because areas are weighted by their distance from the balance point
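The balancing idea can be checked numerically: the total distance of the values below the mean equals the total distance of the values above it. This is a sketch with made-up data in which one far-away value balances four nearby ones.

```python
# Hypothetical data: four values close together and one far to the right
data = [1, 2, 2, 3, 12]

mean = sum(data) / len(data)

# Total distance from the mean on each side of the balance point
left_pull = sum(mean - x for x in data if x < mean)
right_pull = sum(x - mean for x in data if x > mean)

print("mean:", mean)
print("total distance below the mean:", left_pull)
print("total distance above the mean:", right_pull)
```

The single value 12 sits 8 units above the mean, which exactly offsets the combined distances (3 + 2 + 2 + 1) of the four values below it.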