Research methodology and descriptive statistics
Summary test 2
Contents
Unit 13: Visualizing and analyzing bivariate relationships in R
Unit 24: Describing the association between two variables
Unit 12: Causality and bivariate causal hypotheses
Unit 15: Research designs for testing causal hypotheses
Unit 14: Causality and the effect of a third variable
Unit 16, 17, 18: Analyzing multivariate relationships
Unit 19: Sampling
Unit 23: Normal distribution
Unit 20: First steps towards inference: certainty about means
Unit 22: Research ethics
Unit 13: Visualizing and analyzing bivariate relationships in R
Bivariate analysis: statistical method that examines how two different variables are related
Contingency table: displays the relationship between 2 nominal or ordinal variables
Similar to a frequency table, but a frequency table always concerns only 1 variable
Columns: independent variable
Rows: dependent variable
Cells: column percentages
Column percentage = (cell count / column total) * 100
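A minimal base-R sketch of the layout described above, using a small made-up data set (the data frame `survey` and the variables `education` and `voted` are hypothetical). It contrasts a frequency table (1 variable) with a contingency table, putting the independent variable in the columns and using `prop.table(..., margin = 2)` to turn cell counts into column percentages.

```r
# Hypothetical example data: one row per respondent
survey <- data.frame(
  education = c("low", "low", "low", "high", "high", "high", "high", "low"),
  voted     = c("no",  "yes", "no",  "yes",  "yes",  "no",   "yes",  "no")
)

# Frequency table: concerns only 1 variable
freq_table <- table(survey$education)
freq_table

# Contingency table: dependent variable (voted) in the rows,
# independent variable (education) in the columns
cont_table <- table(voted = survey$voted, education = survey$education)
cont_table

# Column percentages: each cell divided by its column total, times 100
col_pct <- prop.table(cont_table, margin = 2) * 100
round(col_pct, 1)
```

Every column of `col_pct` adds up to 100, so the categories of the independent variable can be compared directly even when the groups differ in size.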
Scatterplot: for 2 quantitative variables; gives more precise information than a contingency table
x-axis: independent variable
y-axis: dependent variable
Regression line: the line that best fits the data, such that the overall distance from the line to the points plotted on the graph is as small as possible (with least squares: the sum of the squared vertical distances is minimized)
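A minimal base-R sketch of the least-squares idea, with made-up data (the vectors `hours` and `grade` and the helper `ssr()` are hypothetical). `lm()` fits the regression line, and `ssr()` computes the sum of squared vertical distances for any line, so the fitted line can be compared with a slightly different one.

```r
# Hypothetical data: hours studied (x, independent) and exam grade (y, dependent)
hours <- c(1, 2, 3, 4, 5)
grade <- c(5.0, 5.5, 6.5, 7.0, 8.0)

# lm() fits the least-squares regression line grade = a + b * hours
fit <- lm(grade ~ hours)
coef(fit)   # intercept a = 4.15, slope b = 0.75

# Hypothetical helper: sum of squared vertical distances from the points to the line a + b * x
ssr <- function(a, b) sum((grade - (a + b * hours))^2)

# The fitted line gives a smaller sum than another line, e.g. a slightly steeper one
ssr(coef(fit)[1], coef(fit)[2])         # 0.075
ssr(coef(fit)[1], coef(fit)[2] + 0.1)
```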
Strength (of a bivariate relationship): indicates how closely two variables are associated with each other. The stronger the relationship, the better one variable can predict or explain changes in the other.
Direction (of a bivariate relationship): whether the relationship between two variables is positive or negative. Helps to describe the nature of the relationship between 2 variables.
- Positive direction: as one variable increases, the other also increases
- Negative direction: as one variable increases, the other decreases
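Strength and direction are often summarized with Pearson's correlation coefficient r (added here as an illustration, not taken from these notes): the sign of r gives the direction, and the size of r (between -1 and 1) gives the strength. A minimal base-R sketch with made-up variables (`noise`, `y_pos_strong`, `y_neg_strong` and `y_weak` are hypothetical):

```r
x <- 1:10

# Hypothetical small deviations so the points are not exactly on a line
noise <- c(0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.4, -0.4, 0.1, -0.1)

y_pos_strong <-  2 * x + noise                   # strong, positive
y_neg_strong <- -2 * x + noise                   # strong, negative
y_weak       <- c(3, 9, 1, 7, 4, 10, 2, 8, 5, 6) # weak

# Sign of r = direction, closeness to -1 or 1 = strength
round(cor(x, y_pos_strong), 2)
round(cor(x, y_neg_strong), 2)
round(cor(x, y_weak), 2)   # about 0.15
```

Note that r only measures the strength of a linear relationship.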
Linear relationship: describes a straight-line relationship between 2 variables.
- If the x-value changes by 1 unit → the y-value always changes by the same (constant) amount
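A minimal base-R sketch of what "linear" means, with made-up formulas (`y_linear` and `y_curved` are hypothetical): `diff()` shows how much y changes for each one-unit step in x.

```r
# Hypothetical linear relationship: y = 10 + 3 * x
x <- 0:5
y_linear <- 10 + 3 * x

# For every one-unit increase in x, y changes by the same amount (the slope, 3)
diff(y_linear)   # 3 3 3 3 3

# Contrast: a curved (non-linear) relationship, y = x^2
y_curved <- x^2
diff(y_curved)   # 1 3 5 7 9: the change is not constant
```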
Plots in RStudio
# load packages (ggplot2 for plotting, dplyr for the %>% pipe, ggthemes for extra themes)
library(ggplot2)
library(dplyr)
library(ggthemes)
# import dataset
Avocado_data <- read.csv("avocado_data.csv")
# scatterplot of average price over time, coloured by region; the geom_ function sets the type of plot
Avocado_data %>%
  ggplot(aes(x = Date, y = AveragePrice, color = region)) +
  geom_point()
# add a title and axis names with labs()
Avocado_data %>%
  ggplot(aes(x = Date, y = AveragePrice, color = region)) +
  geom_point() +
  labs(title = "Average avocado prices in the US over time",
       x = "Date",
       y = "Average price",
       color = "Region")
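To draw the regression line described earlier inside a scatterplot, ggplot2's `geom_smooth(method = "lm")` can be added as an extra layer. A minimal sketch; the data frame `study_data` and its values are hypothetical, so it runs without the avocado file.

```r
library(ggplot2)
library(dplyr)

# Hypothetical example data (not the avocado data set)
study_data <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  grade = c(5.0, 5.5, 6.5, 7.0, 8.0, 8.2)
)

# Scatterplot with the least-squares regression line added
p <- study_data %>%
  ggplot(aes(x = hours, y = grade)) +
  geom_point() +
  # method = "lm": straight regression line; se = FALSE: no confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Exam grade by hours studied",
       x = "Hours studied",
       y = "Exam grade")

print(p)
```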