WEEK 1
LECTURE 1: INTRODUCTION, DATA EXPLORATION & VISUALIZATION
Post-stratification weights make the sample resemble the population more closely.
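A minimal sketch of how post-stratification weights can be computed. It assumes the common definition "weight = population share / sample share" for each group. The group names, counts, means and the helpers `post_stratification_weights` and `weighted_mean` are hypothetical.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (count / n) for g, count in sample_counts.items()}

def weighted_mean(counts, means, w=None):
    """Mean over all respondents; each respondent in group g gets weight w[g] (default 1)."""
    w = w or {g: 1.0 for g in counts}
    total_weight = sum(counts[g] * w[g] for g in counts)
    return sum(counts[g] * w[g] * means[g] for g in counts) / total_weight

# Hypothetical survey: women are over-represented relative to a 50/50 population.
sample_counts = {"women": 70, "men": 30}
population_shares = {"women": 0.5, "men": 0.5}
group_means = {"women": 4.0, "men": 3.0}  # hypothetical mean satisfaction per group

weights = post_stratification_weights(sample_counts, population_shares)
unweighted = weighted_mean(sample_counts, group_means)
weighted = weighted_mean(sample_counts, group_means, weights)
print(f"unweighted {unweighted:.2f}, weighted {weighted:.2f}")  # unweighted 3.70, weighted 3.50
```

Under-represented groups (here men) get a weight above 1, so the weighted estimate moves toward what a representative sample would give.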
Non-metric scales:
- Nominal (categorical)
- Ordinal
These outcomes can be categorical (labels) or directional: they measure only the direction of the response (e.g. yes/no).
Metric (continuous) scales:
- Interval
- Ratio
In contrast, continuous scales measure not only direction or classification but also intensity (e.g. strongly agree vs. somewhat agree).
Nominal: numbers serve as labels or tags for identifying/classifying objects in mutually exclusive and collectively exhaustive categories.
Ordinal: numbers are assigned to objects to indicate the relative positions of some characteristic of the objects, but NOT the magnitude of the difference between them.
Interval: numbers are assigned to objects to indicate the relative positions of some characteristic of the objects, with differences between objects being comparable; the zero point is arbitrary.
Ratio: the most precise scale, with an absolute zero point; it has all the properties of the other scales.
Calculating the mean of nominal/ordinal data doesn’t make sense, because the values lack numerical meaning (nominal) or consistent intervals (ordinal).
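To make the point about means concrete, here is a small sketch that picks a central-tendency measure by scale type: mode for nominal, median for ordinal, mean for interval/ratio. The data and the helper `central_tendency` are hypothetical; `median_low` is used so the ordinal result is always an observed category.

```python
from statistics import fmean, median_low, mode

# Hypothetical survey data coded as numbers.
region = [1, 1, 2, 3, 3, 3, 2]          # nominal: 1 = north, 2 = south, 3 = west
satisfaction = [1, 2, 2, 3, 3, 3, 2]    # ordinal: 1 = low, 2 = medium, 3 = high
income = [20.0, 35.5, 41.0, 28.0, 60.0, 52.5, 33.0]  # ratio: thousands, true zero

def central_tendency(values, scale):
    """Return a central-tendency summary that is meaningful for the given scale type."""
    if scale == "nominal":
        return mode(values)         # most frequent category; codes are only labels
    if scale == "ordinal":
        return median_low(values)   # middle rank, always an observed category
    if scale in ("interval", "ratio"):
        return fmean(values)        # the mean needs equal intervals
    raise ValueError(f"unknown scale: {scale}")

print(central_tendency(region, "nominal"))        # 3
print(central_tendency(satisfaction, "ordinal"))  # 2
print(round(fmean(region), 2))                    # 2.14, a number with no meaning as a region
```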
Many questions (items) -> reduce them, e.g. by combining related items into one score.
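Summated scales: combine several items that measure the same concept into one score.
Ex. items rated from completely disagree to completely agree.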
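A minimal sketch of a summated scale for one respondent, assuming all items are worded in the same direction (negatively worded items would first need reverse-coding, not shown). The item names, responses and the helper `summated_scale` are hypothetical.

```python
from statistics import fmean

# Hypothetical answers of one respondent to four items measuring the same concept,
# each on a 1-5 scale (1 = completely disagree, 5 = completely agree).
items = {"q1": 4, "q2": 5, "q3": 4, "q4": 3}

def summated_scale(responses, average=True):
    """Combine several item scores into one composite score.

    Summing and averaging rank respondents identically (when everyone answers
    every item); averaging keeps the score on the original 1-5 response scale.
    """
    scores = list(responses.values())
    return fmean(scores) if average else sum(scores)

print(summated_scale(items))                 # 4.0
print(summated_scale(items, average=False))  # 16
```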
What we observe is nature exposed to our method of questioning, not nature itself.
Validity: does the measure capture what it is supposed to measure? Check whether the results are plausible.
Reliability: are the results stable? Check them with control variables, for sensitivity to outliers, and by estimating the model on a new dataset.
Hypothesis testing:
- Fail to reject the null
- Reject the null
Types of errors
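- Type I error: false positive, rejecting a true null (e.g. telling a man he is pregnant)
- Type II error: false negative, failing to reject a false null (e.g. telling a pregnant woman she is not pregnant)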
p-value = probability of observed data or statistic (or more extreme) given that the null
hypothesis is true.
Typically the threshold α (alpha) is set at 0.05: reject the null if the p-value is < α.
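A small simulation can show what α means: if the null hypothesis is true and we reject whenever p < α, we make a Type I error in roughly α of the samples. This sketch assumes a two-sided z-test with known σ = 1 (a simplification chosen so only the standard library is needed); the helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: population mean == mu0, assuming known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, n=30, reps=4000, seed=1):
    """Share of samples drawn under a TRUE null that are (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Data generated with true mean 0, so every rejection is a Type I error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / reps

rate = type_i_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")  # close to alpha = 0.05
```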
CHAPTER 1: OVERVIEW OF MULTIVARIATE METHODS
Multivariate analysis = all statistical techniques that simultaneously analyze multiple measurements on individuals or objects to examine their relationships; it builds knowledge and improves decision-making.
Dependent variable (DV) = …independent variable(s)….
Non-metric = qualitative.
Metric = quantitative.
Measurement error: degree to which the observed values are not representative of the
“true” values (all variables have some).
Selection of technique depends on:
- Can the variables be divided into (in)dependent classifications based on some
theory?
- If yes, how many variables are treated as dependent in a single analysis?
- How are the variables measured?
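The three questions above work like a decision tree. The sketch below encodes only part of it; the mapping follows the usual textbook classification of multivariate techniques, but the function `suggest_technique` and its parameters are hypothetical and deliberately incomplete.

```python
def suggest_technique(dependence, n_dvs=0, dv_metric=True, ivs_metric=True):
    """Map the three selection questions to a family of techniques (partial sketch).

    dependence -- can the variables be split into dependent and independent ones?
    n_dvs      -- how many variables are treated as dependent in one analysis?
    dv_metric / ivs_metric -- are the DV(s) and the IVs metric?
    """
    if not dependence:
        return "interdependence technique (e.g. factor or cluster analysis)"
    if n_dvs == 1:
        if not dv_metric:
            return "discriminant analysis or logistic regression"
        return "multiple regression" if ivs_metric else "ANOVA/ANCOVA"
    if n_dvs > 1 and dv_metric and not ivs_metric:
        return "MANOVA"
    return "other dependence technique (not covered in this sketch)"

# One metric DV, non-metric factors: the setting of Lecture 2.
print(suggest_technique(True, 1, dv_metric=True, ivs_metric=False))  # ANOVA/ANCOVA
```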
Philosophy of multivariate analysis:
Establish practical and statistical significance.
- Know your data.
- Strive for model parsimony.
- Look at errors.
- Validate results.
Step-by-step multivariate model building:
1. Define research problem, objectives and multivariate technique to be used.
2. Develop analysis plan – designing.
3. Evaluate assumptions.
4. Estimate multivariate model and evaluate fit.
5. Interpret variates.
6. Validate multivariate model.
CHAPTER 2: EXAMINING YOUR DATA
Graphics reveal the character of the data.
Histogram: shape of variable’s distribution.
Scatterplot: relationship between two variables (bivariate).
Multivariate displays: e.g. Fourier transformation and Chernoff faces.
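To show what a histogram does, here is a text-only sketch that sorts values into equal-width bins and prints one bar per bin. The data and the helper `text_histogram` are hypothetical; real work would use a plotting library.

```python
def text_histogram(values, n_bins=5):
    """Print a simple text histogram using equal-width bins and return the counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land just past the last bin, so clamp it in.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    for i, c in enumerate(counts):
        left = lo + i * width
        print(f"{left:6.1f} | {'#' * c}")
    return counts

# Hypothetical right-skewed sample: most values are small, a few are large.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 15]
counts = text_histogram(data, n_bins=4)  # counts == [8, 2, 1, 1]
```

The long tail to the right is visible at a glance, which is exactly the "shape of the distribution" a histogram is meant to reveal.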
Identify missing data and remedy:
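- Determine the type of missing data and whether it can be ignored.
- Determine the extent of missing data and whether cases or variables should be deleted.
- Diagnose the randomness of the missing data process.
- Select the imputation method (use only valid data, or use calculated replacement values).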
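MCAR (missing completely at random) -> missing data are randomly distributed and can be remedied without incurring bias.
MAR (missing at random) -> missingness depends on other, observed variables; this underlying process can bias the results, so the remedy must account for it to avoid incurring bias.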
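A simulation sketch of the MCAR/MAR difference, assuming a hypothetical data-generating process where y depends on an always-observed x. Simply dropping incomplete cases leaves the mean of y roughly intact under MCAR but biases it under MAR, because the cases with high x (and hence high y) are missing more often.

```python
import random
from statistics import fmean

rng = random.Random(42)
n = 20_000
# Hypothetical data: x is always observed, y depends on x.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
full_mean = fmean(y)

# MCAR: every y value has the same 30% chance of being missing.
y_mcar = [yi if rng.random() >= 0.3 else None for yi in y]
# MAR: y is missing more often when the observed x is high (60% vs 10%).
y_mar = [yi if rng.random() >= (0.6 if xi > 0 else 0.1) else None for xi, yi in zip(x, y)]

def complete_case_mean(values):
    """Mean of the observed values only, i.e. simply dropping the missing cases."""
    return fmean([v for v in values if v is not None])

print(f"full data : {full_mean:+.3f}")
print(f"MCAR, drop: {complete_case_mean(y_mcar):+.3f}")  # close to the full-data mean
print(f"MAR, drop : {complete_case_mean(y_mar):+.3f}")   # pulled down: high-x cases are missing
```

Because the missingness here depends only on x, which is observed, a remedy that uses x can correct the bias; dropping cases cannot.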
Outliers = observations with a unique combination of characteristics indicating they are
distinctly different from other observations.
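An outlier can be unusual only in its combination of values. One common way to measure this is the Mahalanobis distance from the centroid. This two-variable sketch uses only the standard library and inverts the 2x2 covariance matrix by hand; the data and the helper `mahalanobis_sq` are hypothetical. The last point (2, 5.0) lies inside the range of both variables, yet it breaks the strong positive relationship and gets by far the largest distance.

```python
from statistics import fmean

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid,
    using the sample covariance matrix of all points."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of a 2x2 matrix: (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

# Hypothetical, strongly correlated data plus one unusual combination at the end.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 5.8), (2, 5.0)]
d2 = mahalanobis_sq(points)
for p, d in zip(points, d2):
    print(f"{p}: D2 = {d:.2f}")
```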
Assumptions: normality (bell-shaped distribution), homoscedasticity (equal variances), linearity and absence of correlated errors must be met.
Data transformations:
- Correct violations of the statistical assumptions (e.g. non-normality)
- Improve the relationship (correlation) between variables
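As an example of a transformation that corrects non-normality, this sketch applies the natural logarithm to a hypothetical right-skewed variable and compares the moment-based skewness g1 before and after. The data and the helper `skewness` are assumptions for illustration.

```python
import math
from statistics import fmean

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(values)
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]       # hypothetical right-skewed variable
logged = [math.log(v) for v in raw]  # natural logarithm

print(f"skewness raw:    {skewness(raw):.2f}")     # strongly positive
print(f"skewness logged: {skewness(logged):.2f}")  # about 0: symmetric after the transform
```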
Dummy variable = dichotomous variable that has been converted to a metric distribution and represents one category of a non-metric independent variable.
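A minimal sketch of dummy coding. A non-metric variable with k categories needs only k-1 dummies; the omitted reference category is the case where all dummies are 0, and adding a k-th dummy would make the dummies perfectly collinear. The variable and the helper `dummy_code` are hypothetical.

```python
def dummy_code(values, reference):
    """Create k-1 dummy (0/1) variables for a non-metric variable with k categories.

    The reference category gets no dummy: it is the case where all dummies are 0.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found")
    return {
        f"is_{c}": [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }

region = ["north", "south", "west", "south", "north"]  # hypothetical nominal variable
dummies = dummy_code(region, reference="north")
print(dummies)  # {'is_south': [0, 1, 0, 1, 0], 'is_west': [0, 0, 1, 0, 0]}
```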
WEEK 2
LECTURE 2: ANOVA
Step-by-step:
1. Differences in the mean of a metric dependent variable across the levels of one or more non-metric independent variables (“factors”).
2. Interactions: the effect of one variable on the DV depends on the level of another variable (a two-way interaction).
Control variables (“covariates”) affect the DV separately from the treatment variables.
Effective covariates reduce within-group variance and thereby improve the statistical
power of the tests. If left unaccounted for, they may bias estimates of treatment effects;
including them turns the analysis into ANCOVA.
3. Independence of observations (the “rows” of the data) affects estimates and standard errors: violations lead to underestimation (or overestimation) of the variability within groups.
“Between-subjects”: each unit of analysis ‘sees’ only one combination of IV levels.
“Within-subjects”: each unit of analysis sees ALL possible treatments and is compared with itself, which controls for the “luck of the draw” and gives higher statistical power.
Equality of variance (homoscedasticity) affects standard errors.
Levene’s test: H0: variances are equal (homoscedasticity)
H1: at least one group has a different variance.
The test statistic follows an F-distribution; if the p-value (Pr(>F)) ≥ α, fail to reject the null hypothesis.
If homoscedasticity is violated:
- If sample sizes are similar across treatment groups, ANOVA is robust (robust = results remain reliable).
- Transform the dependent variable (e.g. natural logarithm), especially with small samples.
- Add a covariate (ANCOVA).
Normality (of the residuals) affects the standard errors only if the sample is small.
It is the residuals that need to be normally distributed!
Tests (H0: normal distribution):
- Kolmogorov-Smirnov test
- Shapiro-Wilk test
If p < α, reject H0 (residuals are not normally distributed); if p ≥ α, fail to reject H0 (no evidence against normality).
4. 1-way ANOVA
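A sketch of a one-way ANOVA with the assumption checks from step 3, using SciPy as a swapped-in tool (the notes' `Pr(>F)` output suggests R was used in the lecture). SciPy's `levene` centers on the median by default (the Brown-Forsythe variant). The group data, α and the helper `one_way_f` are hypothetical; `one_way_f` recomputes F by hand to show where the statistic comes from.

```python
from statistics import fmean
from scipy import stats

# Hypothetical scores of a metric DV under three levels of one factor
groups = {
    "A": [4, 5, 6, 5, 4],
    "B": [6, 7, 8, 7, 6],
    "C": [5, 6, 5, 6, 5],
}
alpha = 0.05
samples = list(groups.values())

# Assumption check 1: equal variances. H0: all group variances are equal.
lev_stat, lev_p = stats.levene(*samples)
print(f"Levene p = {lev_p:.3f}", "-> fail to reject H0" if lev_p >= alpha else "-> reject H0")

# Assumption check 2: normality of the residuals (observation minus its group mean).
residuals = [y - fmean(s) for s in samples for y in s]
sw_stat, sw_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p = {sw_p:.3f}",
      "-> no evidence against normality" if sw_p >= alpha else "-> residuals not normal")

def one_way_f(samples):
    """F = (SS_between / df_between) / (SS_within / df_within)."""
    all_y = [y for s in samples for y in s]
    grand = fmean(all_y)
    ss_between = sum(len(s) * (fmean(s) - grand) ** 2 for s in samples)
    ss_within = sum((y - fmean(s)) ** 2 for s in samples for y in s)
    df_between = len(samples) - 1
    df_within = len(all_y) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat, f_p = stats.f_oneway(*samples)
print(f"F = {one_way_f(samples):.3f} (SciPy: {f_stat:.3f}), p = {f_p:.4f}")
```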