Week 1
Random variable: X = variable that takes on different values (xi), each with a given probability
(Pr(X=xi))
- 1) discrete: finite or countably infinite number of outcomes
- 2) continuous: any numerical value within an interval (measuring process)
Population: set of all possible outcomes of X
Probability density function (pdf): function giving the probabilities of the different outcomes,
denoted: f(xi) = Pr(X = xi) (for a discrete X also called the probability mass function)
- no probability is negative: 0 ≤ f(xi) ≤ 1
- the sum of all probabilities is equal to 1
Expected value: probability-weighted average of all outcomes, i.e. the population mean: E(X) = µx = Σ xi Pr(X=xi)
Expected value calculation rules:
1) Constant: E(c) = c
2) Constant added to X: E(X+c) = E(X) + c = µx + c
3) Constant multiplied: E(cX) = cE(X) = cµx
4) Sum: E(X1+X2) = E(X1) + E(X2) = µx1 + µx2
Variance (σ²) = E[(X − EX)²] = Σ(xi − µx)² Pr(X=xi) = E(X²) − µx²
Standard deviation (σ) = √Var(X)
Variance calculation rules:
1. Var(c) = 0
2. Var(X+c) = Var(X)
3. Var(cX) = c²Var(X)
4. Independent variables: Var(X1+X2) = Var(X1) + Var(X2)
5. Dependent variables: Var(X1+X2) = Var(X1) + Var(X2) + 2Cov(X1,X2)
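The expected value and variance rules above can be checked numerically. A minimal sketch, assuming a fair six-sided die as a hypothetical example distribution (the helper names expected_value and variance are illustrative and not part of these notes):

```python
from fractions import Fraction

# Hypothetical discrete distribution: a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

def expected_value(xs, ps):
    """E(X) = sum of xi * Pr(X = xi)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Var(X) = E(X^2) - (E(X))^2."""
    mean = expected_value(xs, ps)
    return expected_value([x * x for x in xs], ps) - mean ** 2

mu = expected_value(outcomes, probs)   # 7/2
var = variance(outcomes, probs)        # 35/12

# Apply a constant c to every outcome; the probabilities stay the same.
c = 3
shifted = [x + c for x in outcomes]    # X + c
scaled = [c * x for x in outcomes]     # cX

print(expected_value(shifted, probs) == mu + c)   # rule E(X+c) = E(X) + c
print(variance(shifted, probs) == var)            # rule Var(X+c) = Var(X)
print(variance(scaled, probs) == c ** 2 * var)    # rule Var(cX) = c^2 Var(X)
```

Exact fractions are used so that the rules hold with equality rather than up to rounding.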
Joint distribution: probability that variables take on certain values simultaneously.
Pr(X=xi, G=gj)
- no probability is negative
- the sum of all probabilities is equal to 1
Marginal distribution: probability that one variable takes on certain values, regardless of the other variable (same as its own pdf): Pr(X=xi) = Σj Pr(X=xi, G=gj)
Conditional distribution: probability that a variable takes on certain values, conditional on a
specific value of another random variable: Pr(X=xi|G=gj) = Pr(X=xi, G=gj) / Pr(G=gj)
Independence: - joint distribution is equal to the product of the marginal distributions
- conditional distribution is equal to the marginal distribution
- covariance and correlation between the variables are 0 (implied by independence; zero covariance alone does not imply independence)
Conditional expectations calculation rules:
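The relation between joint, marginal and conditional distributions, and the independence check, can be sketched as follows. The joint distribution below is a hypothetical example, and the function names are illustrative only:

```python
from fractions import Fraction

# Hypothetical joint distribution Pr(X=x, G=g), stored as {(x, g): probability}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def marginal(joint, index):
    """Marginal distribution: sum the joint distribution over the other variable."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0) + p
    return out

def conditional_x_given_g(joint, g):
    """Pr(X=x | G=g) = Pr(X=x, G=g) / Pr(G=g)."""
    p_g = marginal(joint, 1)[g]
    return {x: p / p_g for (x, gg), p in joint.items() if gg == g}

def is_independent(joint):
    """True if every joint probability equals the product of the marginals."""
    px, pg = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, g), 0) == px[x] * pg[g] for x in px for g in pg)

px = marginal(joint, 0)              # marginal distribution of X
pg = marginal(joint, 1)              # marginal distribution of G
independent = is_independent(joint)  # True for this example

# A hypothetical dependent case: X and G always take the same value.
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
```

In the independent example the conditional distribution of X given G=1 coincides with the marginal distribution of X, as the second bullet states.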
1. Constant: E(cX |G =gi) = cE(X|G=gi)
2. Function: E(h(G)X|G=gi) = h(gi)E(X|G=gi)
Covariance: measure of linear association between two variables.
Cov(X,G) = σxg = E[(X − EX)(G − EG)] = E(XG) – E(X)E(G)
Covariance rules of calculation:
1. Constant: Cov (X,c) = 0
2. Cov(aX,bG) = abCov(X,G)
3. Cov(X,X) = E[(X − EX)²] = Var(X)
Correlation: scale-free measure of linear association between two variables, always between -1 and 1
Corr(X,G) = Cov(X,G)/(sdX*sdG)
Sample:
- mean: mx = Σxi / n
- variance: s² = Σ(xi – mx)²/(n-1)
- covariance: sxg = Σ(xi – mx)(gi – mg)/(n-1)
- correlation: rxg = sxg / (sdx*sdg)
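The sample formulas can be written out directly. A minimal sketch in plain Python, with hypothetical data and illustrative function names:

```python
import math

# Hypothetical sample of n = 5 paired observations.
x = [1, 2, 3, 4, 5]
g = [2, 4, 5, 4, 5]

def mean(v):
    return sum(v) / len(v)

def sample_cov(a, b):
    """Sum of cross-deviations from the means, divided by n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def sample_var(v):
    """The sample variance is the sample covariance of a variable with itself."""
    return sample_cov(v, v)

def sample_corr(a, b):
    """Sample covariance divided by the product of the sample standard deviations."""
    return sample_cov(a, b) / (math.sqrt(sample_var(a)) * math.sqrt(sample_var(b)))

s2_x = sample_var(x)      # 2.5
s_xg = sample_cov(x, g)   # 1.5
r_xg = sample_corr(x, g)  # about 0.775
```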
Week 2
OLS: Ordinary Least Squares
minimize the sum of squared deviations from the regression line to find βi
take derivatives with respect to each βi and set them equal to 0
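For a regression with one independent variable, setting the two derivatives to zero gives the closed-form solution b1 = Σ(xi – mx)(yi – my) / Σ(xi – mx)² and b0 = my – b1·mx. A minimal sketch with hypothetical data (ols_simple is an illustrative name, not a library function):

```python
# Hypothetical data: y regressed on a single x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def ols_simple(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept
    return b0, b1

b0, b1 = ols_simple(x, y)   # approximately 2.2 and 0.6
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# First-order conditions: both sums are zero up to rounding error.
foc_intercept = sum(residuals)
foc_slope = sum(xi * ei for xi, ei in zip(x, residuals))
```

The two first-order conditions say that the residuals sum to zero and are uncorrelated with x in the sample.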
R-squared: measure of goodness of fit of the regression (R²)
= fraction of total variation in the dependent variable explained by variation in the independent
variable(s) = 1 – RSS/TSS = ESS/TSS (RSS = residual, ESS = explained, TSS = total sum of squares)
Root mean squared error: measure of the average size of the residual
√MSE = √(Σei²/(n-k-1))
n – k – 1: number of degrees of freedom (n observations, k independent variables)
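Both fit measures follow from the residuals. A minimal sketch, assuming hypothetical observed values and fitted values from an OLS regression with a constant and k = 1 independent variable (fit_statistics is an illustrative name):

```python
import math

def fit_statistics(y, y_hat, k):
    """Return (R^2 as 1 - RSS/TSS, R^2 as ESS/TSS, root mean squared error)."""
    n = len(y)
    my = sum(y) / n
    tss = sum((yi - my) ** 2 for yi in y)                  # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ess = sum((fi - my) ** 2 for fi in y_hat)              # explained sum of squares
    rmse = math.sqrt(rss / (n - k - 1))                    # n - k - 1 degrees of freedom
    return 1 - rss / tss, ess / tss, rmse

# Hypothetical data; the fitted values come from y_hat = 2.2 + 0.6*x with x = 1..5.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, r2_alt, rmse = fit_statistics(y, y_hat, k=1)  # about 0.6, 0.6 and 0.894
```

The two expressions for R² agree here because the fitted values come from OLS with a constant; for arbitrary fitted values they need not.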
Inference: using estimates from a random sample to say something about relationships in
the unobserved population.
Assumptions of unbiasedness of OLS:
1) The population model is linear in parameters and the error term is additive
Yi = β0 + β1X1i + … + βkXki + εi
2) The error term has a zero population mean: E(εi) = 0
the constant term absorbs any non-zero mean of the error term
3) The independent variables are uncorrelated with the error term: E(εi|Xi…) = 0
omitted variable bias undermines this
4) There is no perfect (multi)collinearity between independent variables and no variable is a
constant
no independent variable is a multiple of, or the sum/difference of, other variables (no exact linear combination)
1-4 hold: unbiased estimator of βi
5) There is no serial correlation; errors are not correlated with each other: Corr(εi,εj) = 0
6) There is no heteroskedasticity; the error term has constant variance
5-6 hold (in addition to 1-4): unbiased estimator of the variance of the OLS estimator (and t-test/significance testing is possible)
Unbiasedness: an OLS estimator is unbiased if its expected value equals the population
parameter: E(β^i) = βi
the estimate from a single sample is not necessarily equal to the parameter (but the mean of the estimates over repeated samples is)
Consistency: the estimator converges to the population parameter as the size of the sample
increases
Var(β^j) = σ² / ((1 − Rj²) · TSSxj)
The larger the variance, the more ‘sampling uncertainty’
- smaller σ², less error, leads to lower variance (σ² is estimated by Σei²/(n-k-1))
- more TSSxj – variation in the variable, leads to lower variance
- smaller Rj² (auxiliary regression of Xj on the other independent variables) – less variation in variable shared with others
(multicollinearity) – leads to lower variance
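The effect of each of the three components can be seen by plugging numbers into the formula. A minimal sketch with hypothetical values for σ², Rj² and TSSxj (var_beta_j is an illustrative name):

```python
def var_beta_j(sigma2, r2_j, tss_j):
    """Var(beta_j) = sigma^2 / ((1 - Rj^2) * TSS_xj)."""
    if not 0 <= r2_j < 1:
        # Rj^2 = 1 would mean perfect multicollinearity: the variance is undefined.
        raise ValueError("Rj^2 must lie in [0, 1)")
    return sigma2 / ((1 - r2_j) * tss_j)

# Hypothetical values, chosen only to show the direction of each effect.
baseline = var_beta_j(sigma2=4.0, r2_j=0.0, tss_j=100.0)        # 0.04
less_error = var_beta_j(sigma2=1.0, r2_j=0.0, tss_j=100.0)      # 0.01
more_variation = var_beta_j(sigma2=4.0, r2_j=0.0, tss_j=400.0)  # 0.01
collinear = var_beta_j(sigma2=4.0, r2_j=0.9, tss_j=100.0)       # about 0.4
```

With Rj² = 0.9 the variance is roughly ten times the baseline, since 1/(1 − 0.9) = 10.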
Week 3
Assumption 7: normality of error term
- important for small samples (in large samples the Central Limit Theorem makes the OLS estimators approximately normal)
Using t-statistic:
1) Formulate H0 and HA
2) Choose significance level
3) Calculate t-statistic t = (b^ - b) / se(b^), with b the value of the parameter under H0
4) Find critical value from t-distribution
- n-k-1 degrees of freedom
- take a for one-sided HA, a/2 for two-sided HA
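P-value: probability, assuming H0 is true, of obtaining a t-statistic at least as extreme as the one observed; the smallest significance level at which H0 can be rejected. (The significance level a is the probability of a Type I error.)
STATA: p-values are two-sided (divide by 2 for a one-sided hypothesis, provided the estimate has the sign stated in HA)
Confidence interval (of 1 – a): CI = b^ +/- t * se(b^), with t the two-sided critical value
- reject H0 if the hypothesised value b doesn’t lie within the CI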
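The t-test steps and the confidence interval can be combined in one small function. A minimal sketch for a two-sided test, assuming the critical value is looked up in a t-table and passed in by hand; the estimate and standard error are hypothetical numbers and t_test is an illustrative name:

```python
def t_test(b_hat, se, b_null, t_crit):
    """Two-sided t-test and the matching confidence interval.

    t_crit is the two-sided critical value taken from a t-table
    for n - k - 1 degrees of freedom.
    """
    t_stat = (b_hat - b_null) / se
    ci = (b_hat - t_crit * se, b_hat + t_crit * se)
    reject = abs(t_stat) > t_crit
    return t_stat, ci, reject

# Hypothetical estimate: b^ = 0.6 with se(b^) = 0.2828, testing H0: b = 0.
# With 3 degrees of freedom and a = 0.05 (two-sided), the t-table gives 3.182.
t_stat, ci, reject = t_test(b_hat=0.6, se=0.2828, b_null=0.0, t_crit=3.182)

print(round(t_stat, 2))  # t-statistic, about 2.12
print(reject)            # False: |t| is below the critical value
print(ci[0] < 0.0 < ci[1])  # True: the hypothesised value 0 lies inside the CI
```

The two decision rules agree: |t| exceeds the critical value exactly when the hypothesised value falls outside the confidence interval.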
Type I error: H0 rejected, but true for population
Type II error: H0 not rejected, but HA true for population