PROJECT 1 - PHAM BAO NGUYEN - 45545061
1
a) How to load cps4_small.dta
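# Install haven once (it reads Stata .dta files), then load it
install.packages("haven")
library(haven)
# Read the Stata file into a data frame called CPS4
CPS4 <- read_dta("cps4_small.dta")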
b) Obtain summary statistics and histograms for the variables wage and educ.
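Wage
R-command:
# Assumed packages: lattice for histogram(), psych for describe()
library(lattice)
library(psych)
# Histogram of hourly wage (lattice plots percentages by default)
histogram(~ wage, data = CPS4,
          main = "Earnings per hour",
          xlab = "Wage per hour ($)",
          ylab = "Percentage")
describe(CPS4$wage)   # n, mean, sd, median, ..., skew, kurtosis, se
summary(CPS4$wage)    # minimum, quartiles, mean, maximum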
The sample mean of wage is $20.62 per hour. The distribution is strongly positively skewed
(skewness = 1.58). Its long tail lies on the right side of the distribution and extends towards
high wage values.
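For reference, the moment-based definitions behind these statistics are shown below, where m_k is the k-th sample central moment. This is one common convention. Packages such as psych apply slightly different small-sample adjustments, so their output may differ in the later decimal places.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
m_k &= \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^k, \\
\text{skewness} &= \frac{m_3}{m_2^{3/2}}, &
\text{kurtosis} &= \frac{m_4}{m_2^{2}}, \\
\text{excess kurtosis} &= \frac{m_4}{m_2^{2}} - 3, &
\text{kurtosis} &\ge \text{skewness}^2 + 1 .
\end{aligned}
```

For a normal distribution, skewness is 0, kurtosis is 3 and excess kurtosis is 0. The last relation (Pearson's inequality) holds for any data set.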
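The standard deviation is 12.83 and the kurtosis is 2.91. This kurtosis value cannot be the raw
(Pearson) kurtosis, because raw kurtosis is always at least skewness² + 1, here 1.58² + 1 ≈ 3.50.
It must therefore be excess kurtosis (raw kurtosis minus 3), which is what describe() in the psych
package reports by default. This gives a raw kurtosis of about 5.91.

A normal distribution has raw kurtosis 3 (excess kurtosis 0), so wage is leptokurtic rather than
platykurtic: it has heavier tails and more extreme values than a normal distribution. Combined
with the large standard deviation and the right skew, this suggests that a few very high wages
could strongly influence any estimated relationship between wage and years of education.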
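The sketch below computes the skewness and kurtosis measures discussed here using base R only. The helpers central_moment(), sample_skewness(), raw_kurtosis() and excess_kurtosis() are hypothetical, written for illustration rather than taken from psych or another package. They use divisor n throughout, so results may differ slightly from describe().

```r
# Hypothetical helper: k-th sample central moment (divisor n)
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

# Moment-based skewness: m3 / m2^(3/2)
sample_skewness <- function(x) {
  central_moment(x, 3) / central_moment(x, 2)^1.5
}

# Raw (Pearson) kurtosis: m4 / m2^2, equal to 3 for a normal distribution
raw_kurtosis <- function(x) {
  central_moment(x, 4) / central_moment(x, 2)^2
}

# Excess kurtosis: raw kurtosis minus 3, equal to 0 for a normal distribution
excess_kurtosis <- function(x) {
  raw_kurtosis(x) - 3
}

# Evenly spaced values: symmetric, with lighter tails than a normal
even <- c(1, 2, 3, 4, 5)
sample_skewness(even)   # 0
raw_kurtosis(even)      # 1.7
excess_kurtosis(even)   # -1.3

# A two-point data set sits exactly on the bound kurtosis = skewness^2 + 1
two_point <- c(0, 0, 0, 1)
sample_skewness(two_point)         # about 1.155
raw_kurtosis(two_point)            # about 2.333
sample_skewness(two_point)^2 + 1   # about 2.333
```

Applied to the loaded data, a call such as excess_kurtosis(CPS4$wage) should give a value close to the describe() output, assuming the column has no missing values.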
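Education
R-command:
# Histogram of years of education; type = "count" so the y-axis shows counts
histogram(~ educ, data = CPS4,
          type = "count",
          main = "Years of Education",
          xlab = "Years of education",
          ylab = "Number of observations")
describe(CPS4$educ)
summary(CPS4$educ)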
The sample mean of education is 13.8 years. The skewness is only -0.07, so the distribution is
close to symmetric. The small negative value means the left tail (fewer years of schooling) is
marginally longer than the right one, but the departure from symmetry is negligible.
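Under normality, the large-sample standard error of a skewness estimate is approximately sqrt(6/n). Comparing the estimate with this standard error gives a rough way to judge whether it is "significant". The sample size is not stated here, so the sketch below assumes n = 1000 purely for illustration; in practice use nrow(CPS4). The helper skew_z() is hypothetical, not a package function.

```r
# Hypothetical helper: approximate z-statistic for skewness under normality,
# using the large-sample standard error sqrt(6 / n)
skew_z <- function(skew, n) {
  skew / sqrt(6 / n)
}

n <- 1000  # ASSUMED sample size for illustration; use nrow(CPS4) in practice

skew_z(1.58, n)    # wage: about 20.4, far beyond +/-1.96
skew_z(-0.07, n)   # educ: about -0.90, well within +/-1.96

# Two-sided check at the 5% level for both reported skewness values
abs(skew_z(c(wage = 1.58, educ = -0.07), n)) > qnorm(0.975)
```

Under this assumption, wage's skewness is clearly distinguishable from zero while education's is not. This is only an approximation and ignores the small-sample adjustment that describe() applies.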
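The standard deviation is 2.71 years, which is modest relative to the mean: the coefficient of
variation is about 0.20, compared with about 0.62 for wage. The reported kurtosis is 2.06. Since
describe() in the psych package reports excess kurtosis (raw kurtosis minus 3), this corresponds
to a raw kurtosis of about 5.06, well above the value of 3 for a normal distribution. Education is
therefore leptokurtic rather than platykurtic. It has heavier tails than a normal distribution,
suggesting some observations lie far from the mean despite the modest standard deviation.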
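Reported kurtosis values can be on the raw scale (normal = 3) or on the excess scale (normal = 0), so it helps to state the scale before classifying a distribution. The sketch below does three things:

- It uses the bound raw kurtosis ≥ skewness² + 1 to check whether each reported value could be raw kurtosis at all.
- It treats both values as excess kurtosis. This is an assumption, based on psych's describe() reporting excess kurtosis by default.
- It classifies each distribution.

The helper classify_kurtosis() is hypothetical.

```r
# Hypothetical helper: label a distribution by its excess kurtosis
# (0 = mesokurtic/normal, > 0 = leptokurtic, < 0 = platykurtic)
classify_kurtosis <- function(excess) {
  ifelse(excess > 0, "leptokurtic",
         ifelse(excess < 0, "platykurtic", "mesokurtic"))
}

# Skewness and kurtosis as reported by describe() for wage and educ
reported <- data.frame(
  variable = c("wage", "educ"),
  skew     = c(1.58, -0.07),
  kurtosis = c(2.91, 2.06)
)

# Smallest raw kurtosis any data set with this skewness can have
reported$min_raw <- reported$skew^2 + 1
# Could the reported value be raw kurtosis? (FALSE for wage, TRUE for educ)
reported$raw_possible <- reported$kurtosis >= reported$min_raw
# ASSUMPTION: treat the reported values as excess kurtosis
reported$raw_kurtosis <- reported$kurtosis + 3
reported$shape <- classify_kurtosis(reported$kurtosis)
reported
```

The bound rules out the raw scale only for wage. For educ, the excess-kurtosis reading rests on both numbers coming from the same describe() call.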