Assignment 2 - 5512 - Data I
Felix Moosbauer 01628856
Hannes Rainer 11705682
Benjamin Haußmann 11802103
Philipp Gasser 11712235
Question 1
The data set cig.dta contains information on cigarette consumption and taxes across US
states and years. For this question use the data for the year 2000. Run a regression using
pack_pc (number of cigarette packs per person in a given year) as the dependent variable
(outcome), and state_tax (state cigarette tax in cents) as the independent variable
(covariate): pack_pc_i = β0 + β1 × state_tax_i + εi
(a) What are the estimates for β0 and β1?
β1 = -0.5950963
β0 = 108.7307
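(b) Interpret the estimated coefficient for β1. Is this estimate statistically different from zero?
A one-cent increase in the state tax is associated with about 0.595 fewer packs per person per
year on average. Yes, the estimate is statistically different from zero at the 5% level, because
the t-value is -5.80 and therefore larger than 1.96 in absolute value.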
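The significance check can be reproduced from the two reported numbers. The following is a minimal sketch, assuming the rounded t-value of -5.80 and the normal-approximation critical value of 1.96 used in the answer; the standard error and the interval are backed out from these figures and are not taken from the Stata output, so they may differ slightly from what Stata reports.

```python
BETA_1 = -0.5950963    # slope estimate from part (a)
T_VALUE = -5.80        # t-value as reported in the answer (rounded)
CRITICAL_5PCT = 1.96   # two-sided 5% critical value, normal approximation

# Reject H0: beta1 = 0 when |t| exceeds the critical value.
reject_null = abs(T_VALUE) > CRITICAL_5PCT

# Since t = estimate / standard error, the standard error can be backed out.
implied_se = BETA_1 / T_VALUE

# Approximate 95% confidence interval for beta1 (an assumption-based sketch).
ci_low = BETA_1 - CRITICAL_5PCT * implied_se
ci_high = BETA_1 + CRITICAL_5PCT * implied_se

print(f"reject H0: {reject_null}, implied SE: {implied_se:.4f}")
print(f"approx. 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

The whole approximate interval lies below zero, which is the same conclusion as comparing the t-value with 1.96.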
(c) Using the estimates for β0 and β1 what is the predicted average cigarette consumption for a
state with a state tax of 15 cents?
Predicted pack_pc = β0 + β1 × state_tax
Predicted pack_pc = 108.7307 - 0.5950963 × 15
Predicted pack_pc (15 cents) = 99.804 packs per person
(d) Using the estimates for β0 and β1 what is the predicted average cigarette consumption for a
state with a state tax of 25 cents?
Predicted pack_pc = 108.7307 - 0.5950963 × 25
Predicted pack_pc (25 cents) = 93.853 packs per person
(e) Generate the difference between the answers in part (c) and (d). Could you have
calculated it faster by just looking at the Stata output?
Yes. The two predictions differ only through the tax, and 25 - 15 = 10 cents, so we can simply
multiply the β1 estimate from the Stata output by 10. This gives a difference of
10 × 0.5950963 = 5.950963 packs per person.
Check: 99.804 - 93.853 = 5.951
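The arithmetic in parts (c) to (e) can be checked with a short script. This is a minimal sketch that only plugs the estimates from part (a) into the fitted line; the function name predict_pack_pc is introduced here for illustration and does not come from the assignment or from Stata.

```python
# Estimates reported in part (a).
BETA_0 = 108.7307      # intercept: predicted packs per person at a tax of 0 cents
BETA_1 = -0.5950963    # slope: change in packs per person per 1-cent tax increase


def predict_pack_pc(state_tax_cents: float) -> float:
    """Fitted value of pack_pc for a given state tax in cents."""
    return BETA_0 + BETA_1 * state_tax_cents


pred_15 = predict_pack_pc(15)
pred_25 = predict_pack_pc(25)

# The intercept cancels in the difference, leaving slope * (change in tax).
diff_from_predictions = pred_15 - pred_25
diff_from_slope = -BETA_1 * (25 - 15)

print(f"15 cents: {pred_15:.3f}, 25 cents: {pred_25:.3f}")
print(f"difference: {diff_from_predictions:.3f} (via slope: {diff_from_slope:.3f})")
```

Both routes give the same difference because the intercept drops out when one prediction is subtracted from the other.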
(f) What is the R² in the model? Interpret.
R² is the coefficient of determination. It measures how well the observed outcomes are
replicated by the model, that is, how close the observations lie to the regression line. A value
of 0.4073 means that about 40.7% of the variation in pack_pc across states is explained by
state_tax, so the fit is moderate and the points are spread fairly widely around the line.
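One way to make the R² more concrete: in a regression with a single covariate, R² equals the squared correlation between the outcome and the covariate. The sketch below uses only the reported R² and the sign of the slope; the variable names are chosen here for illustration.

```python
import math

R_SQUARED = 0.4073     # reported in the Stata output
BETA_1 = -0.5950963    # slope from part (a); only its sign is used here

# With one covariate, |r| = sqrt(R^2) and r takes the sign of the slope.
correlation = math.copysign(math.sqrt(R_SQUARED), BETA_1)

share_explained = R_SQUARED          # variation in pack_pc explained by state_tax
share_unexplained = 1 - R_SQUARED    # variation left in the error term

print(f"r = {correlation:.3f}")
print(f"explained: {share_explained:.1%}, unexplained: {share_unexplained:.1%}")
```

The implied correlation of roughly -0.64 is consistent with reading the fit as moderate.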
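(g) What does εi represent in the model?
It represents the error term, that is, the part of pack_pc for state i that is not explained by
state_tax. Its estimated counterpart is the regression residual: the difference between the
actual observed value and the value predicted by the regression line.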
Question 2
(a) What variable types do you have in the datasets? Pick a variable of each type and
describe it with graphs/tables.
The datasets contain interval (metric), ordinal and nominal variables.
Nominal: male (gender). Each person is either male or female. There are 20 women and 14 men in
the dataset.
Ordinal: mental health (1 = excellent, 5 = poor)