Information
The data set "cars.csv" was introduced in Chapter 9 of the book. This data set contains
information on various
models of cars. Download this file to your computer and store it in the working directory of R. The
file can be
downloaded from MATH 1281 Data Files in the course page for MATH 1281.
The variable "width" in the data set contains the width of the body of the car in inches. The
variable "length"
contains the length of the body of the car in inches. Produce a new variable by the name "area"
that contains the
area of the body of the car. This variable should be produced by multiplying the width of the
body by the
length of the body. Answer the following questions with respect to the variable "area".
Question 1
Correct Mark 1.00 out of 1.00
The variable "area" is a:
Select one:
a. Numeric
b. Factor
Load the file "cars.csv" into a data frame by the name "cars" using the code: cars <-
read.csv("cars.csv") . Form the variable
"area" using the code: area <- cars$length*cars$width . The variable "area" is a
product of two numeric variables, so it
takes numeric values as well.
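The type check described above can be sketched as follows. Since "cars.csv" is not reproduced here, the sketch uses a small hypothetical data frame (the name toy.cars and its values are invented for illustration); with the real file one would start from cars <- read.csv("cars.csv") instead.

```r
# Hypothetical stand-in for the "cars" data frame; values are made up for illustration.
toy.cars <- data.frame(length = c(168.8, 171.2, 176.6),
                       width  = c(64.1, 65.5, 66.2))

# Area of the car body: element-wise product of two numeric columns.
area <- toy.cars$length * toy.cars$width

# These checks distinguish a numeric variable from a factor.
print(class(area))
print(is.numeric(area))
print(is.factor(area))
```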
Question 2
Correct Mark 1.00 out of 1.00
The average area of the body of a car is:
Answer:
11493.36
Apply the code: mean(area) . The sample average turns out to be equal to 11493.36.
Alternatively, you may apply the code:
summary(area) and obtain the value of the mean from the produced report.
Question 3
Correct Mark 1.00 out of 1.00
The median area of the body of a car is:
Answer:
11314.2
Apply the code: median(area) . The sample median turns out to be equal to 11314.2.
Alternatively, you may apply the code:
summary(area) and obtain the value of the median from the produced report.
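As a minimal sketch of how the direct functions and the summary report relate, the following uses a short hypothetical vector x (invented values, not the "area" variable) and reads the mean and median from both sources.

```r
# Hypothetical values, made up for illustration (not taken from "cars.csv").
x <- c(2, 4, 4, 6, 9)

x.mean <- mean(x)      # sample average
x.median <- median(x)  # sample median

# summary() reports the same two quantities under the names "Mean" and "Median".
x.summary <- summary(x)
print(x.summary)
print(c(mean = x.mean, median = x.median))
```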
Question 4
Correct Mark 1.00 out of 1.00
The interquartile range of the area is:
Answer:
1566.18
Apply the code: quantile(area,c(0.25,0.75)) . The first quartile (Q1) is equal to
10709.72 and the third quartile (Q3) is equal
to 12275.90. The difference between Q3 and Q1 is the interquartile range and is
equal to IQR = Q3 - Q1 = 1566.18.
Alternatively, you may apply the code: summary(area) and obtain the values of Q1
and Q3 from the produced report.
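The subtraction Q3 - Q1 can also be left to R. The sketch below, on a hypothetical vector x (invented values), computes the interquartile range by hand from quantile() and compares it with the built-in function IQR().

```r
# Hypothetical values, made up for illustration (not taken from "cars.csv").
x <- c(10, 11, 12, 13, 14, 15, 30)

# First and third quartiles (R's default quantile definition, type 7).
q <- quantile(x, c(0.25, 0.75))

# Interquartile range computed by hand and with the built-in IQR().
iqr.manual <- unname(q[2] - q[1])
iqr.builtin <- IQR(x)
print(c(manual = iqr.manual, builtin = iqr.builtin))
```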
Question 5
Correct Mark 1.00 out of 1.00
The number of outlier observations in the variable "area" is:
Select one:
a. 0
b. 1
c. 2
d. 3
e. More than 3
You may produce the box plot of the variable using the code: boxplot(area) . A single
point in the upper part of the plot
marks an outlier. To make sure that this point represents a single
data point you may produce the table with
the code: table(area) . The frequency of the largest value is 1.
An alternative method counts the number of data points above the threshold
Q3 + 1.5*(Q3 - Q1) using the code:
sum(12275.9 + 1.5*1566.18 < area) . The count is 1.
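Rather than typing the quartiles in by hand, the thresholds can be computed from the data. The following is a sketch on a hypothetical vector x (invented values) that counts points beyond either threshold, Q1 - 1.5*IQR or Q3 + 1.5*IQR. Note that boxplot() locates the box using hinges, which may differ slightly from the output of quantile(), so the two approaches need not agree in every borderline case.

```r
# Hypothetical values, made up for illustration (not taken from "cars.csv").
x <- c(10, 11, 12, 13, 14, 15, 30)

q <- quantile(x, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Thresholds: Q1 - 1.5*IQR and Q3 + 1.5*IQR.
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

# Count observations outside the thresholds on either side.
n.outliers <- sum(x < lower | x > upper)
print(n.outliers)
```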
Question 6
Correct Mark 1.00 out of 1.00
The number of missing observations in the variable "area" is:
Select one:
a. 0
b. 1
c. 2
d. 3
There are no missing values in this variable.
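One way to check for missing values is to count the NA entries. The sketch below uses a hypothetical vector x with one deliberately missing entry (invented values), so the count is 1 here. For the "area" variable the corresponding call would be sum(is.na(area)).

```r
# Hypothetical values, made up for illustration; one entry is deliberately missing.
x <- c(10.5, NA, 12.0, 13.2)

# is.na() flags missing entries; summing the flags counts them.
n.missing <- sum(is.na(x))
print(n.missing)

# anyNA() gives a quick yes/no answer.
print(anyNA(x))
```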
Information
A rectangular body is square if the length is equal to the width. Consider, for each
car, the difference between the
length and the width of the body. Call this variable X. If cars tend to be square then
we would expect the
expectation of this variable to be equal to 0.
In Chapter 12 we will present a statistical test for testing that the expectation of a
variable is equal to 0. The test
statistic for this problem is given by: T = X̄/[S/√n], where X̄ is the sample average, S
is the sample standard
deviation, and n is the sample size.
In the following questions you are required to determine the region that contains
95% of the values of the test
statistic if indeed cars tend to be square in their shape and to check whether or not
the evaluation of the test
statistic for the "cars" data set falls within this region.
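The evaluation of the test statistic can be sketched as follows. The vectors len and wid and the helper compute.T are hypothetical (invented for illustration). With the real data one would take X <- cars$length - cars$width. The last lines compare the result with the statistic reported by R's built-in one-sample t-test.

```r
# Hypothetical length and width values in inches, made up for illustration.
len <- c(168.8, 171.2, 176.6, 177.3, 192.7)
wid <- c(64.1, 65.5, 66.2, 66.3, 71.4)

# X: difference between length and width for each car.
X <- len - wid

# Hypothetical helper implementing T = mean / (sd / sqrt(n)).
compute.T <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

T.obs <- compute.T(X)
print(T.obs)

# The one-sample t-test in base R reports the same statistic for mu = 0.
T.check <- unname(t.test(X, mu = 0)$statistic)
print(T.check)
```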
Question 7
Correct Mark 1.00 out of 1.00
Let X1, X2, ..., X205 be a sample of Normal random variables with expectation μ=0 and
variance σ²=100. (Namely,
standard deviation of σ=10.) The region that contains 95% of the distribution of the
statistic T = X̄/[S/√205] is best
described by:
Select one:
a. [-19.7,19.7]
b. [88.3,127.7]
c. [-1.97,1.97]
d. [106,110]
The interval that contains 95% of the sampling distribution of the test statistic may
be identified via simulations. Run the code:
mu <- 0
sig <- 10
test.stat <- rep(0,10^5)
for(i in 1:10^5)
{
X <- rnorm(205,mu,sig)
test.stat[i] <- (mean(X))/sqrt(var(X)/205)