Foundations of Business Intelligence (California State
University, Fullerton)
ISYE 6501 10
February 2025
Homework 5
Question 8.1
Describe a situation or problem from your job, everyday life, current events, etc., for which a
linear regression model would be appropriate. List some (up to 5) predictors that you might
use.
Linear regression can be used to estimate how long a grocery trip takes, which is useful
when we make several trips a week. To predict the total trip time in minutes, a few
predictors should be taken into consideration:
1. Time stuck in traffic (unit: minutes): the heavier the traffic, the longer it takes to
get to the market and back.
2. Number of items on the shopping list: more items means more time spent in the
store looking for the exact items or substitutes.
3. Distance to the store (unit: miles): the farther the store, the longer it takes to travel.
4. Day of the week: weekends tend to be more crowded at grocery stores,
increasing shopping time. Coded as a binary variable: 1 is weekend, 0 is weekday.
5. Checkout line length: the more people waiting in line, the longer it takes for your turn
to be checked out.
The equation looks like this:
Total trip time = b0 + b1(Traffic time) + b2(Items) + b3(Distance) + b4(Weekend) +
b5(Checkout line)
b0: the baseline trip time when every predictor is zero
b1, b2, b3, b4, b5: the coefficients of the predictors, each giving the change in total trip
time for a one-unit increase in that predictor
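As a minimal sketch of how the equation above could be fitted in R, the snippet below simulates grocery trips and fits the model with lm. All of the data are simulated: the column names (traffic, items, distance, weekend, checkout) and the "true" coefficients are assumptions chosen for illustration, not measured values.

```r
# Simulated data only: every number below is a made-up assumption.
set.seed(42)
n <- 200
trips <- data.frame(
  traffic  = runif(n, 0, 30),      # minutes stuck in traffic
  items    = rpois(n, 15),         # items on the shopping list
  distance = runif(n, 0.5, 10),    # miles to the store
  weekend  = rbinom(n, 1, 2 / 7),  # 1 = weekend, 0 = weekday
  checkout = rpois(n, 4)           # people ahead in the checkout line
)

# Hypothetical "true" relationship plus random noise.
trips$total_time <- 10 + 1.0 * trips$traffic + 0.8 * trips$items +
  3.0 * trips$distance + 5.0 * trips$weekend + 2.0 * trips$checkout +
  rnorm(n, sd = 3)

# Fit total_time = b0 + b1*traffic + b2*items + ... + b5*checkout.
fit <- lm(total_time ~ traffic + items + distance + weekend + checkout,
          data = trips)
print(coef(fit))  # estimates of b0 through b5
```

With simulated data the estimates should land close to the assumed coefficients. Real trip logs would be noisier, and traffic time and distance would likely be correlated with each other.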
Question 8.2
Using crime data from http://www.statsci.org/data/general/uscrime.txt (file uscrime.txt,
description at http://www.statsci.org/data/general/uscrime.html ), use regression (a useful R
function is lm or glm) to predict the observed crime rate in a city with the following data:
M = 14.0
So = 0
Ed = 10.0
Po1 = 12.0
Po2 = 15.5
LF = 0.640
M.F = 94.0
Pop = 150
NW = 1.1
U1 = 0.120
U2 = 3.6
Wealth = 3200
Ineq = 20.1
Prob = 0.04
Time = 39.0
Show your model (factors used and their coefficients), the software output, and the quality of
fit.
Note that because there are only 47 data points and 15 predictors, you’ll probably notice
some overfitting. We’ll see ways of dealing with this sort of problem later in the course.
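The task can be sketched in R as follows. This is one possible approach under stated assumptions, not a definitive solution: it fits all 15 predictors with lm and calls predict on a one-row data frame holding the city's values. The helper name predict_crime is hypothetical, and the commented usage assumes the file is read directly from the URL given in the question.

```r
# The new city's values, copied from the question.
new_city <- data.frame(
  M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
  LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120,
  U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04, Time = 39.0
)

# Hypothetical helper: regress Crime on every other column, then predict.
predict_crime <- function(crime_data, city) {
  fit <- lm(Crime ~ ., data = crime_data)
  fit_summary <- summary(fit)
  list(
    fit           = fit,
    r_squared     = fit_summary$r.squared,
    adj_r_squared = fit_summary$adj.r.squared,
    prediction    = unname(predict(fit, newdata = city))
  )
}

# Usage (needs network access to the URL from the question):
# crime <- read.table("http://www.statsci.org/data/general/uscrime.txt",
#                     header = TRUE)
# result <- predict_crime(crime, new_city)
# summary(result$fit)   # coefficients and quality of fit
# result$prediction     # predicted crime rate for the new city
```

With only 47 rows and 15 predictors, the adjusted R-squared is a more cautious measure of fit than the plain R-squared.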
> #Question 8.2
> #first we read the data in the directory
> data = read.table("/Users/joeytran/hw5/uscrime.txt", header = TRUE)
> summary(data)
      M                So                Ed                Po1               Po2               LF                M.F               Pop               NW
Min.   :11.90     Min.   :0.0000    Min.   : 8.70     Min.   : 4.50     Min.   : 4.100    Min.   :0.4800    Min.   : 93.40    Min.   :  3.00    Min.   : 0.20
1st Qu.:13.00     1st Qu.:0.0000    1st Qu.: 9.75     1st Qu.: 6.25     1st Qu.: 5.850    1st Qu.:0.5305    1st Qu.: 96.45    1st Qu.: 10.00    1st Qu.: 2.40
Median :13.60     Median :0.0000    Median :10.80     Median : 7.80     Median : 7.300    Median :0.5600    Median : 97.70    Median : 25.00    Median : 7.60
Mean   :13.86     Mean   :0.3404    Mean   :10.56     Mean   : 8.50     Mean   : 8.023    Mean   :0.5612    Mean   : 98.30    Mean   : 36.62    Mean   :10.11
3rd Qu.:14.60     3rd Qu.:1.0000    3rd Qu.:11.45     3rd Qu.:10.45     3rd Qu.: 9.700    3rd Qu.:0.5930    3rd Qu.: 99.20    3rd Qu.: 41.50    3rd Qu.:13.25
Max.   :17.70     Max.   :1.0000    Max.   :12.20     Max.   :16.60     Max.   :15.700    Max.   :0.6410    Max.   :107.10    Max.   :168.00    Max.   :42.30
      U1               U2                Wealth            Ineq              Prob              Time              Crime
Min.   :0.07000   Min.   :2.000     Min.   :2880      Min.   :12.60     Min.   :0.00690   Min.   :12.20     Min.   : 342.0
1st Qu.:0.08050   1st Qu.:2.750     1st Qu.:4595      1st Qu.:16.55     1st Qu.:0.03270   1st Qu.:21.60     1st Qu.: 658.5
Median :0.09200   Median :3.400     Median :5370      Median :17.60     Median :0.04210   Median :25.80     Median : 831.0