
ISYE 7406 Data Mining Homework 3 | Actual Study complete Solutions | A+ Graded | 2026 Updates | 100% correct

Pages: 18
Uploaded on: 24-04-2026
Written in: 2025/2026

ISYE 7406 – Homework 3: MPG



02/17/2026

Introduction

Fuel efficiency remains one of the most important characteristics of modern vehicles, affecting
consumer purchasing decisions, environmental impact, and manufacturing strategy. The Auto
MPG dataset provides historical automobile data including engine characteristics, vehicle
weight, and production year. The objective of this analysis is to classify vehicles as having high
or low gas mileage based on these attributes.
Specifically, we construct a binary response variable, mpg01, equal to 1 if a vehicle’s mpg
exceeds the median and 0 otherwise. Multiple classification methods are applied and compared
using cross-validation to determine which method performs best and to understand which vehicle
features most strongly influence fuel efficiency.
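
The construction of mpg01 can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library; the function name `make_mpg01` and the toy mpg values are assumptions made for the example and are not taken from the dataset.

```python
from statistics import median

def make_mpg01(mpg_values):
    """Return (labels, cutoff): label is 1 if mpg is strictly above the median, else 0."""
    cutoff = median(mpg_values)
    labels = [1 if mpg > cutoff else 0 for mpg in mpg_values]
    return labels, cutoff

# Hypothetical mpg values for illustration only (not from the Auto MPG data).
toy_mpg = [18.0, 15.0, 36.0, 24.0, 31.5, 22.0]
mpg01, cutoff = make_mpg01(toy_mpg)
print(cutoff, mpg01)  # prints: 23.0 [0, 0, 1, 1, 1, 0]
```

Because the comparison is strict, a vehicle whose mpg equals the median is assigned to the low-mpg class, which matches the "exceeds the median" wording above.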

Exploratory Data Analysis (EDA)

The cleaned dataset contains 392 observations after removing missing values and excluding the
vehicle name column. The correlation matrix indicates strong relationships between mpg01 and
several predictors.

Variable        Correlation with mpg01
Cylinders       -0.76
Displacement    -0.75
Weight          -0.76
Horsepower      -0.67
Origin           0.51
Year             0.43
Acceleration     0.35

The strongest negative correlations are with weight, cylinders, and displacement, suggesting
that larger and heavier vehicles are substantially more likely to fall into the low-mpg category.
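
Each entry in the table is a Pearson correlation between a numeric predictor and the 0/1 response. The sketch below shows that calculation on toy data. The helper name `pearson`, the weights, and the labels are all hypothetical, so the resulting value is not expected to match the table.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical vehicle weights (lb) and mpg01 labels, for illustration only.
toy_weight = [3500, 4300, 2100, 2600, 2200, 3000]
toy_mpg01 = [0, 0, 1, 1, 1, 0]

r = pearson(toy_weight, toy_mpg01)
print(f"{r:.2f}")  # negative: heavier toy vehicles fall in the low-mpg class
```

With a binary response, this quantity is also known as the point-biserial correlation; its sign indicates which class has the larger mean value of the predictor.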

Figure 1. Boxplot of Weight by mpg01

The boxplot of vehicle weight by mpg01 (Figure 1) shows clear separation between classes.
High-mpg vehicles (mpg01 = 1) are significantly lighter, with noticeably lower median weight
and tighter spread compared to low-mpg vehicles.




Figure 2. Scatterplot (MPG vs Weight)

The scatterplot of mpg versus weight (Figure 2) further confirms a strong nonlinear negative
association: as weight increases, mpg decreases sharply. This relationship visually supports the
large negative correlation observed in the table.

The correlation matrix reveals that mpg01 is most strongly associated with vehicle weight (-0.76),
cylinders (-0.76), and displacement (-0.75). These large negative correlations indicate that
heavier vehicles with larger engines are substantially more likely to fall below the median mpg
threshold. Horsepower also exhibits a strong negative relationship (-0.67), further reinforcing the
role of engine size and power in determining fuel efficiency. In contrast, origin (0.51) and model
year (0.43) show moderate positive associations, suggesting that efficiency improved over
time and differs across manufacturing regions.
These relationships are visually reinforced by Figure 1 and Figure 2. The boxplot of weight by
mpg01 shows pronounced class separation, with high-mpg vehicles exhibiting significantly
lower median weight and less dispersion. The scatterplot of mpg versus weight demonstrates a
clear nonlinear decreasing trend, indicating that fuel efficiency declines rapidly as vehicle mass
increases. Together, these findings justify selecting cylinders, displacement, horsepower, weight,
and origin as predictors for classification.

Methodology

To obtain robust performance estimates, repeated random splitting (Monte Carlo cross-
validation) was performed over 100 iterations. In each iteration, approximately 10% of the data
was randomly selected as a test set, with the remaining observations used for training.
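
The splitting scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the report's actual code: the function name `monte_carlo_splits`, the seed, and rounding the test size to the nearest integer are all choices made for the example.

```python
import random

def monte_carlo_splits(n_obs, n_iter=100, test_frac=0.10, seed=7406):
    """Yield (train_idx, test_idx) index lists from repeated random splitting."""
    rng = random.Random(seed)              # seed value is an arbitrary assumption
    n_test = round(n_obs * test_frac)      # assumed rounding rule for "about 10%"
    indices = list(range(n_obs))
    for _ in range(n_iter):
        shuffled = rng.sample(indices, n_obs)  # fresh random permutation
        yield shuffled[n_test:], shuffled[:n_test]

splits = list(monte_carlo_splits(392))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 100 353 39
```

Unlike k-fold cross-validation, the test sets drawn this way may overlap across iterations, so an observation can appear in several test sets or in none.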
The following classification methods were evaluated:
1. Linear Discriminant Analysis (LDA)
2. Quadratic Discriminant Analysis (QDA)
3. Naive Bayes
4. Logistic Regression
5. K-Nearest Neighbors (KNN, k = 3)
Performance was measured using misclassification error on both training and test sets.
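
As a concrete instance of one method and the error metric, the following sketch implements a k = 3 nearest-neighbor vote and the misclassification error using only the standard library. The function names and the toy points are hypothetical. The features are assumed to be on a common scale; the report does not state whether predictors were standardized before KNN.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def misclassification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical, already-scaled (weight, displacement) pairs; 1 = high mpg.
train_X = [(-1.0, -0.9), (-0.8, -1.1), (-0.6, -0.7), (0.9, 1.0), (1.1, 0.8), (0.7, 1.2)]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [(-0.9, -0.8), (1.0, 1.1), (0.1, 0.2)]
test_y = [1, 0, 1]

train_pred = [knn_predict(train_X, train_y, x) for x in train_X]
test_pred = [knn_predict(train_X, train_y, x) for x in test_X]
train_err = misclassification_error(train_y, train_pred)
test_err = misclassification_error(test_y, test_pred)
print(train_err, round(test_err, 3))  # prints: 0.0 0.333
```

Using an odd k with two classes avoids tied votes. Note that the training error of KNN is optimistic because each training point counts itself among its own neighbors.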

Results

One-Split Model Performance

For a representative 90/10 train-test split, testing errors were:

Method         Test Error
LDA            0.128
QDA            0.103
Naive Bayes    0.103
Logistic       0.103
KNN (k=3)      0.103
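
To put these rates in perspective, they can be converted back to counts of misclassified vehicles. The sketch below assumes the test set held 39 observations (10% of 392, rounded); the report does not state this size explicitly, so the counts are an inference rather than reported values.

```python
n_test = round(0.10 * 392)  # assumed test-set size: 39

# Test errors as reported in the table above.
reported = {
    "LDA": 0.128,
    "QDA": 0.103,
    "Naive Bayes": 0.103,
    "Logistic": 0.103,
    "KNN (k=3)": 0.103,
}

# Nearest whole number of misclassified vehicles implied by each rate.
implied = {method: round(rate * n_test) for method, rate in reported.items()}

for method, count in implied.items():
    print(f"{method}: {count}/{n_test} = {count / n_test:.3f}")
```

Under this assumption, 0.103 corresponds to 4 errors out of 39 and 0.128 to 5 out of 39, so the methods differ by a single vehicle on this split. That is why a single split cannot rank the methods reliably and the repeated splits are needed.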
