ISYE 7406 Data Mining Homework 3 | Actual Study complete Solutions | A+ Graded | 2026 Updates | 100% correct

Uploaded on: 24-04-2026
Written in: 2025/2026

ISYE 7406 – Homework 3: MPG

02/17/2026

Introduction

Fuel efficiency remains one of the most important characteristics of modern vehicles, affecting
consumer purchasing decisions, environmental impact, and manufacturing strategy. The Auto
MPG dataset provides historical automobile data including engine characteristics, vehicle
weight, and production year. The objective of this analysis is to classify vehicles as having high
or low gas mileage based on these attributes.

Specifically, we construct a binary response variable, mpg01, equal to 1 if a vehicle’s mpg
exceeds the median and 0 otherwise. Multiple classification methods are applied and compared
using cross-validation to determine which method performs best and to understand which vehicle
features most strongly influence fuel efficiency.
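
The construction of mpg01 described above can be sketched in a few lines. This is a minimal illustration, not the report's own code: the helper name `make_mpg01` and the sample values are hypothetical, and ties with the median are assigned to class 0 because the text says "exceeds".

```python
from statistics import median


def make_mpg01(mpg_values):
    """Return 1 where mpg is strictly above the sample median, else 0."""
    cutoff = median(mpg_values)
    return [1 if mpg > cutoff else 0 for mpg in mpg_values]


# Hypothetical mpg values, for illustration only (not taken from the dataset).
sample_mpg = [18.0, 15.0, 36.0, 24.0, 31.0, 27.0]
sample_mpg01 = make_mpg01(sample_mpg)
print(sample_mpg01)  # [0, 0, 1, 0, 1, 1] (the median of the sample is 25.5)
```

Because the cutoff is the sample median, the two classes come out roughly balanced, which makes misclassification error a reasonable summary metric.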

Exploratory Data Analysis (EDA)

The cleaned dataset contains 392 observations after removing missing values and excluding the
vehicle name column. The correlation matrix indicates strong relationships between mpg01 and
several predictors.

Variable        Correlation with mpg01
Cylinders       -0.76
Displacement    -0.75
Weight          -0.76
Horsepower      -0.67
Origin           0.51
Year             0.43
Acceleration     0.35

The strongest negative correlations are with weight, cylinders, and displacement, suggesting
that larger and heavier vehicles are substantially more likely to fall into the low-mpg category.
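
As a rough sketch of how such a correlation between a numeric predictor and the 0/1 response can be computed, the snippet below implements the Pearson coefficient directly. The function name `pearson` and the six data points are assumptions made for illustration; they are not values from the Auto MPG data and will not reproduce the figures in the table.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical weights (lb) and mpg01 labels, for illustration only.
weights = [3500, 4300, 2100, 3900, 2300, 2600]
mpg01 = [0, 0, 1, 0, 1, 1]

r = pearson(weights, mpg01)
print(f"correlation(weight, mpg01) = {r:.2f}")  # negative: heavier cars are in class 0
```

When one of the two variables is binary, this quantity is also known as the point-biserial correlation, so its sign shows which class tends to have the larger predictor values.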

Figure 1. Boxplot of Weight by mpg01

The boxplot of vehicle weight by mpg01 (Figure 1) shows clear separation between classes.
High-mpg vehicles (mpg01 = 1) are markedly lighter, with a noticeably lower median weight
and a tighter spread than low-mpg vehicles.

Figure 2. Scatterplot (MPG vs Weight)

The scatterplot of mpg versus weight (Figure 2) further confirms a strong nonlinear negative
association: as weight increases, mpg decreases sharply. This pattern is consistent with the
large negative correlation between weight and mpg01 reported in the table.

The correlation matrix reveals that mpg01 is most strongly associated with vehicle weight
(-0.76), cylinders (-0.76), and displacement (-0.75). These large negative correlations indicate that
heavier vehicles with larger engines are substantially more likely to fall below the median mpg
threshold. Horsepower also exhibits a strong negative relationship (-0.67), further reinforcing the
role of engine size and power in determining fuel efficiency. In contrast, origin (0.51) and model
year (0.43) show moderate positive associations, suggesting improvements in efficiency over
time and across manufacturing regions.

These relationships are visually reinforced by Figure 1 and Figure 2. The boxplot of weight by
mpg01 shows pronounced class separation, with high-mpg vehicles exhibiting a markedly
lower median weight and less dispersion. The scatterplot of mpg versus weight demonstrates a
clear nonlinear decreasing trend, indicating that fuel efficiency declines rapidly as vehicle mass
increases. Together, these findings justify selecting cylinders, displacement, horsepower, weight,
and origin as predictors for classification.

Methodology

To obtain robust performance estimates, repeated random splitting (Monte Carlo
cross-validation) was performed over 100 iterations. In each iteration, approximately 10% of the data
was randomly selected as a test set, with the remaining observations used for training.
The following classification methods were evaluated:
1. Linear Discriminant Analysis (LDA)
2. Quadratic Discriminant Analysis (QDA)
3. Naive Bayes
4. Logistic Regression
5. K-Nearest Neighbors (KNN, k = 3)
Performance was measured using misclassification error on both training and test sets.
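
The procedure above can be sketched with the standard library alone. The following is an illustrative outline under stated assumptions rather than the report's implementation: the function names, the seed, and the synthetic one-feature data are all hypothetical, and only the k = 3 nearest-neighbor classifier is shown because it is simple to write from scratch.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, point, k=3):
    """Majority vote among the k training points closest to `point`."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def misclassification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def monte_carlo_cv(x, y, n_iter=100, test_fraction=0.10, seed=7406):
    """Repeat random train/test splits and return the test error of each."""
    rng = random.Random(seed)
    n = len(x)
    n_test = round(n * test_fraction)
    errors = []
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split in every iteration
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_x, train_y, x[i]) for i in test_idx]
        errors.append(misclassification_error([y[i] for i in test_idx], preds))
    return errors


# Synthetic, well-separated two-class data (illustration only, not Auto MPG).
data_rng = random.Random(0)
x = [(data_rng.gauss(0, 1),) for _ in range(50)]
x += [(data_rng.gauss(4, 1),) for _ in range(50)]
y = [0] * 50 + [1] * 50

errors = monte_carlo_cv(x, y)
mean_error = sum(errors) / len(errors)
print(f"mean test error over {len(errors)} splits: {mean_error:.3f}")
```

Averaging over many random splits reduces the dependence of the estimate on any single partition. With several predictors on different scales, the features would normally be standardized before computing distances for KNN.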

Results

One-Split Model Performance

For a representative 90/10 train-test split, the test errors were:

Method          Test Error
LDA             0.128
QDA             0.103
Naive Bayes     0.103
Logistic        0.103
KNN (k=3)       0.103
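
The coarse, repeated values in this table are what a small test set produces. As a hedged back-of-the-envelope check, assuming the test set holds 39 observations (10% of 392, rounded; the report only says "approximately 10%"), each error is a multiple of 1/39:

```python
n_obs = 392            # observations in the cleaned dataset (from the report)
test_fraction = 0.10   # assumed exact; the report says "approximately 10%"
n_test = round(n_obs * test_fraction)
print(n_test)  # 39

# Test error implied by a given number of misclassified test vehicles.
for mistakes in (4, 5):
    print(mistakes, round(mistakes / n_test, 3))
# 4 0.103
# 5 0.128
```

Under that assumption, 0.103 corresponds to 4 misclassified vehicles and 0.128 to 5, so on this single split LDA differs from the other methods by one test observation.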

Written for

Institution: Georgia Institute Of Technology
Course: ISYE 7406

Document information

Uploaded on: April 24, 2026
Number of pages: 18
Written in: 2025/2026
Type: Other
Author: Unknown
