Exam (elaborations)

Data Science – Student Solutions Manual (SSM) – Chapter 6 – Complete Worked Solutions & Study Resource

Pages
6
Grade
A+
Uploaded on
20-04-2026
Written in
2025/2026

This document covers Chapter 6 of the Data Science Student Solutions Manual (SSM). It focuses on applied data analysis techniques and practical approaches to working with datasets in data science. It is designed to reinforce understanding of analytical methods and improve practical data handling skills for revision and exam preparation.

Institution
Principles Of Data Science
Course
Principles of Data Science


Principles of Data Science



Chapter 6
Decision-Making Using Machine Learning Basics


Chapter Review
[6.2, LO 6.2.1]
1. You are working with a dataset containing information about customer purchases at an
online retail store. Each data point represents a customer and includes features such as age,
gender, location, browsing history, and purchase history. Your task is to segment the customers
into distinct groups based on their purchasing behavior in order to personalize marketing
strategies. Which of the following machine learning techniques is best suited for this scenario?
a. linear or multiple linear regression
b. logistic or multiple logistic regression
c. k-means clustering
d. naïve Bayes classification

Solution: c. k-means clustering
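This is a clustering problem, so k-means clustering is the best choice. The data is unlabeled,
since we do not already know the groups that customers will be placed into. Linear regression,
logistic regression, and naïve Bayes classification are all supervised methods that need a known
target value or class label for training, so they are not appropriate here. Note that k-means
measures distances between numerical values, so non-numerical features (such as gender and
location) would need to be encoded numerically before clustering.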
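
To make the clustering idea concrete, here is a minimal sketch of k-means in plain Python. It is an illustration, not the textbook's implementation. The customer values and the two feature names (orders per year, average order value) are hypothetical, and the sketch assumes the first k points serve as the starting centroids rather than a random initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: start from the first k points instead of random centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20.0), (3, 25.0), (2, 22.0), (20, 150.0), (22, 160.0), (19, 155.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied anywhere: the two groups come only from the distances between customers. In practice the features would usually be rescaled first, since a feature with a larger numeric range (here, order value) dominates the distance calculation.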


Critical Thinking
[6.1, LO 6.1.2, 6.1.4]
1. Discuss how different ratios of training versus testing data can affect the model in terms of
underfitting and overfitting. How does the testing set provide a means to identify issues with
underfitting and overfitting?

Solution: When a model is trained on a large proportion of the dataset (for example, 90%
training and 10% testing), the model may pick up on more details in the dataset, giving it high
accuracy on the training set. If those details are due to random noise or outliers, the model may
be prone to overfitting in this case.
When a model is trained on a small proportion of the dataset (for example, 50% training and
50% testing), the model may not see enough training data to learn complex relationships that
do exist in the dataset. So, in this case, the model is prone to underfitting.
If the model’s accuracy is significantly lower on the testing set than on the training set, this
indicates overfitting. If accuracy is low on both the training set and the testing set, this
indicates underfitting. However, in the case of a large train/test ratio, the testing set may be
too small to evaluate the model adequately. This is why it is important to hold out a
substantial testing set when training most machine learning models.
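
The comparison between training and testing accuracy can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not part of the original solution: the data is synthetic (the true rule is x > 0.5, with 20% of labels flipped as noise), and a 1-nearest-neighbor classifier is swapped in because it memorizes its training data and therefore makes overfitting easy to see. The helper names `split`, `predict_1nn`, and `accuracy` are hypothetical.

```python
import random

def split(data, test_fraction, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, rows):
    """Fraction of rows whose label matches the 1-NN prediction."""
    return sum(predict_1nn(train, x) == y for x, y in rows) / len(rows)

# Synthetic data (an assumption for illustration): the true rule is x > 0.5,
# but 20% of labels are flipped to simulate noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

# Compare a 90/10 split with a 50/50 split.
for test_fraction in (0.1, 0.5):
    train, test = split(data, test_fraction)
    print(
        f"test fraction {test_fraction}: "
        f"train accuracy {accuracy(train, train):.2f}, "
        f"test accuracy {accuracy(train, test):.2f} on {len(test)} rows"
    )
```

Because 1-nearest-neighbor reproduces every training label, including the noisy ones, its training accuracy is perfect while its testing accuracy is lower; that gap is the signature of overfitting. With the 90/10 split the testing accuracy is computed from only 20 rows, so it is a much rougher estimate than the one from the 50/50 split.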



11/11/24 For more free, peer-reviewed, openly licensed resources visit OpenStax.org.
