Exam (elaborations)

Data Science Student Solutions Manual – Step-by-Step Answers & Learning Guide (2026)

Grade

A+

Uploaded on

25-04-2026

Written in

2025/2026

The DataScience_SSM (Student Solutions Manual) provides detailed step-by-step solutions and explanations for exercises in a data science textbook. It helps learners understand how to apply statistical, computational, and analytical methods used in modern data science. This resource is ideal for students, instructors, and self-learners who want structured guidance in data analysis, modeling, and interpretation.

- Introduction to data science workflows
- Data collection and cleaning
- Exploratory data analysis (EDA)
- Probability and statistical reasoning
- Data visualization techniques
- Regression and predictive modeling
- Machine learning fundamentals
- Interpretation of data results
- Python/R-based analytical thinking (conceptual)

- Step-by-step solutions to exercises
- Clear explanations of data science methods
- Supports coursework and exam preparation
- Covers foundational data science concepts

Data Science Student Solutions Manual (SSM) – Step-by-Step Answers and Explanations

The Data Science Student Solutions Manual provides detailed solutions and explanations for textbook exercises. It covers key topics such as data analysis, probability, visualization, regression, and machine learning fundamentals, helping students build strong analytical and problem-solving skills in data science.

Institution

Computer Science

Course

Computer Science

Content preview

https://www.stuvia.com/user/openstaxstudyhub

Principles of Data Science

Chapter 1
What Are Data and Data Science?

Chapter Review
[1.1, LO 1.1.1, 1.1.2]
1. Select the incorrect step and goal pair of the data science cycle.
a. Data collection: collect the data so that you have something for analysis.
b. Data preparation: have the collected data stored in a server as is so that you can start
the analysis.
c. Data analysis: analyze the prepared data to retrieve some meaningful insights.
d. Data reporting: present the data in an effective way so that you can highlight the
insights found from the analysis.

Solution: b. Data preparation: have the collected data stored in a server as is so that you can
start the analysis.
Collected data is rarely in good shape for analysis as is. Most of the time, it must be
processed before it suits the analysis of interest. One example of preparation is handling
missing data, either by removing the affected entries or by filling in the missing values.
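
The two ways of handling missing data mentioned above can be sketched with pandas. This is a minimal illustration, not part of the original solution: the DataFrame `raw` and its values are made up, and filling with the column mean is only one of several possible fill strategies.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing value.
raw = pd.DataFrame({
    "track": ["A", "B", "C", "D"],
    "bpm": [120.0, None, 98.0, 140.0],
})

# Option 1: remove every row that contains a missing value.
dropped = raw.dropna()

# Option 2: fill the missing values, here with the mean of the column.
filled = raw.fillna({"bpm": raw["bpm"].mean()})

print(dropped)
print(filled)
```

Removing rows is simple but discards data, while filling keeps every row at the cost of introducing estimated values, so the better choice depends on the analysis.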

[1.2, LO 1.2.1]
3. Which of the following best exemplifies the interdisciplinary nature of data science in various
fields?
a. A historian traveling to Italy to study ancient manuscripts to uncover historical insights
about the Roman Empire
b. A mathematician solving complex equations to model physical phenomena
c. A biologist analyzing a large dataset of genetic sequences to gain insights about the
genetic basis of diseases
d. A chemist synthesizing new compounds in a laboratory

Solution: c. A biologist analyzing a large dataset of genetic sequences to gain insights about the
genetic basis of diseases
Traditionally, biologists would conduct lab experiments to answer questions in their field;
however, nowadays data science is being used to analyze large datasets to extract valuable
information that can shed light on complex topics such as the genetic basis of diseases. Option
a) is incorrect as studying primary sources does not inherently involve data science. Option b) is
incorrect as solving equations is not in the domain of data science. Option d) is incorrect as it
describes the traditional work of a chemist as a lab scientist.

11/11/24 For more free, peer-reviewed, openly licensed resources visit OpenStax.org.

Critical Thinking
[1.3, LO 1.3.4]
1. For each dataset, list the attributes.
a. Spotify dataset
b. CancerDoc dataset

Solution a: Following is the list of attributes in the Spotify dataset:
track_name, artist(s)_name, artist_count, released_year, released_month, released_day,
in_spotify_playlists, in_spotify_charts, streams, in_apple_playlists, in_apple_charts,
in_deezer_playlists, in_deezer_charts, in_shazam_charts, bpm, key, mode, danceability_%,
valence_%, energy_%, acousticness_%, instrumentalness_%, liveness_%, speechiness_%
Solution b: The CancerDoc dataset has three attributes; however, none of these attributes has
a clear name. They are: the column with numeric identifiers (the first column), the column with
cancer type (the second column), and the actual text (the third column).
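
If the datasets are loaded with pandas, the attribute names can also be read from the DataFrame itself. The sketch below is an illustration under assumptions: it uses small in-memory stand-ins rather than the real files, with a few column names taken from the Spotify list above and made-up rows. The second table has no header row, which mimics a dataset whose attributes have no names.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt with a header row (values are made up).
with_header = io.StringIO(
    "track_name,artist_count,released_year,bpm,danceability_%\n"
    "Song A,1,2023,125,80\n"
    "Song B,2,2022,92,71\n"
)
data = pd.read_csv(with_header)
attributes = list(data.columns)
print(attributes)

# Hypothetical table without a header row: id, label, free text.
without_header = io.StringIO(
    "0,type_a,some free-form text\n"
    "1,type_b,more free-form text\n"
)
unnamed = pd.read_csv(without_header, header=None)
# With header=None, pandas labels the columns by position.
positions = list(unnamed.columns)
print(positions)
```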

[1.3, LO 1.3.2]
3. For each dataset, identify the type of the dataset—structured vs. unstructured. Explain why.
a. Spotify dataset
b. CancerDoc dataset

Solution a: The Spotify dataset is a structured dataset since every item in the dataset has
the same form.
Solution b: The CancerDoc dataset is an unstructured dataset. The third column holds the main
information, while the first and second columns serve as labels for each entry (i.e., they are
used to distinguish each item in the dataset). Because the third column is free-form text, this
dataset is unstructured.

[1.3, LO 1.3.4]
5. Open the WikiHow dataset (ch1-wikiHow.json) and list the attributes of the dataset.
Solution: The ch1-wikiHow.json file holds a list of items in an array (i.e., [ ]). Each item in the
array is an object (i.e., { }) with nine attributes in total. The attributes are: “Time”, “URL”,
“MainTask”, “MainTaskSummary”, “Steps”, “Categories”, “Ingredients”, “Requirements”, and
“Tips”.
Note that some attributes hold data in the form of an array as well. For example, “Steps” is an
array in which each element is also an object with three fields: “Headline”, “Description”, and
“Links”.
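
The structure described above can also be inspected programmatically with Python's standard json module. The sketch below is an assumption-laden illustration: `sample` is a made-up one-item stand-in that only mirrors the attribute names listed in the solution, and its values are placeholders rather than real wikiHow content.

```python
import json

# Hypothetical one-item sample mirroring the described structure;
# all values are placeholders.
sample = """
[
  {
    "Time": "",
    "URL": "",
    "MainTask": "Example task",
    "MainTaskSummary": "",
    "Steps": [{"Headline": "Step one", "Description": "", "Links": []}],
    "Categories": [],
    "Ingredients": [],
    "Requirements": [],
    "Tips": []
  }
]
"""

items = json.loads(sample)   # the outer array becomes a Python list
first = items[0]             # each element is an object, i.e. a dict
attributes = list(first.keys())
step_fields = list(first["Steps"][0].keys())

print(attributes)
print(step_fields)
```

To inspect the real file instead, the string could be replaced by reading ch1-wikiHow.json with `json.load` on an open file handle.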


[1.5, LO 1.5.3]
7. Regenerate the scatterplot of the Spotify dataset, but with a custom title and x-/y-axis label.
The title should be “BPM vs. Danceability.” The x-axis label should be titled “bpm” and range
from the minimum to the maximum bpm value. The y-axis label should be titled “danceability”
and range from the minimum to the maximum Danceability value.
a. Python Matplotlib (Hint: The DataFrame.min() and DataFrame.max() methods
return the min and max values of the DataFrame. You can call these methods on a specific
column of a DataFrame as well. For example, if a DataFrame is named df and has a
column named "col1", df["col1"].min() will return the minimum value of the
"col1" column of df.)
b. A spreadsheet program such as MS Excel or Google Sheets (Hint: Calculate the minimum
and maximum value of each column somewhere else first, then simply use the value
when editing the scatterplot.)
Solution a: The following code draws the same scatterplot with the custom title and axis labels.

import matplotlib.pyplot as plt

# assumes `data` is a pandas DataFrame that already holds the Spotify dataset
plt.scatter(data["bpm"], data["danceability_%"])  # draw the scatterplot
plt.title("BPM vs. Danceability")  # set the title

# set the x-axis label and its range of values
plt.xlabel("bpm")
plt.xlim(data["bpm"].min(), data["bpm"].max())

# set the y-axis label and its range of values
plt.ylabel("danceability")
plt.ylim(data["danceability_%"].min(), data["danceability_%"].max())

plt.show()
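
For readers who want to try the idea without the Spotify file, here is a self-contained variant. It is a sketch under assumptions: the DataFrame holds made-up values, the Agg backend is selected only so the code runs without a display, and Matplotlib's Axes methods are used in place of the pyplot functions in the solution above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Spotify data; the values are made up.
data = pd.DataFrame({
    "bpm": [90, 120, 125, 140, 170],
    "danceability_%": [55, 80, 71, 64, 48],
})

fig, ax = plt.subplots()
ax.scatter(data["bpm"], data["danceability_%"])
ax.set_title("BPM vs. Danceability")
ax.set_xlabel("bpm")
ax.set_ylabel("danceability")

# Limit each axis to the observed minimum and maximum of its column.
ax.set_xlim(data["bpm"].min(), data["bpm"].max())
ax.set_ylim(data["danceability_%"].min(), data["danceability_%"].max())
```

To view the figure interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end. Note that with the limits set exactly to the minimum and maximum, the extreme points sit on the edge of the plot area.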

Solution b: (This solution is based on MS Excel.) You can edit the chart title by double-clicking
the title text. A cursor will show up, and you can edit the title text. The axis labels can be added
by clicking Chart Design > Add Chart Element > Axis Titles. Primary Horizontal and Primary
Vertical will add a text box for the x- and y-axes, respectively. You can edit the text boxes by
double-clicking them.

To make the axis ranges match the minimum and maximum values of the bpm and danceability
columns, you first need to calculate those values in Excel. You can do so by
using =MIN() and =MAX() on each column. Note those values somewhere and enter them in the
text boxes under Format Axis > Axis Options > Bounds. You can open the Format Axis menu by
either 1) double-clicking the axis elements or 2) right-clicking the axis elements and then
selecting Format Axis….


Document information

Uploaded on: April 25, 2026
Number of pages: 70
Written in: 2025/2026
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

data science student solutions manual
data science step by step answers
data analysis solutions guide
statistics and machine learning solutions
data science homework help
data analysis and modeling st

$18.99


Get to know the seller

OpenStaxStudyHub Amg School Of Licensed Practical Nursing
Member since: 6 months
Documents: 101
Last sold: 3 weeks ago
Rating: 0.0 (0 reviews)

