Chapter 1
What Are Data and Data Science?
Chapter Review
[1.1, LO 1.1.1, 1.1.2]
1. Select the incorrect step and goal pair of the data science cycle.
a. Data collection: collect the data so that you have something for analysis.
b. Data preparation: have the collected data stored in a server as is so that you can start
the analysis.
c. Data analysis: analyze the prepared data to retrieve some meaningful insights.
d. Data reporting: present the data in an effective way so that you can highlight the
insights found from the analysis.
Solution: b. Data preparation: have the collected data stored in a server as is so that you can
start the analysis.
Collected data is rarely in good shape for analysis as is. Most of the time, it must be
processed to suit the analysis of interest. One example of preparation is handling missing
values, either by removing them or by filling them in.
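The two ways of handling missing data mentioned above can be illustrated with a minimal sketch. The readings list and its values are hypothetical, chosen only for illustration, and filling with the mean is just one of several possible fill strategies.

```python
from statistics import mean

# Hypothetical collected readings (an assumption for illustration);
# None marks a missing value.
readings = [21.5, None, 19.0, 22.5, None, 20.0]

# Option 1: remove the missing values.
removed = [r for r in readings if r is not None]

# Option 2: fill each missing value with the mean of the observed values.
fill_value = mean(removed)
filled = [fill_value if r is None else r for r in readings]

print(removed)
print(filled)
```

Removing shortens the dataset, while filling keeps its length but introduces estimated values. Which one is appropriate depends on the analysis of interest.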
[1.2, LO 1.2.1]
3. Which of the following best exemplifies the interdisciplinary nature of data science in various
fields?
a. A historian traveling to Italy to study ancient manuscripts to uncover historical insights
about the Roman Empire
b. A mathematician solving complex equations to model physical phenomena
c. A biologist analyzing a large dataset of genetic sequences to gain insights about the
genetic basis of diseases
d. A chemist synthesizing new compounds in a laboratory
Solution: c. A biologist analyzing a large dataset of genetic sequences to gain insights about the
genetic basis of diseases
Traditionally, biologists would conduct lab experiments to answer questions in their field;
however, nowadays data science is being used to analyze large datasets to extract valuable
information that can shed light on complex topics such as the genetic basis of diseases. Option
a) is incorrect as studying primary sources does not inherently involve data science. Option b) is
11/11/24 For more free, peer-reviewed, openly licensed resources visit OpenStax.org. 2