Data Science Exam 2 Questions and
Answers, Rated A+ 2025/2026
Data collection
process of gathering raw data from various sources
Observational study
collects information by asking people questions, observing behaviors, and so on. The
researcher does not change anything and only records responses
Experiment
tests cause and effect by changing one variable to see what happens
What are the 5 principles of data collection?
transparency, privacy, consent, accuracy, and accountability
Anonymization
removing identifying information
De-identification
masking personal data
Aggregation
grouping data to prevent identification
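The three privacy techniques above can be contrasted in a short sketch. The records, field names, and helper functions (`anonymize`, `mask`, `aggregate_by_decade`) are all hypothetical and invented for illustration; real de-identification usually needs more than this.

```python
from collections import Counter

# Hypothetical records, invented for this example.
records = [
    {"name": "Ana", "email": "ana@example.com", "zip": "30301", "age": 23},
    {"name": "Ben", "email": "ben@example.com", "zip": "30305", "age": 27},
    {"name": "Cy", "email": "cy@example.com", "zip": "30310", "age": 41},
]

def anonymize(record, identifiers=("name", "email")):
    """Anonymization: remove the identifying fields altogether."""
    return {key: value for key, value in record.items() if key not in identifiers}

def mask(text, keep=3):
    """De-identification: keep the first `keep` characters, mask the rest."""
    return text[:keep] + "*" * (len(text) - keep)

def aggregate_by_decade(rows):
    """Aggregation: report only counts per age decade, never single rows."""
    return Counter((row["age"] // 10) * 10 for row in rows)

anonymized = [anonymize(r) for r in records]
masked_zips = [mask(r["zip"]) for r in records]
age_counts = aggregate_by_decade(records)
```

Here `anonymized` keeps only `zip` and `age`, `masked_zips` hides the last two digits of each ZIP code, and `age_counts` reports group sizes instead of individual ages.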
Sampling bias
certain groups are overrepresented in the sample
Nonresponse bias
some groups are less likely to respond
Measurement bias
questions framed in misleading ways
Historical bias
data reflects existing inequalities
Structured data
organized in rows and columns
Unstructured data
not neatly tabular (for example, free text, images, audio)
Primary data
collected directly for your purpose
Secondary data
collected by someone else
Surveys
collect self-reported data
Random sampling
every individual has an equal probability of being selected
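A minimal sketch of random sampling, assuming a made-up population of 100 ID numbers and Python's standard `random` module. The seed and sample size are arbitrary choices for the example.

```python
import random

# Hypothetical population: ID numbers 1..100 (invented for illustration).
population = list(range(1, 101))

rng = random.Random(0)  # fixed seed so the draw is repeatable

# sample() draws without replacement; every individual is equally likely
# to end up in the sample.
simple_sample = rng.sample(population, k=10)
print(simple_sample)
```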
Convenience sampling
easy-to-access individuals are selected
Stratified sampling
population divided into groups; sample drawn from each
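Stratified sampling might be sketched as follows. The population, the `class_year` labels, and the `stratified_sample` helper are hypothetical; drawing the same number from each group is one possible design, and proportional allocation is another.

```python
import random
from collections import defaultdict

# Hypothetical population of (person_id, class_year) pairs.
population = [(i, "freshman" if i % 3 == 0 else "senior") for i in range(30)]

def stratified_sample(records, per_group, seed=0):
    """Draw `per_group` records at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[1]].append(record)  # group records by stratum label
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], per_group))
    return sample

stratified = stratified_sample(population, per_group=3)
```

Every group is guaranteed to appear in `stratified`, which simple random sampling does not promise for small groups.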
Cluster sampling
population is divided into natural groups, and whole clusters/groups are selected
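Cluster sampling can be sketched in the same spirit. The classrooms and the `cluster_sample` helper are hypothetical; the point is that clusters are chosen at random and everyone inside a chosen cluster is included.

```python
import random

# Hypothetical clusters: classrooms and their students.
clusters = {
    "room_a": ["a1", "a2", "a3"],
    "room_b": ["b1", "b2"],
    "room_c": ["c1", "c2", "c3", "c4"],
    "room_d": ["d1", "d2"],
}

def cluster_sample(groups, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_clusters)
    members = [member for name in chosen for member in groups[name]]
    return chosen, members

chosen_rooms, cluster_members = cluster_sample(clusters, n_clusters=2)
```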
Systematic sampling
every kth person (for example, every 4th) is selected for participation
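A small sketch of systematic sampling with an interval of 4. The population and the `systematic_sample` helper are hypothetical, and the random starting point is an assumption added here, a common convention that the card itself does not mention.

```python
import random

# Hypothetical population: 20 people numbered 1..20.
population = list(range(1, 21))

def systematic_sample(items, k, seed=0):
    """Select every k-th item, starting from a random offset below k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed random start in 0..k-1
    return items[start::k]

every_fourth = systematic_sample(population, k=4)
print(every_fourth)
```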