Question 1: Which of the following best describes the primary goal of data analytics?
A) Collecting as much data as possible
B) Converting raw data into actionable insights
C) Storing data in a centralized repository
D) Encrypting data for security
Answer: B
Explanation: The primary goal of data analytics is to transform raw data into meaningful insights
that can inform decision-making.
Question 2: In statistical analysis, what is the purpose of a p-value?
A) To measure the strength of a correlation
B) To indicate how likely a result at least as extreme as the one observed would be if only chance were at work
C) To calculate the mean of a dataset
D) To estimate the sample size needed
Answer: B
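Explanation: The p-value is the probability of seeing a result at least as extreme as the one
observed, assuming the null hypothesis is true. A small p-value suggests the result is unlikely
to be due to random chance alone.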
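Worked example: one way to see what a p-value measures is a permutation test, sketched below. The function name permutation_p_value and the two sample groups are made up for illustration. The test shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical measurements for two groups
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.1]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3, 6.0]
p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

The two groups do not overlap at all, so very few shuffles reproduce a gap that large and the estimated p-value is small.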
Question 3: What is a common visualization technique for showing the distribution of a
continuous variable?
A) Bar chart
B) Pie chart
C) Histogram
D) Line graph
Answer: C
Explanation: Histograms are widely used to display the distribution of continuous variables,
illustrating frequency across intervals.
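Worked example: the counting behind a histogram can be sketched in a few lines. The helper histogram_counts and the sample values are hypothetical; the sketch assumes equal-width bins and places the maximum value in the last bin.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram_counts(values, n_bins=4)
print(counts)  # [1, 2, 3, 3]
```

Each count corresponds to the height of one bar in the plotted histogram.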
Question 4: Which statistical measure is best used to identify the central tendency of a
skewed dataset?
A) Mean
B) Mode
C) Median
D) Range
Answer: C
Explanation: The median is less affected by extreme values, making it more representative for
skewed distributions.
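Worked example: a small made-up income list (in thousands) shows how a single extreme value pulls the mean while leaving the median almost untouched.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; one extreme value skews the data to the right
incomes = [30, 32, 35, 38, 40, 42, 500]

income_mean = mean(incomes)      # dragged upward by the 500
income_median = median(incomes)  # middle value of the sorted list

print(round(income_mean, 2))  # 102.43
print(income_median)          # 38
```

Six of the seven values are below 45, so the median of 38 describes a typical value far better than the mean of about 102.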
Question 5: When comparing two groups, which test is commonly used to assess if their
means differ significantly?
A) Chi-square test
B) T-test
C) ANOVA
D) Regression analysis
Answer: B
Explanation: A t-test is typically used to compare the means of two groups to see if the
differences are statistically significant.
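Worked example: the sketch below computes only the test statistic, using Welch's form of the t-test, which does not assume equal variances. The function name welch_t and the data are hypothetical. Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide; a statistics package such as SciPy (scipy.stats.ttest_ind with equal_var=False) handles that step.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

t_stat = welch_t([2, 4, 6], [1, 2, 3])
print(round(t_stat, 4))  # 1.5492
```

A larger absolute value of the statistic means the gap between the group means is large relative to the variability within the groups.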
Question 6: In data analytics, what does ETL stand for?
A) Extract, Transform, Load
B) Evaluate, Test, Launch
C) Examine, Translate, Link
D) Explore, Transfer, Learn
Answer: A
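Explanation: ETL stands for Extract, Transform, Load: data is extracted from source systems,
transformed into a clean, consistent format, and loaded into a target store such as a data
warehouse, where it is ready for analysis.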
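Worked example: a toy ETL run using only the Python standard library. Everything here is an assumption made for illustration: the inline CSV text, the sales table, and the helper names extract, transform, and load.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent casing, stray spaces, and a missing amount
RAW_CSV = """name,amount
alice, 10.5
BOB,3
carol,
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert amounts to numbers, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
records = transform(extract(RAW_CSV))
load(records, conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(records)  # [('Alice', 10.5), ('Bob', 3.0)]
print(total)    # 13.5
```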
Question 7: Which of the following is an example of a descriptive statistic?
A) Predictive modeling
B) Regression analysis
C) Standard deviation
D) Hypothesis testing
Answer: C
Explanation: Standard deviation is a descriptive statistic that summarizes the variability in a
dataset.
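Worked example: the population standard deviation is the square root of the average squared distance from the mean. The sketch below computes it by hand on made-up data and compares it with the standard library's statistics.pstdev; statistics.stdev gives the sample version, which divides by n - 1 instead of n.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean_value = sum(data) / len(data)
squared_deviations = [(x - mean_value) ** 2 for x in data]
population_sd = sqrt(sum(squared_deviations) / len(data))

print(population_sd)           # 2.0
print(pstdev(data))            # same quantity from the standard library
print(round(stdev(data), 3))   # sample standard deviation, slightly larger
```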
Question 8: What is the main benefit of using a scatter plot in data visualization?
A) To show proportions among categories
B) To compare parts to a whole
C) To identify relationships between two quantitative variables
D) To display trends over time
Answer: C
Explanation: Scatter plots are ideal for visualizing the relationship or correlation between two
quantitative variables, with each point representing one observation.
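Worked example: the linear relationship a scatter plot reveals is often summarized with the Pearson correlation coefficient. The function pearson_r below is a minimal hand-written sketch, and the inputs are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both variables' spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # points on a rising line
r_down = pearson_r([1, 2, 3], [3, 2, 1])        # points on a falling line
print(r_up, r_down)
```

Values near +1 or -1 correspond to points lying close to a straight line; values near 0 indicate no linear pattern.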
Question 9: Which term refers to the process of identifying and removing errors in data?
A) Data mining
B) Data integration
C) Data cleaning
D) Data warehousing
Answer: C
Explanation: Data cleaning involves detecting and correcting errors or inconsistencies to
improve data quality.
Question 10: What does the term “big data” primarily refer to?
A) Data that is stored in large databases
B) Extremely large datasets that require advanced tools and techniques to process
C) Data generated exclusively by large enterprises
D) Data that is used in financial industries only
Answer: B
Explanation: “Big data” refers to datasets of such high volume, velocity, and variety that
traditional processing methods are inadequate.
Question 11: In analytics, what is the role of exploratory data analysis (EDA)?
A) To confirm a predefined hypothesis
B) To provide a summary and visualization of data characteristics
C) To implement predictive models
D) To secure data storage
Answer: B
Explanation: EDA is used to summarize the main features of a dataset, often with visual
methods, to uncover patterns and anomalies.
Question 12: Which of the following best describes a time series analysis?
A) Analysis of data collected at a single point in time
B) Analysis of categorical data
C) Analysis of data points collected or recorded at successive times
D) Analysis solely based on correlation coefficients
Answer: C
Explanation: Time series analysis involves analyzing data points collected or recorded at
successive time intervals to identify trends.
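Worked example: a simple moving average is one common way to smooth a time series so the trend is easier to see. The helper moving_average and the series are hypothetical.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
print(smoothed)  # [2.0, 3.0, 4.0]
```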
Question 13: What does the term “data wrangling” refer to?
A) The physical storage of data
B) The process of cleaning and unifying complex data sets
C) The encryption of data for security
D) The visualization of data using graphs
Answer: B
Explanation: Data wrangling is the process of cleaning, restructuring, and enriching raw data into
a desired format for better decision-making.
Question 14: Which concept is crucial for ensuring the reliability and validity of statistical
analysis?
A) Data duplication
B) Data normalization
C) Data integrity
D) Data obfuscation
Answer: C
Explanation: Data integrity ensures that the data is accurate, consistent, and reliable for analysis.
Question 15: In statistical analysis, what is the purpose of using confidence intervals?
A) To measure the variability of data
B) To give a range of plausible values for the true population parameter
C) To identify outliers in the dataset
D) To calculate the standard error
Answer: B
Explanation: Confidence intervals provide a range of values within which the true population
parameter is expected to lie, with a stated level of confidence. A 95% level means that the
procedure captures the true parameter in about 95% of repeated samples.
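Worked example: a confidence interval for a mean, sketched with a normal approximation. The function mean_confidence_interval and the sample are hypothetical; for small samples a t-based interval is usually more appropriate and would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval: mean plus or minus z times the standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre - margin, centre + margin

sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
low95, high95 = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, confidence=0.99)
print(low95, high95)
print(low99, high99)
```

Asking for more confidence widens the interval: the 99% range contains the 95% range.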
Question 16: Which of the following is an example of inferential statistics?
A) Bar charts
B) Mean calculation
C) Hypothesis testing
D) Data cleaning
Answer: C
Explanation: Inferential statistics involves making predictions or inferences about a population
based on a sample, with hypothesis testing being a common method.
Question 17: What is the primary advantage of using a box plot for data visualization?
A) It shows the frequency distribution of a variable
B) It provides a quick summary of the distribution and identifies outliers
C) It compares multiple categorical variables
D) It displays data trends over time
Answer: B
Explanation: A box plot provides a visual summary of the distribution of a dataset, highlighting
the median, quartiles, and potential outliers.
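Worked example: the numbers a box plot draws can be computed directly. The sketch assumes the common 1.5 x IQR rule for flagging outliers and the "inclusive" quartile method from Python's statistics module; other tools may use slightly different quartile conventions. The helper box_plot_summary and the data are made up.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Five-number summary plus points outside the 1.5 * IQR fences."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["q1"], summary["median"], summary["q3"])  # 3.25 5.5 7.75
print(summary["outliers"])                              # [100]
```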
Question 18: In predictive analytics, what is overfitting?
A) A model that is too simple and underestimates patterns
B) A model that performs well on training data but poorly on new data
C) A model that uses too few variables
D) A model that generalizes perfectly to unseen data
Answer: B
Explanation: Overfitting occurs when a model captures noise in the training data and fails to
generalize to new, unseen data.
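Worked example: a deliberately tiny, hand-built illustration of overfitting. All data and function names are hypothetical. The assumed true rule is "label 1 when x >= 5", and two training labels have been flipped to act as noise. A 1-nearest-neighbour model memorizes the training set, noise included, while a simple threshold rule ignores it.

```python
# Training data: (x, label). The labels at x=3 and x=8 are noisy (flipped).
train = [(1, 0), (2, 0), (3, 1), (6, 1), (7, 1), (8, 0)]
# Clean test data that follows the assumed true rule
test = [(1.2, 0), (2.6, 0), (6.4, 1), (7.8, 1)]

def one_nn_predict(x):
    """Memorizing model: copy the label of the closest training point."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def threshold_predict(x, cutoff=5):
    """Simple model: a single cutoff, no memorization."""
    return 1 if x >= cutoff else 0

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

print(accuracy(one_nn_predict, train), accuracy(one_nn_predict, test))        # 1.0 0.5
print(round(accuracy(threshold_predict, train), 2), accuracy(threshold_predict, test))  # 0.67 1.0
```

The memorizing model scores perfectly on the training data but only 50% on new data, because it has learned the noisy labels; the simpler rule scores lower on training data yet generalizes better.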
Question 19: Which of the following is a key step in the data analytics lifecycle?
A) Data encryption
B) Data modeling
C) Data disposal
D) Data replication
Answer: B
Explanation: Data modeling is a crucial step in the data analytics lifecycle that involves
designing the data structure to support analysis.
Question 20: Which concept refers to the degree of randomness in a dataset?
A) Variance
B) Bias
C) Noise
D) Skewness
Answer: C