EXAMINATION 2026 QUESTIONS WITH
ANSWERS GRADED A+
◍ Data imputation.
Answer: The process of replacing a null or missing value with a substituted
value.
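Example (a minimal sketch, assuming mean imputation on a single numeric column; the function name `impute_mean` and the sample values are hypothetical, and other strategies such as median or mode substitution are equally valid):

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [30, None, 40, 50, None]
imputed = impute_mean(ages)
print(imputed)
```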
◍ Principal component analysis.
Answer: a popular technique for analyzing large datasets containing a high
number of dimensions/features per observation, increasing the
interpretability of data while preserving the maximum amount of
information, and enabling the visualization of multidimensional data.
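Example (a rough sketch only, assuming two-dimensional data so the covariance matrix can be decomposed in closed form; the function name and sample points are hypothetical, and real projects would normally use a numerical library instead). It reduces 2-D points to one dimension and reports how much variance the first component keeps:

```python
import math

def first_principal_component(points):
    """Return (direction, explained_ratio, scores) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    l1, l2 = half_trace + disc, half_trace - disc
    # Eigenvector belonging to the largest eigenvalue l1.
    if sxy != 0:
        vx, vy = sxy, l1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Project the centered points onto the first component (2-D -> 1-D).
    scores = [(p[0] - mx) * direction[0] + (p[1] - my) * direction[1]
              for p in points]
    return direction, l1 / (l1 + l2), scores

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direction, explained, scores = first_principal_component(points)
print(direction, explained)
```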
◍ A pharmaceutical company collected data on patient outcomes for a new
drug it is testing. Which question regarding the source or quality of the
available data is most appropriate to ask before analysis?
- Did the data come from a completely unbiased source?
- Was the data collected in secret, without the knowledge of the doctors?
- Was the data collected from electronic health records (EHRs) of patients
using the drug?
- Can data be excluded to decrease the impact of side effects on the analysis?
Answer: Was the data collected from electronic health records (EHRs) of
patients using the drug?
◍ What is a primary responsibility of a data analyst?
- Developing data visualizations for stakeholders
- Designing and implementing data storage solutions
- Conducting statistical analysis to identify patterns and trends
- Developing predictive models using machine learning algorithms
Answer: Conducting statistical analysis to identify patterns and trends
◍ Which type of data is needed to assess whether a new type of web content is
increasing user engagement?
- Web log
- Demographic
- Competitor analysis
- Advertising cost
Answer: Web log
◍ Who should be included as stakeholders in an analytics project?
- Anyone who will benefit from the project
- Anyone who has relevant skills
- Anyone who is available to participate
- Anyone who is a manager in the organization
Answer: Anyone who will benefit from the project.
◍ Which activity should the data analytics team focus on during the
communicate results phase?
- Presenting key findings to stakeholders and evaluating the project's success
- Building and testing different predictive models for customer churn
- Analyzing the financial impact of the project on the company's revenue and
customer retention
- Performing data cleaning and transforming raw data into usable formats
Answer: Presenting key findings to stakeholders and evaluating the project's
success
◍ What is the primary responsibility of the business intelligence analyst during
the operationalize phase of the data analytics life cycle?
- To make sure their reports and dashboards are up-to-date
- To collect data that can be used to train the model
- To label data that can be used to train the model
- To gather data that can be used to monitor the model
Answer: To make sure their reports and dashboards are up-to-date
◍ Describe how logistic regression can be used as a classifier.
Answer: Logistic regression predicts the probability of class membership
from one or more independent variables. In short, the model computes a
weighted sum of the input features (usually plus a bias term) and applies the
logistic (sigmoid) function to the result. The resulting probability is then
compared with a threshold (commonly 0.5) to assign a class label.
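Example (a minimal sketch of the sum-then-logistic computation described above; the weights, bias, and 0.5 threshold are hypothetical values chosen for illustration, not fitted parameters):

```python
import math

def predict_proba(features, weights, bias):
    """Weighted sum of the features plus bias, passed through the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters for a two-feature model.
weights, bias = [1.5, -0.5], -1.0
print(predict_proba([2.0, 1.0], weights, bias))  # z = 1.5, so probability is above 0.5
print(classify([2.0, 1.0], weights, bias))
print(classify([0.0, 3.0], weights, bias))       # z = -2.5, so probability is below 0.5
```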
◍ Which classification model is based on the concept of probability and
assigns class labels to instances based on the probability of belonging to a
particular class?
- Naive Bayes
- Support vector machines (SVM)
- Decision tree
- Random forest
Answer: Naive Bayes
◍ What should business users and project sponsors do with their findings
during the operationalize phase of a data analytics project?
- Develop and refine data models
- Assess benefits, implications, and business impact
- Produce detailed reports and visuals
- Evaluate project completion and goals
Answer: Assess benefits, implications, and business impact
◍ What are the use cases for association rules?
Answer: Market basket analysis, recommender systems, and identifying web
usage patterns
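Example (a small sketch for the market basket case, assuming the usual support and confidence measures for a rule such as bread -> milk; the baskets and function names are hypothetical):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```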
◍ What do data analytics teams do in the operationalize phase of a data
analytics project?
- Apply data transformations to fix problems with data and surface information
- Communicate project benefits, set up the pilot project, and deploy in
production
- Explore data, create model sets, and partition them into training,
validation, and test sets
- Translate business problems into data mining problems and locate
appropriate data
Answer: Communicate project benefits, set up the pilot project, and deploy
in production
◍ Naive Bayes.
Answer: A supervised learning method in which the algorithm assumes that
the presence of a particular feature in a class is conditionally independent of
every other feature, given the class.
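Example (a toy sketch of the independence assumption, assuming binary features and add-one smoothing; the data, labels, and function names are hypothetical). Each class score is the class prior multiplied by one per-feature probability at a time, which is exactly what treating the features as conditionally independent allows:

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts

def predict(model, row):
    """Pick the class with the highest prior * product of feature likelihoods."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing; the 2 assumes each feature is binary.
            score *= (feature_counts[(label, i)][value] + 1) / (count + 2)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical features: (contains "offer", contains "meeting").
rows = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1)]
labels = ["spam", "spam", "ham", "ham", "ham"]
model = train(rows, labels)
print(predict(model, (1, 0)))
print(predict(model, (0, 1)))
```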
◍ What is a type I error? What is a type II error? Is one always more serious
than the other? Why?
Answer: A type I error (false positive) occurs if an investigator rejects a null
hypothesis that is actually true in the population; a type II error
(false negative) occurs if the investigator fails to reject a null hypothesis that
is actually false in the population. Neither is always more serious: it depends
on the cost of each error in context. In many data science applications a
false negative poses the greater risk, so the type II error is often treated as
the more serious one.
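Example (a minimal sketch, assuming the errors are counted on binary classifier output, where 1 means the effect or condition is truly present; the labels and function name are hypothetical):

```python
def error_rates(actual, predicted):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    return false_pos / negatives, false_neg / positives

# Hypothetical ground truth and predictions.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
fpr, fnr = error_rates(actual, predicted)
print(fpr, fnr)  # one false positive out of 4 negatives, one false negative out of 4 positives
```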
◍ Which role is responsible for project initiation and providing the
requirements for a project?
- Business intelligence analyst
- Project sponsor
- Business user
- Data scientist
Answer: Project sponsor
◍ Which tools are commonly used for communicating results in data analytics
projects?
- Predictive modeling software and programming languages
- Data visualization tools and presentation software
- Database management systems and data warehouses
- Text editors and spreadsheet software
Answer: Data visualization tools and presentation software
◍ Linear regression model.
Answer: Describes the relationship between a dependent variable, y, and one
or more independent variables, X
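Example (a minimal sketch, assuming a single independent variable and an ordinary least squares fit; the data points and function name are hypothetical):

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```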
◍ A company will survey its customers to understand the potential demand for
a new product. A data analyst will review the data. Which question should
the analyst consider to validate the representativeness of the data?
- What tool is being used to visualize the data?
- What tool will be used to prepare the data?
- What tool will be used to analyze the data?
- What is the response rate for the survey?
Answer: What is the response rate for the survey?
◍ A healthcare company wants to predict which patients are at risk of
developing a certain medical condition. Which model is commonly used for
this type of analysis?
- Decision tree
- Association rules
- K-means clustering
- Logistic regression
Answer: Logistic regression. Logistic regression is a model that predicts the
probability of an event occurring.
◍ A data analyst is tasked with understanding customer satisfaction data and is
emailed a file with the data. Which question should the data analyst ask
about the data regarding where it is sourced from?
- Is the data backed up?
- Can the data be improved?
- Has the data been copied into multiple languages?
- When was the data collected?
Answer: When was the data collected?
◍ Which regression model is commonly used for predicting a continuous