Practitioner (CAIP) exam questions with
verified correct answers latest update
2025
Why might big data actually be detrimental to the machine learning
process? (Select two.)
Big datasets can have a negative impact on predictive performance.
Big datasets can be difficult for machine learning algorithms to
process.
Big datasets are difficult to obtain, which may result in lost time and
resources to acquire them.
Big datasets can have a negative impact on computing performance. --
- correct answer ---Big datasets can be difficult for machine learning
algorithms to process.
Correct
,Given that big data can vary wildly in terms of structure and content,
a machine learning algorithm may have a difficult time processing
such unorganized data.
Big datasets can have a negative impact on computing performance.
Correct
In most cases, the more data you use in the training, the more
computing power and time you'll need.
Big datasets are difficult to obtain, which may result in lost time and
resources to acquire them.
This should not be selected
This was true at one point, but now, acquiring incredibly large
amounts of data is commonplace and not necessarily cost prohibitive.
You've managed to find a spreadsheet of customer purchasing history
for the business over a period of a few years. You plan to feed this
,data into your supervised machine learning model in order to predict
what types of products will generate the most gross income in the
future.
The spreadsheet has rows for each purchase and columns for the ID of
the customer who made the purchase; what category of product they
purchased; the quantity of the purchase; the total price of the
purchase; and the revenue generated from the purchase. Luckily, none
of the cells have missing values.
Considering the objective of the machine learning model, what crucial
type of data is missing from this spreadsheet?
point
Example
Label
Attribute
Feature --- correct answer ---Label - The model is supposed to predict
gross income, which means gross income is the label, but no such
column seems to exist.
Since each column is defined, you have your attributes.
, Leptokurtic distribution --- correct answer ---Distribution curve is
very tall, thin and peaked.
(Memory: Leptokurtic leaps tall buildings in a single bound.)
Kurtosis is greater than 3
Why is standard deviation preferred over variance for explaining or
reporting purposes?
Standard deviation ensures that a descriptive measure like mean is in
the same scale as the data itself.
Standard deviation produces numbers that are small and easier to
read.
Standard deviation is able to demonstrate variability independent of
the number of samples in the population.
Standard deviation can be used to produce charts and other
visualizations. --- correct answer ---Standard deviation ensures that a
descriptive measure like mean is in the same scale as the data itself.