(Study Cards) Actual Exam Complete Questions and Correct Answers | 2026/27 Updated
Module 1: Foundational Concepts & Data Preprocessing
1. Which of the following best describes the purpose of data mining?
A) To store large amounts of data
B) To extract patterns and knowledge from large datasets
C) To visualize data in charts and graphs
D) To secure data from unauthorized access
Correct ✔✔✔ANSW✔✔: B
Rationale: Data mining is the process of discovering interesting, previously
unknown, and potentially useful patterns from large volumes of data. It goes
beyond simple querying or reporting to extract hidden insights.
2. The Knowledge Discovery in Databases (KDD) process begins with which
step?
A) Data Mining
B) Data Transformation
C) Data Cleaning
D) Pattern Evaluation
Correct ✔✔✔ANSW✔✔: C
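Rationale: In the commonly taught KDD sequence (cleaning, integration, selection,
transformation, mining, pattern evaluation, knowledge presentation), the process
starts with data cleaning, a preprocessing step that handles noise, missing values,
and inconsistencies. This is essential because real-world data is often messy, and
the quality of the input directly affects the quality of the mined patterns.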
3. Which of the following is NOT a major challenge in data mining?
A) Scalability and efficiency of algorithms
B) Handling diverse data types
C) Guaranteeing 100% prediction accuracy
D) User interaction and background knowledge integration
Correct ✔✔✔ANSW✔✔: C
Rationale: Data mining aims to find statistically significant patterns, but it cannot
guarantee perfect predictions. Real-world data is inherently noisy and complex,
making 100% accuracy an unrealistic expectation. The main challenges are
scalability, diverse data, and incorporating user knowledge.
4. Data cleaning primarily involves:
A) Removing noisy data and correcting inconsistencies
B) Transforming data into a suitable format for mining
C) Selecting relevant subsets of data
D) Integrating data from multiple sources
Correct ✔✔✔ANSW✔✔: A
Rationale: Data cleaning (or scrubbing) focuses on improving data quality by
handling missing values, smoothing noisy data, identifying outliers, and resolving
inconsistencies. It is a critical first step in the KDD process.
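The cleaning operations named in this rationale can be sketched in a few lines. This is only an illustration under stated assumptions: the function name clean_readings, the sample readings, and the valid range of 0 to 60 are all made up for the example, and median imputation plus a range check is just one of several reasonable cleaning strategies.

```python
from statistics import median

def clean_readings(values, low, high):
    """Fill missing values with the median, then drop out-of-range outliers.

    `low` and `high` are an assumed valid range chosen by the analyst.
    """
    present = [v for v in values if v is not None]
    fill = median(present)  # the median is less sensitive to outliers than the mean
    filled = [fill if v is None else v for v in values]
    return [v for v in filled if low <= v <= high]

# Made-up temperature readings: one missing value, one obvious outlier.
raw = [21.5, None, 22.0, 250.0, 21.0]
cleaned = clean_readings(raw, low=0.0, high=60.0)
print(cleaned)  # [21.5, 21.75, 22.0, 21.0]
```

The missing reading is replaced by the median of the observed values (21.75), and the 250.0 reading is discarded because it falls outside the assumed valid range.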
5. Which type of learning does NOT have a predefined target variable?
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Active learning
Correct ✔✔✔ANSW✔✔: B
Rationale: Unsupervised learning works with unlabeled data, meaning there is no
predefined target class or outcome. The goal is to discover hidden structures,
groupings (clustering), or patterns within the data itself.
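As a rough illustration of discovering groupings without labels, the sketch below runs a one-dimensional k-means loop. The function name kmeans_1d, the sample points, and the starting centers are assumptions made for this example; real projects would normally rely on a library implementation.

```python
def kmeans_1d(points, centers, iterations=10):
    """Toy 1-D k-means: assign each point to its nearest center, then re-average."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Unlabeled, made-up values: no target variable is supplied anywhere.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, groups = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)  # [1.5, 9.5]
print(groups)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

The algorithm receives only the raw values and still recovers the two natural groups, which is the defining trait of unsupervised learning.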
6. Which of the following is a standard application of data mining?
A) Fraud detection in financial transactions
B) Customer relationship management (CRM)
C) Market basket analysis
D) All of the above
Correct ✔✔✔ANSW✔✔: D
Rationale: Data mining has a wide range of practical applications. Fraud
detection uses classification to identify suspicious activities. CRM uses clustering
to segment customers. Market basket analysis uses association rules to find
product relationships.
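Market basket analysis usually scores an association rule by its support and confidence. The sketch below computes both for a tiny set of invented baskets; the helper names support and confidence and the basket contents are assumptions for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up grocery baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
print(rule_support, round(rule_confidence, 3))  # 0.5 0.667
```

Here the rule bread -> milk holds in half of all baskets and in two thirds of the baskets that contain bread.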
7. The primary goal of data transformation is to:
A) Remove duplicate records
B) Convert data into forms appropriate for mining
C) Combine data from multiple sources
D) Visualize the final patterns
Correct ✔✔✔ANSW✔✔: B
Rationale: Data transformation involves operations like normalization (scaling
data to a specific range), discretization (converting continuous data into intervals),
and attribute construction to make the data suitable for specific data mining
algorithms.
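Two of the transformations mentioned here, min-max normalization and equal-width discretization, can be sketched as follows. The function names and the sample ages are assumptions for this example, and both helpers assume the input contains at least two distinct values.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumed non-zero (values are not all identical)
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

def equal_width_bins(values, n_bins):
    """Map each value to the index of its equal-width interval (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [20, 30, 40, 60]  # made-up attribute values
print(min_max_normalize(ages))    # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins(ages, 2))  # [0, 0, 1, 1]
```

Normalization keeps the relative spacing of the values while putting them on a common scale; discretization trades precision for a small number of interval labels.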
8. Which term describes the situation in which classes can be separated by a
linear decision surface?
A) Linear separability
B) Kernel trick
C) Non-linear mapping
D) Overfitting
Correct ✔✔✔ANSW✔✔: A
Rationale: Linear separability means that a dataset's classes can be perfectly
divided by a straight line (in 2D) or a hyperplane (in higher dimensions). This is a
key concept for algorithms like the linear Support Vector Machine (SVM).
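One way to see linear separability in practice is with a perceptron, a simpler linear classifier than the SVM named above. The sketch below trains one on the logical AND function, whose two classes can be split by a single line. The function names, learning rate, and epoch limit are assumptions for this example.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule; stops early once an epoch makes no mistakes."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                errors += 1
                update = lr * (y - pred)
                w = [wi + update * xi for wi, xi in zip(w, x)]
                b += update
        if errors == 0:
            break
    return w, b

def predict(w, b, x):
    """Return 1 if x lies on the positive side of the learned line, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Logical AND: only (1, 1) belongs to the positive class.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```

Because the AND data is linearly separable, the perceptron finds a line that classifies every point correctly. On data that is not linearly separable, such as XOR, no single line can do this.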
9. Which of the following is a type of supervised learning task?
A) Clustering customers by purchasing habits
B) Predicting stock prices based on historical data
C) Finding association rules in a grocery dataset
D) Reducing the dimensionality of a dataset
Correct ✔✔✔ANSW✔✔: B
Rationale: Predicting a continuous value (stock price) based on labeled historical
data is a classic supervised learning task known as regression. Classification is
another type of supervised learning, while clustering and association rule mining
are unsupervised.
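A minimal regression sketch is shown below: an ordinary least-squares line fitted to labeled historical points and then used to predict the next value. The function name fit_line and the toy day and price values are invented for illustration and are not real market data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labeled history: each day (input) is paired with a known price (target).
days = [1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(days, prices)
forecast = slope * 5 + intercept  # predicted price for day 5
print(slope, intercept, forecast)  # 2.0 8.0 18.0
```

The known prices act as the target variable during fitting, which is what makes this a supervised task.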
10. What is the primary purpose of the CRISP-DM methodology?
A) To secure data mining models
B) To provide a standardized process for data mining projects
C) To visualize the results of data mining
D) To sample data for analysis
Correct ✔✔✔ANSW✔✔: B
Rationale: CRoss Industry Standard Process for Data Mining (CRISP-DM) is a