What is data mining? - Answer - Finding or extracting patterns in data
- Concerned with meaningful previously unknown patterns
- Combines statistics, machine learning and computing
- It is motivated by:
i. Large volumes of data
ii. Different types/dimensions of data
iii. Complex questions that require more than traditional statistical analyses
What is Descriptive Modelling? - Answer -Focuses on historical data only and summarises what
has happened.
-The main objectives/outcomes of descriptive modelling are reports, dashboards, and
scorecards. (e.g. What was the best-selling product of company X in the years 2016
and 2017? / What were the airlines with the highest passenger satisfaction rates in 2015-17?)
What are the descriptive modelling features? - Answer -Extracts or presents the main
descriptive features from data
-Summarises data w.r.t. specific data dimensions
(e.g. time, manufacturer, product, event, demographics, interests and the like)
-Finds co-occurrences of events or patterns
-Finds associations between data elements
-Makes no assumptions about data prior to modelling.
CRISP framework - Answer -Stands for Cross-Industry Standard Process for Data Mining (CRISP-DM).
-Most widely adopted framework for data mining.
What are some Descriptive Techniques? - Answer -Correlation analysis
-Data clustering (or segmentation)
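As an illustration of correlation analysis, here is a minimal sketch that computes the Pearson correlation coefficient between two attributes. The function name `pearson_correlation` and the sample figures (`ad_spend`, `units_sold`) are made up for the example and are not from the notes.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two attributes around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: monthly advertising spend and units sold.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 24, 33, 46, 55]
r = pearson_correlation(ad_spend, units_sold)
print(f"correlation = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear association between the two attributes; a value near 0 indicates little or no linear association.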
What is Predictive Analysis? - Answer -Focuses on future outcomes for the business given its
historic data. It helps to understand what is likely to occur in the future (e.g. What is the likely
student retention rate in course X in my university next year?)
What are some Predictive Analytics Features? - Answer -Models existing historic data to
predict likely future outcomes or events given similar, unseen future data. To do this,
predictive analysis creates a model that represents how the different variables in the data are
related to each other.
-Finds the most relevant part of the data that can be used for prediction.
-Can predict a single outcome or a series of outcomes over time; the latter is referred to as
forecasting.
-Needs existing historic data to be labelled
(requires assumptions on data prior to analysis).
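To make the idea of "a model that represents how variables are related" concrete, here is a minimal sketch that fits a straight line to labelled historic data with ordinary least squares and then predicts an unseen case. Simple linear regression is only one possible model, chosen here for brevity. The names `fit_line`, `hours` and `scores` and the numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labelled data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historic data: study hours (input attribute) and exam score (label).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Predict the label for an unseen data point (6 hours of study).
predicted = slope * 6 + intercept
print(f"predicted score = {predicted:.1f}")
```

The labelled history (`scores`) is what allows the relationship to be learned; without labels there would be nothing to fit the line against.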
Data preparation - Answer data quality:
-ACCURACY - dealing with data errors or extreme cases that deviate from expectation
-COMPLETENESS- dealing with missing attribute values, missing attributes, or the presence
of aggregate values only
-CONSISTENCY- dealing with discrepancies in the data or in the processes that generate the data
-TIMELINESS- dealing with the timeframes within which data are prepared
-BELIEVABILITY- do the data follow the standard process and can they be relied upon?
-INTERPRETABILITY- can the data be interpreted or understood?
-UNIFORMITY- dealing with the unit of measurement for the data and making sure all data
points are in the same unit.
Data Terminology - Answer Data set: collection of data with some defined structure
Data Point: a single instance in the data set
Attribute: it is a single property of each data instance/point
Label: it is the special attribute that needs to be predicted based on input attributes.
Identifier: it is a special attribute that is used for providing context to each data point. Identifiers are
excluded from the actual data mining steps
Training set: it is a portion of the data that is used for model building and tuning purposes only
Test set: it is the remaining portion of the data set that is used for model evaluation only.
*the training set and test set must not overlap or have common data points
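The split into non-overlapping training and test sets can be sketched as follows. This is a minimal illustration that assumes a random 80/20 split; the helper name `train_test_split`, the `test_fraction` and `seed` parameters, and the sample data set are made up for the example.

```python
import random

def train_test_split(data_points, test_fraction=0.2, seed=42):
    """Randomly split data points into disjoint training and test sets."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(data_points)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(indices) * test_fraction))
    test_indices = set(indices[:n_test])
    # Each data point goes to exactly one of the two sets, so they cannot overlap.
    train = [p for i, p in enumerate(data_points) if i not in test_indices]
    test = [p for i, p in enumerate(data_points) if i in test_indices]
    return train, test

# Hypothetical data set: "id" is the identifier, "hours" an attribute, "passed" the label.
data_set = [{"id": i, "hours": i % 5 + 1, "passed": i % 2 == 0} for i in range(10)]

train_set, test_set = train_test_split(data_set)

train_ids = {p["id"] for p in train_set}
test_ids = {p["id"] for p in test_set}
print(len(train_set), len(test_set), train_ids & test_ids)
```

The identifier is used here only to check that the two sets share no data points; it would be left out of the model itself.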
Preparation steps - Answer Data cleaning
Data integration
Data transformation
Data reduction
Data Cleaning - Answer It is the process of removing any inaccurate or incomplete data from
a data set.
Done via:
-replacing data errors or missing values
-modifying data errors
-deleting data errors or missing values.
Data cleaning (missing value treatment) - Answer - Ignore or delete data points with missing
values
- Ignore or delete attributes with missing values
- Replace missing values with a constant value (e.g. 0 for numeric attributes)
- Replace missing values with the central tendency of the attribute (Mean: numeric / Mode:
nominal)
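The replacement options above can be sketched as follows. This is a minimal illustration that assumes missing values are represented as `None`; the helper name `impute`, its `strategy` and `constant` parameters, and the sample attributes are made up for the example.

```python
from statistics import mean, mode

def impute(values, strategy, constant=0):
    """Replace missing values (None) using a constant, the mean, or the mode."""
    present = [v for v in values if v is not None]
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":      # central tendency for numeric attributes
        if not present:
            raise ValueError("no observed values to compute the mean from")
        fill = mean(present)
    elif strategy == "mode":      # central tendency for nominal attributes
        if not present:
            raise ValueError("no observed values to compute the mode from")
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical attributes with missing values.
ages = [20, None, 30, 40]                  # numeric attribute
colours = ["red", "blue", None, "red"]     # nominal attribute

ages_mean = impute(ages, "mean")
ages_zero = impute(ages, "constant", constant=0)
colours_mode = impute(colours, "mode")
print(ages_mean, ages_zero, colours_mode)
```

Deleting data points or attributes is the simpler alternative, but it discards information; replacement keeps every data point at the cost of introducing estimated values.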
What are some predictive analytics techniques? - Answer -Classification
-Numeric estimation/prediction (i.e. regression)
-Time series analysis (i.e. forecasting)
-Anomaly detection
Main steps of CRISP Framework - Answer -Business understanding
-Data understanding
-Data preparation
-Modelling
-Evaluation
-Deployment