MINING QUESTIONS AND ANSWERS.
What is a model? - Answer - A concise description of a pattern (relationship) that exists in
data
- Also referred to as a theory
- A general pattern induced from data
Classification trees - Answer - Easy to understand the relationships in the data captured by
the model
- Computationally fast to induce from data
- Constructed by recursively partitioning the examples in the data
Nodes - Answer Each "non-terminal" node represents a test on an attribute
Leaves - Answer Terminal nodes; each leaf holds a prediction (a class) in a classification tree
How to extract rules from a classification tree model - Answer Each path from the root of the
tree (top node) to a leaf node constitutes a rule: IF (refund = yes) & (Marital Status = Married)
THEN "NO"
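The path-to-rule idea above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the nested-dict tree layout, the function name extract_rules, and every leaf other than the Refund/Married one are hypothetical, chosen only to make the example runnable.

```python
# Hypothetical tree layout: an internal node is a dict holding the attribute
# it tests and one branch per attribute value; a leaf is a class label (str).
tree = {
    "attribute": "Refund",
    "branches": {
        "Yes": {
            "attribute": "Marital Status",
            "branches": {"Married": "NO", "Single": "YES"},
        },
        "No": "NO",
    },
}


def extract_rules(node, conditions=()):
    """Return one IF ... THEN ... rule for every root-to-leaf path."""
    if not isinstance(node, dict):
        # Reached a leaf: the tests collected along the path form the premise.
        premise = " & ".join(f"({attr} = {value})" for attr, value in conditions)
        return [f'IF {premise or "TRUE"} THEN "{node}"']
    rules = []
    for value, child in node["branches"].items():
        # Extend the path with this node's test and keep walking down.
        rules.extend(extract_rules(child, conditions + ((node["attribute"], value),)))
    return rules


rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

A tree with three leaves yields three rules, one per path.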
Recursive Partitioning - Answer With each partition the examples are split into subgroups
that have "increasingly more pure" class distribution (used for classification trees)
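A rough sketch of recursive partitioning follows. It assumes entropy as the impurity measure and categorical attributes, and it splits on the attribute with the lowest weighted impurity in the subgroups. The function names, the "Class" target column, and the five example rows are all made up for illustration; real tree-induction algorithms add stopping and pruning rules that are omitted here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Impurity of a group of class labels (0 means the group is pure)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop when the group is pure or nothing is left to split on;
    # the leaf predicts the majority class of the group.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def weighted_impurity(attr):
        # Weighted average entropy of the subgroups produced by splitting on attr.
        counts = Counter(r[attr] for r in rows)
        return sum(
            count / len(rows) * entropy([r[target] for r in rows if r[attr] == value])
            for value, count in counts.items()
        )

    best = min(attributes, key=weighted_impurity)  # purest subgroups win
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in dict.fromkeys(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining, target)  # recurse
    return {"attribute": best, "branches": branches}


# Hypothetical training examples.
rows = [
    {"Refund": "Yes", "Marital Status": "Married", "Class": "NO"},
    {"Refund": "Yes", "Marital Status": "Single", "Class": "NO"},
    {"Refund": "No", "Marital Status": "Married", "Class": "NO"},
    {"Refund": "No", "Marital Status": "Single", "Class": "YES"},
    {"Refund": "Yes", "Marital Status": "Single", "Class": "NO"},
]
tree = build_tree(rows, ["Refund", "Marital Status"], "Class")
print(tree)
```

On this toy data the first split is on Refund, because the Refund = Yes subgroup is already pure; only the Refund = No subgroup is partitioned further.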
Classification - Answer Class prediction
Data set - Answer A set of examples
Training Data - Answer Data used to induce (train) a model
Induction - Answer A process by which a pattern is extracted from factual data (experience)
Linear Regression - Answer An induction algorithm (it induces a linear model from training data)
Supervised learning - Answer - Objective is to estimate/predict an unknown value
- Model captures a relationship between a set of independent attributes (predictors) and a
dependent attribute (target)
Unsupervised Learning - Answer All modeling tasks that are not used to predict/estimate
an unknown value (e.g., clustering/segmentation)
Predictive Model - Answer A model that predicts/estimates an unknown target value; when the
target/dependent variable is discrete (categorical), it is a classification model
Classification Model - Answer Includes a set of IF {condition} THEN {class} rules
Regression - Answer A predictive model that predicts the value of a numerical (real-value)
variable
Clustering/Segmentation Analysis - Answer Identifies distinct groups or clusters of "similar"
instances
Link Analysis: Association Rules - Answer Finds relations among attributes in the data that
frequently co-occur
Sequence Analysis - Answer Finds patterns in time-stamped data
Subtree - Answer Branching from a node. Captures predictive patterns that fit a
sub-population
Information Gain - Answer Captures how informative an attribute is
Information Gain = Impurity (parent) - weighted average of Impurity (children)
Entropy - Answer Quantifies the level of impurity (or uncertainty) in a group of examples
Entropy = - Sum over classes of (proportion x log2(proportion))
(High entropy = high impurity = bad; 0 entropy = a pure group = 100% predictability)
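As a worked illustration of entropy and information gain, here is a small sketch. It uses the standard definition of entropy, with a leading minus sign so the result is non-negative, and assumes entropy is the impurity measure inside information gain. The function names and the yes/no example groups are invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy = -sum(p * log2(p)) over the class proportions p in the group."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Impurity(parent) minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split: 10 examples (5 yes, 5 no) divided into two subgroups.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

parent_entropy = entropy(parent)            # 50/50 mix: maximally impure for two classes
gain = information_gain(parent, children)   # about 0.278 bits
print(parent_entropy, round(gain, 3))
```

The parent group has entropy 1.0 (an even two-class mix), each child has entropy of about 0.722, so the split gains about 0.278 bits.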
Classification Accuracy Rate - Answer Proportion of examples whose class is predicted
accurately by the model
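The accuracy rate can be computed directly from the definition above. A minimal sketch; the function name and the two label lists are hypothetical.

```python
def accuracy_rate(actual, predicted):
    """Proportion of examples whose predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical classes for five examples: four of the five predictions match.
actual = ["NO", "NO", "YES", "NO", "YES"]
predicted = ["NO", "YES", "YES", "NO", "YES"]
rate = accuracy_rate(actual, predicted)
print(rate)  # 4 correct out of 5 -> 0.8
```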