AND CORRECT ANSWERS
Descriptive Analysis - CORRECT ANSWER Question:
*"What has happened" *
Techniques:
*Data Mining*
Tools:
*JMP PRO*
Application:
-Summarize financial performances
-Summarize purchasing behavior
Cluster Analysis is.. - CORRECT ANSWER Used to discover *natural grouping* of objects
-Objects within a group are similar
-Objects across groups are dissimilar
Cluster Analysis problem definition: - CORRECT ANSWER Given data on objects of interest
->Find the # of groups and group memberships
Organize objects into groups
-Maximize similarities of objects within a group
-Minimize similarities of objects between groups
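
A minimal Python sketch of the objective above, assuming numeric objects and straight-line distance as the (dis)similarity measure. The function name `grouping_quality` and the sample points are hypothetical, made up for illustration. It only scores a grouping that is already given; it does not search for the number of groups.

```python
import math
from itertools import combinations

def grouping_quality(points, labels):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # straight-line distance
        if labels[i] == labels[j]:
            within.append(d)   # pair sits in the same group
        else:
            between.append(d)  # pair is split across groups

    def avg(ds):
        return sum(ds) / len(ds) if ds else 0.0

    return avg(within), avg(between)

# Hypothetical data: two tight pairs that sit far apart from each other.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_within, good_between = grouping_quality(points, [0, 0, 1, 1])
bad_within, bad_between = grouping_quality(points, [0, 1, 0, 1])
```

A good grouping shows a small within-group distance and a large between-group distance; the second labeling reverses that pattern.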
Cluster Analysis similarity measures... - CORRECT ANSWER Euclidean distance for
numerical data
-Straight line
-Standardization
Matching coefficient for categorical data
"Number of columns with matching values / total number of columns"
Cluster Analysis Business Applications: - CORRECT ANSWER 1. Market segmentation-
*target* marketing
2. Personalization- *financial* advising
3. Quality control- *outlier detection*
Cluster Analysis methods evaluation: - CORRECT ANSWER Cubic Clustering Criterion
(CCC)
-CCC > 2: *good fit*
-0 <= CCC <= 2: *possible fit*
-CCC ~ -30: *presence of outliers*
Association Rule is: - CORRECT ANSWER used to discover *nontrivial "what goes with
what" connections* among groups of items in distinct events or transactions.
Association Rule problem definition - CORRECT ANSWER Given data on the co-occurrences
of items of interest
-> Find the likelihood of the co-occurrences of these items
Express as IF-THEN statements:
-IF item-set: *condition*
-THEN item-set: *consequent*
Association Rule evaluation - CORRECT ANSWER -Support: Proportion of occurrence
-A high value -> high frequency of occurrence
-Support (X->Y)= (# of events including both X and Y)/(Total # of events)
Cont'd - CORRECT ANSWER Confidence- accuracy
-A high value-> THEN item-set occurs frequently with the IF item-set
-Confidence (X->Y)= (# of events including both X and Y) / (# of events including X)
Cont'd - CORRECT ANSWER Lift-strength over random guesses
-*Greater than 1 -> better than guessing*
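
The three evaluation measures can be computed from raw transactions with a short Python sketch. The function names and the sample events are hypothetical. The cards give no formula for lift, so this assumes the common definition Lift (X->Y) = Confidence (X->Y) / Support (Y), where Support (Y) is the share of all events that include Y.

```python
def count(events, items):
    """Number of events that include every item in `items`."""
    items = set(items)
    return sum(items <= set(e) for e in events)

def support(events, x, y=()):
    """(# of events including both X and Y) / (total # of events)."""
    return count(events, set(x) | set(y)) / len(events)

def confidence(events, x, y):
    """(# of events including both X and Y) / (# of events including X)."""
    return count(events, set(x) | set(y)) / count(events, x)

def lift(events, x, y):
    """Assumed definition: confidence of X->Y divided by support of Y."""
    return confidence(events, x, y) / support(events, y)

# Hypothetical transactions for the rule IF bread THEN milk.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
s = support(events, {"bread"}, {"milk"})     # 3 of 5 events
c = confidence(events, {"bread"}, {"milk"})  # 3 of the 4 bread events
l = lift(events, {"bread"}, {"milk"})        # 0.75 / 0.8
```

In this made-up data the lift comes out just below 1, so the rule does no better than guessing milk at random, even though its confidence looks high. That is the reason lift is checked alongside support and confidence.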