Unit 3: Mining Frequent Patterns
Frequent itemset mining is a core task in data mining, especially within
the context of market basket analysis, where the goal is to find items that
frequently occur together. Efficient and scalable algorithms are crucial
because the search space grows exponentially: a dataset with n distinct
items has up to 2^n - 1 possible non-empty itemsets.
Efficient and Scalable Frequent Itemset Mining Methods
Apriori Algorithm
The Apriori algorithm is a classic data mining algorithm for frequent
itemset mining and association rule learning over transactional
databases.
Purpose: To find frequent itemsets (sets of items that appear frequently
together in a dataset) and use them to generate association rules (e.g., "if a
customer buys bread, they are likely to buy butter").
How It Works
The Apriori algorithm operates in two main steps:
1. Frequent Itemset Generation
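Finds all itemsets whose support count (the number of transactions that
contain the itemset) is at least min_support.
Uses the Apriori property:
If an itemset is frequent, all of its subsets must also be frequent.
Equivalently, if an itemset is infrequent, none of its supersets can be
frequent, so they never need to be counted.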
2. Association Rule Generation
From the frequent itemsets, generate rules X → Y whose confidence,
support(X ∪ Y) / support(X), meets a user-defined minimum confidence
threshold.
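The two measures used in these steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names support_count and confidence and the toy bread/butter transactions are assumptions made for the example.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    x_count = support_count(x, transactions)
    if x_count == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support_count(xy, transactions) / x_count


# Hypothetical toy data (not from these notes).
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk"}),
]

bread = support_count({"bread"}, transactions)            # bread is in 3 baskets
rule = confidence({"bread"}, {"butter"}, transactions)    # 2 of those 3 also have butter
print(bread, round(rule, 2))
```

Confidence is conditional: it only looks at transactions that already contain the left-hand side of the rule, which is why the denominator is support(X) rather than the total number of transactions.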
Algorithm Steps
1. Scan the database to find frequent 1-itemsets.
2. Generate candidate itemsets of length k+1 from frequent itemsets of
length k.
3. Prune candidate itemsets that have infrequent subsets.
4. Count support of remaining candidates by scanning the database.
5. Repeat until no more frequent itemsets are found.
6. Generate association rules from frequent itemsets using confidence.
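Steps 1 to 5 above can be sketched as a level-wise loop. The following Python is a minimal sketch under stated assumptions: min_support is an absolute transaction count, the function name apriori and the toy data are hypothetical, and candidates are joined by a simple pairwise union rather than the optimized prefix join used in production implementations. Rule generation (step 6) is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Step 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:  # Step 5: stop when a level yields nothing frequent.
        candidates = set()
        # Step 2: join pairs of frequent k-itemsets into (k+1)-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Step 3: prune candidates that have an infrequent k-subset.
            if all(frozenset(s) in current for s in combinations(union, k)):
                candidates.add(union)
        # Step 4: count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: {x, z} occurs only once, so {x, y, z} is pruned
# in step 3 without ever being counted.
toy = [{"x", "y"}, {"x", "y", "z"}, {"x"}, {"y", "z"}]
result = apriori(toy, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning check is where the Apriori property pays off: a candidate is only counted against the database if all of its k-item subsets survived the previous level.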
Example
Transactions:
TID   Items
T1    A, B, C
T2    A, B
T3    A, C
T4    B, C
T5    A, B, C
Min Support = 3 transactions
Min Confidence = 70%
Step 1: Frequent 1-itemsets
A: 4
B: 4
C: 4 → All are frequent
Step 2: Generate 2-itemsets
AB: 3
AC: 3
BC: 3 → All are frequent
Step 3: Generate 3-itemsets
ABC: 2 → Not frequent (support < 3)
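The support counts in Steps 1 to 3 can be double-checked with a short script. Note that this is a brute-force check, not Apriori itself: it enumerates every subset of every transaction, which is only practical for a tiny example like this one. The variable names are chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the example.
transactions = {
    "T1": {"A", "B", "C"},
    "T2": {"A", "B"},
    "T3": {"A", "C"},
    "T4": {"B", "C"},
    "T5": {"A", "B", "C"},
}
min_support = 3

# Count every non-empty subset of every transaction (brute force).
counts = Counter()
for items in transactions.values():
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            counts[combo] += 1

# Print itemsets grouped by size, then alphabetically.
for itemset, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), kv[0])):
    status = "frequent" if n >= min_support else "not frequent"
    print("".join(itemset), n, status)
```

Running it reproduces the counts above: each single item appears 4 times, each pair 3 times, and ABC only twice (in T1 and T5), so ABC falls below the minimum support of 3.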
Step 4: Generate Rules
From AB:
A → B: 3/4 = 75%