Introduction to Data Mining

<<< Previous    Up    Next >>>

Lesson 6.6.1

APRIORI's Frequent/Candidate generation

 

Frequent itemset generation

    Scan D and count each itemset in Ck, if the count is greater than minSupp, then add that         itemset to Lk.

 

Candidate itemset generation

    For k = 1, C1 = all itemsets of length = 1.

    For k > 1, generate Ck from Lk-1 as follows:

bullet

The join step:

bullet

Ck = k-2 way join of Lk-1 with itself.

bullet

If both {a1,..,ak-2, ak-1} & {a1,.., ak-2, ak} are in Lk-1, then add {a1,..,ak-2, ak-1, ak} to Ck.

bullet

The items are always stored in the sorted order.

bullet

The prune step:

bullet

Remove {a1, …,ak-2, ak-1, ak}, if it contains a non-frequent (k-1) subset.

 

An Example

TID  Itemsets

      1.    scan D → C1 = 1:2, 2:3, 3:3, 4:1, 5:3.

                    → L1 = 1:2, 2:3, 3:3,     , 5:3.

                    → C2 = 12, 13, 15, 23, 25, 35.

      2.    scan D → C2 = 12:1, 13:2, 15:1, 23:2, 25:3, 35:2.

                    → L2 =       , 13:2,      , 23:2, 25:3, 35:2.

                    → C3 = 235

                    → Pruned C3= 235

      3.    scan D → C3 = 235:2             

                    → L3 = 235:2

 T100  1 3 4
 T200  2 3 5
 T300  1 2 3 5
 T400  2 5
   

minSupp = 0.5

 

An Example showing why order of items should be maintained

TID  Itemsets

      1.    scan D → C1 = 1:2, 2:3, 3:3, 4:1, 5:3.

                    → L1 = 1:2, 2:3, 3:3,     , 5:3.

                    → C2 = 12, 13, 15, 23, 25, 35.

      2.    scan D → C2 = 12:1, 13:2, 15:1, 23:2, 25:3, 35:2.

      Suppose the order of items is now decided as: 5,4,3,2,1.

                    → L2 =       , 31:2,      , 32:2, 52:3, 53:2.

                    → C3 = 321, 532.

                    → Pruned C3=      532.

      3.    scan D → C3 = 532:2             

                    → L3 = 532:2

 T100  1 3 4
 T200  2 3 5
 T300  1 2 3 5
 T400  2 5
   

minSupp = 0.5

 

<<< Previous    Up    Next >>>