Searching and Matching for candidate itemsets can be speeded by using hash counts to filter candidates. When counting candidate k-1 itemsets, get counts for the "hash-groups" of k-itemsets instead of getting counts for all k-itemsets. The procedure is explained here.
Use hash function, h on the k-itemsets.
For each transaction t, if a k-itemset s is a subset of t, then add one to the count of h(s).
Remove all candidates, q (generated by Apriori) such that h(q)'s count is < minSupp.
The idea is quite useful for candidates of size k = 2, but often not so useful elsewhere. For sparse datasets, the step for k=2, can be the most expensive for Apriori.
We have transactional dataset, D = [134, 235, 1235, 25].
Suppose hash function, h is: h(x,y) = ((order of x)*10 + (order of y)) mod 7.
where order of 1 is 1, order of 2 is 2, and so on.
E.g. h(1,4) = 0, h(1,5) = 1, h(2,4) = 3, ........
bucket0 bucket1 bucket2 bucket3 bucket4 bucket5 bucket6
14 15 23 24 25 12 13
35 34
counts 3 1 2 0 3 1 3
Then 2-itemsets hashed to buckets 1 and 5 cannot be frequent (e.g. 15, 12), so remove them from C2.
![]()