**Post: #1**

APRIORI Algorithm

APRIORI Algorithm.pdf (Size: 173.88 KB / Downloads: 234)

The Apriori Algorithm: Basics

The Apriori Algorithm is an influential algorithm for

mining frequent itemsets for boolean association rules.

Key Concepts :

• Frequent Itemsets: The sets of item which has minimum

support (denoted by Li for ith-Itemset).

• Apriori Property: Any subset of frequent itemset must be

frequent.

• Join Operation: To find L

k , a set of candidate k-itemsets

is generated by joining Lk-1 with itself.

The Apriori Algorithm in a

Nutshell

• Find the frequent itemsets: the sets of items that have

minimum support

– A subset of a frequent itemset must also be a

frequent itemset

• i.e., if {AB} is a frequent itemset, both {

A} and {

B}

should be a frequent itemset

– Iteratively find frequent itemsets with cardinality

from 1 to k (k-itemset)

• Use the frequent itemsets to generate association rules.

Generating 2-itemset Frequent Pattern

• To discover the set of frequent 2-itemsets, L2 , the

algorithm uses L1 Join L1 to generate a candidate set of

2-itemsets, C2.

• Next, the transactions in D are scanned and the support

count for each candidate itemset in C2 is accumulated

(as shown in the middle table).

• The set of frequent 2-itemsets, L2 , is then determined,

consisting of those candidate 2-itemsets in C2 having

minimum support.

• Note: We haven’t used Apriori Property yet.Step 3: Generating 3-itemset Frequent Pattern

Itemset

{I1, I2, I3}

{I1, I2, I5}

Itemset Sup.

Count

{I1, I2, I3} 2

{I1, I2, I5} 2

Itemset Sup

Count

{I1, I2, I3} 2

{I1, I2, I5} 2

C3 C3 L3

Scan D for

count of

each

candidate

C

Generating 3-itemset Frequent Pattern[/b]

• Based on the Apriori property that all subsets of a frequent itemset must

also be frequent, we can determine that four latter candidates cannot

possibly be frequent. How ?

• For example , lets take {I1, I2, I3}. The 2-item subsets of it are {I1, I2}, {I1,

I3} & {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, We

will keep {I1, I2, I3} in C3.

• Lets take another example of {I2, I3, I5} which shows how the pruning is

performed. The 2-item subsets are {I2, I3}, {I2, I5} & {I3,I5}.

• BUT, {I3, I5} is not a member of L2 and hence it is not frequent violating

Apriori Property. Thus We will have to remove {I2, I3, I5} from C3.

• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking for all members of

result of Join operation for Pruning.

• Now, the transactions in D are scanned in order to determine L3, consisting

of those candidates 3-itemsets in C3 having minimum support.

Methods to Improve Apriori’s Efficiency

• Hash-based itemset counting: A

k-itemset whose corresponding

hashing bucket count is below the threshold cannot be frequent.

• Transaction reduction: A transaction that does not contain any

frequent k-itemset is useless in subsequent scans.

• Partitioning: Any itemset that is potentially frequent in DB must be

frequent in at least one of the partitions of DB.

• Sampling: mining on a subset of given data, lower support threshold

+ a method to determine the completeness.

• Dynamic itemset counting: add new candidate itemsets only when

all of their subsets are estimated to be frequent.

Why Frequent Pattern Growth Fast ?

• Performance study shows

– FP-growth is an order of magnitude faster than Apriori,

and is also faster than tree-projection

• Reasoning

– No candidate generation, no candidate test

– Use compact data structure

– Eliminate repeated database scan

– Basic operation is counting and FP-tree building