
APRIORI ALGORITHM
Rule generation in the Apriori algorithm
In order to generate association rules, the Apriori algorithm uses a level-wise approach, where each level corresponds to the number of items in the rule consequent. First, all high-confidence rules with only one item in the consequent are extracted. New rules are then generated from these.
For example, if:
{a, c, d}→ {b}
{a, b, d} →{c}
are high-confidence rules, then the candidate rule {a, d} → {b, c} is generated by merging the consequents of both rules (Association analysis: Basic concepts and rules). In other words, a candidate rule is generated by merging two rules that share the same prefix in the rule consequent.
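A minimal sketch of this merge, assuming rules are represented as (antecedent, consequent) pairs of tuples with consequents kept in a fixed lexicographic order (the representation is an assumption for illustration):

```python
def merge_rules(rule1, rule2):
    """Merge two rules whose consequents share the same prefix.

    Each rule is an (antecedent, consequent) pair of sorted tuples.
    Returns the candidate rule, or None if the consequent prefixes differ.
    """
    ant1, con1 = rule1
    ant2, con2 = rule2
    # The consequents must agree on all items except the last one.
    if con1[:-1] != con2[:-1] or con1 == con2:
        return None
    merged_consequent = con1 + (con2[-1],)
    # The new antecedent is the full itemset minus the merged consequent.
    itemset = sorted(set(ant1) | set(con1))
    new_antecedent = tuple(x for x in itemset if x not in merged_consequent)
    return (new_antecedent, merged_consequent)

# {a, c, d} -> {b} merged with {a, b, d} -> {c} gives {a, d} -> {b, c}
rule = merge_rules((('a', 'c', 'd'), ('b',)), (('a', 'b', 'd'), ('c',)))
print(rule)  # (('a', 'd'), ('b', 'c'))
```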
The Apriori algorithm
General idea
Apriori is the most commonly used algorithm for frequent itemset mining. It starts by identifying the frequent individual items in the transaction database and proceeds by extending them to larger and larger itemsets, as long as those itemsets appear often enough in the database. The algorithm terminates when no further extensions satisfying the minimum support condition are found.
The main idea of the algorithm is to scan the database for frequent itemsets, pruning at each step the itemsets that are found to be infrequent. Candidate generation consists of two very important steps: the join step and the prune step. In the join step, Lk is joined with itself to generate Ck+1. In the prune step, any candidate in Ck+1 that contains an infrequent k-itemset is removed, because an infrequent k-itemset cannot be a subset of a frequent (k+1)-itemset.
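The join and prune steps can be sketched as follows (assuming, as is conventional, that itemsets are stored as frozensets and joined via their sorted item lists):

```python
def generate_candidates(frequent_k, k):
    """Join step: merge pairs of frequent k-itemsets that share their
    first (k-1) items; prune step: drop any candidate that has an
    infrequent k-subset.

    `frequent_k` is a set of frozensets, each of size k.
    """
    sorted_sets = sorted(tuple(sorted(s)) for s in frequent_k)
    candidates = set()
    for i in range(len(sorted_sets)):
        for j in range(i + 1, len(sorted_sets)):
            a, b = sorted_sets[i], sorted_sets[j]
            if a[:k - 1] == b[:k - 1]:  # join step: common (k-1)-prefix
                candidate = frozenset(a) | frozenset(b)
                # Prune step: every k-subset must itself be frequent.
                if all(candidate - {item} in frequent_k for item in candidate):
                    candidates.add(candidate)
    return candidates

# With the frequent pairs {a,c}, {b,c}, {b,e}, {c,e}, the only
# surviving candidate triplet is {b, c, e}.
L2 = {frozenset('ac'), frozenset('bc'), frozenset('be'), frozenset('ce')}
print(generate_candidates(L2, 2))
```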
Ck – candidate itemsets of size k.
Lk – frequent itemsets of size k.
The Apriori algorithm can be represented in the following steps:
1. Find frequent items and put them into L1 (k = 1).
2. Use Lk to generate a collection of candidate itemsets Ck+1 of size (k + 1).
3. Scan the database to find which itemsets in Ck+1 are frequent and put them into Lk+1.
4. If Lk+1 is not empty:
- k := k + 1
- Go to step 2.
The example below shows how Apriori works in a few simple steps. Suppose that a sample database of transactions consists of the following sets: {a, c, d}, {b, c, e}, {a, b, c, e}, {b, e}. Each letter corresponds to a certain product from the assortment; for example, {a} is shampoo and {b} is hair conditioner.
In the first step, the algorithm counts the frequency of each item separately; these counts are called supports. To decide whether an item is frequent, a minimum support level is predefined; in this case, the minimum support is 2. Four of the items are therefore found to be frequent, while {d} is not. In the next step, a list of all 2-item pairs of frequent items is generated. The items already found to be infrequent are excluded from further analysis, which prunes the space of all possible combinations. In the last step, a list of all frequent three-item sets is generated by joining a frequent pair with a frequent single item. The algorithm ends at this step, because no four-item candidate generated at the next level meets the required minimum support.
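The worked example above can be traced with a minimal sketch of the whole algorithm. The transaction database and minimum support are taken from the text; folding the join and prune steps into a single comprehension is a simplification for brevity:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (a minimal sketch).

    `transactions` is a list of sets; returns a dict mapping each
    frequent itemset (as a frozenset) to its support count.
    """
    # Step 1: count individual items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Step 2: generate candidates C_{k+1}; keeping only candidates
        # whose k-subsets are all frequent combines the join and prune
        # steps into one condition.
        items = sorted({i for s in frequent for i in s})
        candidates = {frozenset(c) for c in combinations(items, k + 1)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k))}
        # Step 3: scan the database to count candidate supports.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        # Step 4: keep candidates meeting min_support and continue.
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

db = [{'a', 'c', 'd'}, {'b', 'c', 'e'}, {'a', 'b', 'c', 'e'}, {'b', 'e'}]
result = apriori(db, min_support=2)
print(result[frozenset({'b', 'c', 'e'})])  # 2: the only frequent triplet
```

Running this on the sample database reproduces the trace from the text: {d} is dropped in the first pass, four frequent pairs survive the second, and {b, c, e} is the single frequent triplet.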