ASSOCIATION RULES MINING

Introduction

 

Market basket analysis looks at purchase coincidence. It investigates whether two products are being purchased together, and whether the purchase of one product increases the likelihood of purchasing the other.

 

Many business enterprises accumulate large quantities of data from their day-to-day operations. For example, huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. The table given below illustrates an example of such data, commonly known as market basket transactions.

 

Table 1. An example of market basket transactions (columns: TID, Items).

Each row in the table corresponds to a transaction, which contains a unique identifier labelled TID (Transaction ID) and a set of items bought by a given customer. Retailers are interested in analysing the data to learn about the purchasing behaviour of their customers. Such valuable information can be used to support a variety of business-related applications such as marketing promotions, inventory management and customer relationship management.

 

Here we present a methodology known as association analysis, which is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. For example, the following rule can be extracted from the data set shown in Table 1:

 

                                                         {Diapers} => {Beer}

The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer. Retailers can use this type of rule to help them identify new opportunities for cross-selling their products to customers.
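
 

As a rough illustration of how such a rule might surface from raw transactions, the sketch below stores each basket as a set of items keyed by its TID and counts how often pairs of items occur together. The transactions are hypothetical toy data invented for this example (they are not the contents of Table 1), and the helper name pair_counts is likewise an assumption made here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (NOT the data from Table 1): TID -> set of items.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diapers", "Beer", "Eggs"},
    3: {"Milk", "Diapers", "Beer", "Cola"},
    4: {"Bread", "Milk", "Diapers", "Beer"},
    5: {"Bread", "Milk", "Diapers", "Cola"},
}


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of items appears together."""
    counts = Counter()
    for items in baskets:
        # sorted() gives every pair a canonical order, e.g. ("Beer", "Diapers").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions.values())
diapers_baskets = sum(1 for items in transactions.values() if "Diapers" in items)
both = counts[("Beer", "Diapers")]
print(f"{both} of {diapers_baskets} baskets with Diapers also contain Beer")
```

Counting every pair in this way is only practical for small item catalogues; it is meant to show the idea behind the rule, not an efficient mining algorithm.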

 

 

 

Market Basket Analysis: How does it help?

 

Suppose that, as the branch manager of an electronics store, you would like to learn more about the buying habits of your customers. Specifically, you wonder, “Which groups or sets of items are customers likely to purchase on a given trip to the store?” To answer your question, market basket analysis may be performed on the retail data of customer transactions at your store. You can then use the results to plan marketing or advertising strategies, or in the design of a new catalogue. For instance, market basket analysis may help you design different store layouts. In one strategy, items that are frequently purchased together can be placed in proximity to further encourage the combined sale of such items. If customers who purchase computers also tend to buy antivirus software at the same time, then placing the hardware display close to the software display may help increase the sales of both items.

 

In an alternative strategy, placing hardware and software at opposite ends of the store may entice customers who purchase such items to pick up other items along the way. For instance, after deciding on an expensive computer, a customer may observe security systems for sale while heading toward the software display to purchase antivirus software, and may decide to purchase a home security system as well. Market basket analysis can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase computers and printers together, then having a sale on printers may encourage the sale of printers as well as computers.

 

 

 

Representation of the patterns

 

The patterns, as shown in Table 2, can be represented in the form of association rules. For example, in the electronics store, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented in the following association rule:

 

                                    {Computer} => {Antivirus software} [support = 2%, confidence = 60%]

Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules.

 

  • A support of 2% for the above rule means that in 2% of all the transactions under analysis, a computer and antivirus software are purchased together.

 

  • A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.

 

Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. These thresholds can be set by users or domain experts. Additional analysis can be performed to discover interesting statistical correlations between associated items.
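
 

The two measures and the threshold test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the transactions are hypothetical toy data, the function names (support, confidence, interesting_rules) are invented for this example, only rules with a single item on each side are considered, and the thresholds of 40% and 70% are arbitrary choices.

```python
from itertools import permutations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"computer", "antivirus software"},
    {"computer", "antivirus software", "printer"},
    {"computer", "printer"},
    {"printer"},
    {"computer", "antivirus software"},
]


def support(itemset, baskets):
    """Fraction of all baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def interesting_rules(baskets, min_support, min_confidence):
    """Return single-item rules {a} => {b} that meet both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, baskets)
        if s < min_support:
            continue  # the pair is too rare to be useful
        c = confidence({a}, {b}, baskets)
        if c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


for a, b, s, c in interesting_rules(transactions, min_support=0.4, min_confidence=0.7):
    print(f"{{{a}}} => {{{b}}} [support = {s:.0%}, confidence = {c:.0%}]")
```

Note that support is symmetric in the two items while confidence is not, so {a} => {b} and {b} => {a} share the same support but can differ in confidence.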

 

 

From Association Analysis to Correlation Analysis

 

The support and confidence measures alone are insufficient for filtering out uninteresting association rules. To tackle this weakness, a correlation measure can be used to augment the support–confidence framework for association rules. This leads to correlation rules of the form

 

                                    A ⟹ B [support, confidence, correlation].

 

That is, a correlation rule is measured not only by its support and confidence but also by the correlation between itemsets A and B. There are many different correlation measures from which to choose. In this subsection, we study several correlation measures to determine which would be good for mining large data sets. Lift is a simple correlation measure that is given as follows. The occurrence of itemset A is independent of the occurrence of itemset B if

 

 

                                                 P(A and B) = P(A).P(B)

 

Otherwise, itemsets A and B are dependent and correlated as events. This definition can easily be extended to more than two itemsets. The lift between the occurrence of A and B can be measured by computing

 

                                        lift(A, B) = P(A and B) / (P(A).P(B))

If the resulting value of lift is less than 1, then the occurrence of A is negatively correlated with the occurrence of B, meaning that the occurrence of one makes the occurrence of the other less likely. If the resulting value is greater than 1, then A and B are positively correlated, meaning that the occurrence of one makes the occurrence of the other more likely. If the resulting value is equal to 1, then A and B are independent and there is no correlation between them.
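
 

The three cases can be checked numerically with a short sketch. It assumes that lift is the ratio of P(A and B) to the product P(A).P(B), with each probability estimated as a fraction of transactions. The data are hypothetical and chosen so that each case occurs once; the function names lift and interpret are assumptions made for this example.

```python
# Hypothetical toy transactions, chosen so that all three lift cases appear.
transactions = [
    {"computer", "antivirus software", "mouse"},
    {"computer", "antivirus software"},
    {"printer", "mouse"},
    {"printer"},
]


def lift(a, b, baskets):
    """Estimate P(A and B) / (P(A) * P(B)) from basket counts."""
    a, b = set(a), set(b)
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a <= basket)
    n_b = sum(1 for basket in baskets if b <= basket)
    n_ab = sum(1 for basket in baskets if (a | b) <= basket)
    if n_a == 0 or n_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    # (n_ab / n) / ((n_a / n) * (n_b / n)) simplifies to n_ab * n / (n_a * n_b).
    return n_ab * n / (n_a * n_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the three cases described in the text."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


for other in ("antivirus software", "mouse", "printer"):
    value = lift({"computer"}, {other}, transactions)
    print(f"lift(computer, {other}) = {value:.2f}: {interpret(value)}")
```

On real data a lift of exactly 1 is rare, so in practice values close to 1 are usually read as showing little or no association; the tolerance tol here only guards against floating-point rounding.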

 

Taken from the book Data Mining and Business Analytics with R by Johannes Ledolter.
