Trusted answers to developer questions

Related Tags

algorithm
apriori

What is the Apriori algorithm?

Educative Answers Team

The Apriori algorithm is used for mining frequent itemsets and deriving association rules from a transactional database. It relies on two parameters: support, the frequency with which an itemset occurs in the database, and confidence, the conditional probability that a transaction containing a rule's antecedent also contains its consequent.

The items in a transaction form an itemset. The algorithm begins by identifying frequent individual items (items whose frequency is greater than or equal to the given support threshold) in the database, and then extends them to larger frequent itemsets.

The Apriori algorithm uses the downward closure property: all subsets of a frequent itemset are frequent, but the converse may not be true.

Algorithm

The following are the main steps of the algorithm:

  1. Calculate the support of the itemsets of size k = 1 in the transactional database (note that support is an itemset's frequency of occurrence). This is called generating the candidate set.

  2. Prune the candidate set by eliminating itemsets with a support below the given threshold.

  3. Join the surviving frequent itemsets to form candidate itemsets of size k + 1, and repeat the above steps until no new frequent itemsets can be formed, i.e., until every candidate formed has a support below the given threshold.

Let's go over an example to see the algorithm in action. Suppose that the given minimum support is 3 and the required confidence is 80%.

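The original walkthrough is presented as a slideshow that is not reproduced here. As a stand-in, the sketch below runs the three steps above on a hypothetical five-transaction database; the transactions are illustrative, chosen only to be consistent with the item and pair supports used in the rules that follow.

```python
# Hypothetical transaction database (the article's original table is not
# reproduced here). With a minimum support of 3, it yields the same frequent
# pairs and supports as the worked example: I2=3, I3=4, I4=3,
# {I2, I3}=3, {I3, I4}=3.
transactions = [
    {"I1", "I2", "I3", "I4"},
    {"I2", "I3", "I4"},
    {"I2", "I3"},
    {"I3", "I4"},
    {"I1", "I5"},
]
MIN_SUPPORT = 3

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

# Level k = 1: candidate set of individual items, pruned by support.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items
            if support(frozenset([i])) >= MIN_SUPPORT]

all_frequent = []
k = 1
while frequent:
    all_frequent.extend(frequent)
    # Join step: form candidates of size k + 1 from unions of frequent
    # k-itemsets.
    candidates = {a | b for a in frequent for b in frequent
                  if len(a | b) == k + 1}
    # Prune step: keep only candidates meeting the support threshold.
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]
    k += 1

for s in sorted(all_frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), "support =", support(s))
# Frequent itemsets found: {I2}, {I3}, {I4}, {I2, I3}, {I3, I4}
```

The loop terminates when the join step produces no candidate that survives pruning; here, the only size-3 candidate, {I2, I3, I4}, has support 2 and is discarded.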

Now let's create the association rules. This is where the given confidence is required. For a rule X -> Y, the confidence is calculated as Support(X and Y) / Support(X).

The following rules can be obtained from the frequent itemsets of size two (2-frequent itemsets):

  1. I2 -> I3: Confidence = 3/3 = 100%
  2. I3 -> I2: Confidence = 3/4 = 75%
  3. I3 -> I4: Confidence = 3/4 = 75%
  4. I4 -> I3: Confidence = 3/3 = 100%
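The rule-filtering step can be sketched as follows, using the item and pair supports implied by the confidences above (e.g., Support(I2 and I3) = 3, Support(I3) = 4):

```python
# Supports taken from the worked example's frequent itemsets.
support = {
    frozenset(["I2"]): 3,
    frozenset(["I3"]): 4,
    frozenset(["I4"]): 3,
    frozenset(["I2", "I3"]): 3,
    frozenset(["I3", "I4"]): 3,
}

def confidence(x, y):
    """Confidence of the rule x -> y: Support(x and y) / Support(x)."""
    return support[frozenset(x) | frozenset(y)] / support[frozenset(x)]

MIN_CONFIDENCE = 0.80
for lhs, rhs in [(["I2"], ["I3"]), (["I3"], ["I2"]),
                 (["I3"], ["I4"]), (["I4"], ["I3"])]:
    c = confidence(lhs, rhs)
    verdict = "kept" if c >= MIN_CONFIDENCE else "dropped"
    print(f"{lhs[0]} -> {rhs[0]}: confidence = {c:.0%} ({verdict})")
# I2 -> I3 and I4 -> I3 are kept (100%); the two 75% rules are dropped.
```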

Since our required confidence is 80%, only rules 1 and 4 are included in the result. Therefore, it can be concluded that customers who bought item two (I2) always bought item three (I3) with it, and customers who bought item four (I4) always bought item three (I3) with it.
