Earlier in this chapter, we looked at Bayes’ theorem. In machine learning, it has been adapted into something called the Naïve Bayes Classifier. It is “naïve” because of the assumption that the variables are independent of each other—that is, the occurrence of one variable has nothing to do with the others. True, this may seem like a drawback. But the fact is that the Naïve Bayes Classifier has proven to be quite effective and fast to develop.
There is another assumption to note as well: the a priori assumption. This says that the prior probabilities estimated from the training data are assumed to hold when the model makes predictions—if the underlying data distribution has changed, the predictions will be unreliable.
There are three variations of the Naïve Bayes Classifier:
- Bernoulli: This is if you have binary data (true/false, yes/no).
- Multinomial: This is if the data consists of discrete counts, such as the number of pages in a book.
- Gaussian: This is if you are working with data that conforms to a normal distribution.
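The three variants above can be sketched with scikit-learn, which provides a class for each (a minimal sketch, assuming scikit-learn is installed; the toy datasets here are illustrative, not from the chapter):

```python
# Sketch of the three Naive Bayes variants in scikit-learn.
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Bernoulli: binary features (e.g., a word is present or absent)
X_bin = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y_bin = np.array([1, 1, 0, 0])
print(BernoulliNB().fit(X_bin, y_bin).predict([[1, 0]]))

# Multinomial: discrete counts (e.g., how often a word appears)
X_cnt = np.array([[3, 0], [4, 1], [0, 5], [1, 4]])
y_cnt = np.array([1, 1, 0, 0])
print(MultinomialNB().fit(X_cnt, y_cnt).predict([[5, 0]]))

# Gaussian: continuous features assumed roughly normal
X_num = np.array([[5.1], [4.9], [7.0], [6.8]])
y_num = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X_num, y_num).predict([[6.9]]))
```

Each classifier exposes the same `fit`/`predict` interface; only the assumed distribution of the features differs.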
A common use case for Naïve Bayes Classifiers is text analysis. Examples include email spam detection, customer segmentation, sentiment analysis, medical diagnosis, and weather prediction. The reason is that this approach is effective at classifying data based on key features and patterns.
To see how this is done, let’s take an example: Suppose you run an e-commerce site and have a large database of customer transactions. You want to see how variables like product review ratings, discounts, and time of year impact sales.
Table 3-2 shows the dataset.
Table 3-2.
Customer transactions dataset
| Discount | Product Review | Purchase |
|---|---|---|
| Yes | High | Yes |
| Yes | Low | Yes |
| No | Low | No |
| No | Low | No |
| No | Low | No |
| No | High | Yes |
| Yes | High | No |
| Yes | Low | Yes |
| No | High | Yes |
| Yes | High | Yes |
| No | High | No |
| No | Low | Yes |
| Yes | High | Yes |
| Yes | Low | No |
You will then organize this data into frequency tables, as shown in Tables 3-3 and 3-4.
Table 3-3.
Discount frequency table
| Discount | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| Yes | 5 | 2 | 7 |
| No | 3 | 4 | 7 |
| Total | 8 | 6 | 14 |
Table 3-4.
Product review frequency table
| Product Review | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| High | 5 | 2 | 7 |
| Low | 3 | 4 | 7 |
| Total | 8 | 6 | 14 |
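These frequency counts can be computed directly from the Table 3-2 data with a few lines of Python (a sketch using only the standard library; the rows are copied verbatim from the dataset above):

```python
# Build the frequency tables from the Table 3-2 transactions.
from collections import Counter

rows = [  # (discount, product_review, purchase)
    ("Yes", "High", "Yes"), ("Yes", "Low", "Yes"), ("No", "Low", "No"),
    ("No", "Low", "No"),    ("No", "Low", "No"),   ("No", "High", "Yes"),
    ("Yes", "High", "No"),  ("Yes", "Low", "Yes"), ("No", "High", "Yes"),
    ("Yes", "High", "Yes"), ("No", "High", "No"),  ("No", "Low", "Yes"),
    ("Yes", "High", "Yes"), ("Yes", "Low", "No"),
]

# Cross-tabulate each independent variable against the purchase outcome.
discount_freq = Counter((d, p) for d, _, p in rows)
review_freq = Counter((r, p) for _, r, p in rows)

print(discount_freq)  # counts of (Discount, Purchase) pairs
print(review_freq)    # counts of (Product Review, Purchase) pairs
```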
When looking at this, we call the purchase the event and the discount and product review the independent variables. Then we can make a probability table for one of the independent variables, say the product reviews. See Table 3-5.
Table 3-5.
Product review probability table
| Product Review | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| High | 5/8 | 2/6 | 7/14 |
| Low | 3/8 | 4/6 | 7/14 |
| Total | 8/14 | 6/14 | |
Using this table and Bayes’ theorem, we can see that the probability of a purchase when there is a low product review is P(Yes | Low) = P(Low | Yes) × P(Yes) / P(Low) = (3/8 × 8/14) / (7/14) = 3/7, or about 43%. In other words, the Naïve Bayes Classifier allows more granular predictions within a dataset. It is also relatively easy to train and can work well with small datasets.
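The same computation can be carried out in a few lines of Python (a sketch using exact fractions; the counts are taken from the Table 3-2 dataset):

```python
# Bayes' theorem: P(Yes | Low) = P(Low | Yes) * P(Yes) / P(Low),
# using counts from the Table 3-2 transactions.
from fractions import Fraction

n_total = 14       # total transactions
n_yes = 8          # transactions ending in a purchase
n_low_and_yes = 3  # low review AND purchase
n_low = 7          # low reviews overall

p_yes = Fraction(n_yes, n_total)
p_low_given_yes = Fraction(n_low_and_yes, n_yes)
p_low = Fraction(n_low, n_total)

p_yes_given_low = p_low_given_yes * p_yes / p_low
print(p_yes_given_low)  # 3/7, roughly 43%
```

Using `Fraction` keeps the arithmetic exact, so the result matches the hand calculation rather than a rounded decimal.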