Earlier in this chapter, we looked at Bayes’ theorem. In machine learning, it has been adapted into something called the Naïve Bayes Classifier. It is “naïve” because of the assumption that the variables are independent of each other—that is, the occurrence of one variable has nothing to do with the others. True, this may seem like a drawback. But the fact is that the Naïve Bayes Classifier has proven to be quite effective and fast to develop.
There is another assumption to note as well: the a priori assumption. This says that the prior probabilities estimated from the training data are assumed to hold when the model makes predictions—if the underlying data distribution has changed, the predictions will be unreliable.
There are three variations of the Naïve Bayes Classifier:
- Bernoulli: This is if you have binary data (true/false, yes/no).
- Multinomial: This is if the data consists of discrete counts, such as the number of pages in a book.
- Gaussian: This is if you are working with data that conforms to a normal distribution.
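The three variants above can be sketched with scikit-learn, which provides a class for each (a minimal sketch, assuming scikit-learn is installed; the toy datasets here are illustrative, not from the chapter):

```python
# Sketch of the three Naive Bayes variants in scikit-learn.
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Bernoulli: binary features (e.g., a word is present or absent)
X_bin = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y_bin = np.array([1, 1, 0, 0])
print(BernoulliNB().fit(X_bin, y_bin).predict([[1, 0]]))

# Multinomial: discrete counts (e.g., how often a word appears)
X_cnt = np.array([[3, 0], [4, 1], [0, 5], [1, 4]])
y_cnt = np.array([1, 1, 0, 0])
print(MultinomialNB().fit(X_cnt, y_cnt).predict([[5, 0]]))

# Gaussian: continuous features assumed roughly normal
X_num = np.array([[5.1], [4.9], [7.0], [6.8]])
y_num = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X_num, y_num).predict([[6.9]]))
```

Each classifier exposes the same `fit`/`predict` interface; only the assumed distribution of the features differs.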
A common use case for Naïve Bayes Classifiers is text analysis. Examples include email spam detection, customer segmentation, sentiment analysis, medical diagnosis, and weather prediction. The reason is that this approach is effective at classifying data based on key features and patterns.
To see how this is done, let’s take an example: Suppose you run an e-commerce site and have a large database of customer transactions. You want to see how variables like product review ratings, discounts, and time of year impact sales.
Table 3-2 shows the dataset.
Table 3-2.
Customer transactions dataset
| Discount | Product Review | Purchase |
|---|---|---|
| Yes | High | Yes |
| Yes | Low | Yes |
| No | Low | No |
| No | Low | No |
| No | Low | No |
| No | High | Yes |
| Yes | High | No |
| Yes | Low | Yes |
| No | High | Yes |
| Yes | High | Yes |
| No | High | No |
| No | Low | Yes |
| Yes | High | Yes |
| Yes | Low | No |
You will then organize this data into frequency tables, as shown in Tables 3-3 and 3-4.
Table 3-3.
Discount frequency table
| Discount | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| Yes | 5 | 2 | 7 |
| No | 3 | 4 | 7 |
| Total | 8 | 6 | 14 |
Table 3-4.
Product review frequency table
| Product Review | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| High | 5 | 2 | 7 |
| Low | 3 | 4 | 7 |
| Total | 8 | 6 | 14 |
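These frequency counts can be computed directly from the Table 3-2 data with a few lines of Python (a sketch using only the standard library; the rows are copied verbatim from the dataset above):

```python
# Build the frequency tables from the Table 3-2 transactions.
from collections import Counter

rows = [  # (discount, product_review, purchase)
    ("Yes", "High", "Yes"), ("Yes", "Low", "Yes"), ("No", "Low", "No"),
    ("No", "Low", "No"),    ("No", "Low", "No"),   ("No", "High", "Yes"),
    ("Yes", "High", "No"),  ("Yes", "Low", "Yes"), ("No", "High", "Yes"),
    ("Yes", "High", "Yes"), ("No", "High", "No"),  ("No", "Low", "Yes"),
    ("Yes", "High", "Yes"), ("Yes", "Low", "No"),
]

# Cross-tabulate each independent variable against the purchase outcome.
discount_freq = Counter((d, p) for d, _, p in rows)
review_freq = Counter((r, p) for _, r, p in rows)

print(discount_freq)  # counts of (Discount, Purchase) pairs
print(review_freq)    # counts of (Product Review, Purchase) pairs
```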
When looking at this, we call the purchase the event and the discount and product review the independent variables. Then we can make a probability table for one of the independent variables, say the product reviews. See Table 3-5.
Table 3-5.
Product review probability table
| Product Review | Purchase: Yes | Purchase: No | Total |
|---|---|---|---|
| High | 5/8 | 2/6 | 7/14 |
| Low | 3/8 | 4/6 | 7/14 |
| Total | 8/14 | 6/14 | |
Using this table and Bayes’ theorem, we can see that the probability of a purchase when there is a low product review is P(Yes | Low) = P(Low | Yes) × P(Yes) / P(Low) = (3/8 × 8/14) / (7/14) = 3/7, or about 43%. In other words, the Naïve Bayes Classifier allows more granular predictions within a dataset. It is also relatively easy to train and can work well with small datasets.
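The same computation can be carried out in a few lines of Python (a sketch using exact fractions; the counts are taken from the Table 3-2 dataset):

```python
# Bayes' theorem: P(Yes | Low) = P(Low | Yes) * P(Yes) / P(Low),
# using counts from the Table 3-2 transactions.
from fractions import Fraction

n_total = 14       # total transactions
n_yes = 8          # transactions ending in a purchase
n_low_and_yes = 3  # low review AND purchase
n_low = 7          # low reviews overall

p_yes = Fraction(n_yes, n_total)
p_low_given_yes = Fraction(n_low_and_yes, n_yes)
p_low = Fraction(n_low, n_total)

p_yes_given_low = p_low_given_yes * p_yes / p_low
print(p_yes_given_low)  # 3/7, roughly 43%
```

Using `Fraction` keeps the arithmetic exact, so the result matches the hand calculation rather than a rounded decimal.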