All you need to know about Naive Bayes

Shubhang Agrawal
Published in Analytics Vidhya
Jan 22, 2021 · 9 min read


In this blog I will be writing about a simple yet effective and commonly used machine learning classifier: Naive Bayes.

Here I will explain what Naive Bayes is, Bayes' theorem and its uses, the mathematical working of Naive Bayes, a step-by-step implementation of it, and applications of Naive Bayes. I will also provide a link to my Jupyter notebook for reference.

So without any further ado, let's get started.

What is Naive Bayes?

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple and that is why it is known as ‘Naive’.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods.

Bayes' Theorem

In statistics and probability theory, the Bayes’ theorem (also known as the Bayes’ rule) is a mathematical formula used to determine the conditional probability of events. Essentially, the Bayes’ theorem describes the probability of an event based on prior knowledge of the conditions that might be relevant to the event.

Formula for Bayes’ Theorem

Bayes' theorem is expressed by the following formula:

P(A|B) = P(B|A) × P(A) / P(B)

Where:

  • P(A|B) — the probability of event A occurring, given event B has occurred
  • P(B|A) — the probability of event B occurring, given event A has occurred
  • P(A) — the probability of event A
  • P(B) — the probability of event B
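
To make the formula concrete, here is a minimal Python sketch (the name bayes_posterior is mine, not from the original post) that computes P(A|B) from P(B|A), P(A) and P(B|not A), expanding the denominator P(B) with the law of total probability:

def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    # P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return (p_b_given_a * p_a) / p_b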

Example of Bayes’ Theorem

Imagine you are a financial analyst at an investment bank. According to your research of publicly-traded companies, 60% of the companies that increased their share price by more than 5% in the last three years replaced their CEOs during the period.

At the same time, only 35% of the companies that did not increase their share price by more than 5% in the same period replaced their CEOs. Knowing that the probability that the stock prices grow by more than 5% is 4%, find the probability that the shares of a company that fires its CEO will increase by more than 5%.

Before finding the probabilities, you must first define the notation of the probabilities.

  • P(A) — the probability that the stock price increases by more than 5%
  • P(B) — the probability that the CEO is replaced
  • P(A|B) — the probability that the stock price increases by more than 5%, given that the CEO has been replaced
  • P(B|A) — the probability that the CEO is replaced, given that the stock price has increased by more than 5%

Using Bayes' theorem, we can find the required probability (the denominator P(B) is expanded using the law of total probability):

P(A|B) = P(B|A) × P(A) / P(B) = (0.60 × 0.04) / (0.60 × 0.04 + 0.35 × 0.96) = 0.024 / 0.36 ≈ 0.0667

Thus, the probability that the shares of a company that replaces its CEO will grow by more than 5% is 6.67%.
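
As a quick check, the same numbers can be plugged into a few lines of Python (a standalone sketch, not part of the original post):

# CEO-replacement example from above
p_b_given_a = 0.60      # P(CEO replaced | price grew > 5%)
p_b_given_not_a = 0.35  # P(CEO replaced | price did not grow > 5%)
p_a = 0.04              # P(price grew > 5%)

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # P(CEO replaced) = 0.36
p_a_given_b = p_b_given_a * p_a / p_b                   # Bayes' theorem
print(round(p_a_given_b, 4))  # 0.0667, i.e. about 6.67%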

Now that we know the math behind the Naive Bayes algorithm, let's see how the algorithm works.

How does the Naive Bayes algorithm work?

Let's understand it using an example. Below I have a training data set of weather and the corresponding target variable 'Play' (indicating the possibility of playing). Now we need to classify whether players will play or not based on the weather conditions. Let's follow the steps below.

Step 1: Convert the data set into a frequency table

Step 2: Create a likelihood table by finding the probabilities, e.g. the probability of Overcast = 0.29 and the probability of playing = 0.64.

Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method of posterior probability discussed above.

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64.

Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher probability (P(No | Sunny) works out to 0.40), so the answer is yes.

Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
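
For illustration, here is a small self-contained Python sketch of the same calculation. The counts below are the ones implied by the probabilities used above (14 observations, 9 Yes and 5 No, with 5 Sunny days of which 3 were Yes); the exact Overcast/Rainy split is not shown in this post, so treat those counts as an assumption consistent with the classic play-tennis data set:

from collections import Counter

# (weather, play) pairs consistent with the counts used in the example above
data = [('Sunny', 'Yes')] * 3 + [('Sunny', 'No')] * 2 + \
       [('Overcast', 'Yes')] * 4 + \
       [('Rainy', 'Yes')] * 2 + [('Rainy', 'No')] * 3

n = len(data)                                    # 14 observations
play_counts = Counter(play for _, play in data)  # {'Yes': 9, 'No': 5}

def posterior(play, weather):
    # P(play | weather) = P(weather | play) * P(play) / P(weather)
    p_weather_given_play = sum(1 for w, p in data if w == weather and p == play) / play_counts[play]
    p_play = play_counts[play] / n
    p_weather = sum(1 for w, _ in data if w == weather) / n
    return p_weather_given_play * p_play / p_weather

print(round(posterior('Yes', 'Sunny'), 2))  # 0.6
print(round(posterior('No', 'Sunny'), 2))   # 0.4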

Step by Step implementation on Naive Bayes

Now, let's continue our Naive Bayes blog and understand all the steps one by one. I've broken the whole process down into the following steps:

  • Handle Data
  • Summarize Data
  • Make Predictions
  • Evaluate Accuracy

Step 1: Handle Data

The first thing we need to do is load our data file. The data is in CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using the reader function in the CSV module.

import csv
import math
import random

# Load the CSV file and convert every value to a float
def loadCsv(filename):
    lines = csv.reader(open(filename))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

Now we need to split the data into training and testing datasets.

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

Step 2: Summarize the Data

The summary of the training data collected involves the mean and the standard deviation for each attribute, by class value. These are required when making predictions to calculate the probability of specific attribute values belonging to each class value.

We can break the preparation of this summary data down into the following sub-tasks:

# Separate Data By Class
def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

# Calculate Mean
def mean(numbers):
    return sum(numbers)/float(len(numbers))

# Calculate Standard Deviation
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

# Summarize Dataset
def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

# Summarize Attributes By Class
def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

Step 3: Making Predictions

We are now ready to make predictions using the summaries prepared from our training data. Making predictions involves calculating the probability that a given data instance belongs to each class, then selecting the class with the largest probability as the prediction. We need to perform the following tasks:

# Calculate Gaussian Probability Density Function
def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))
    return (1/(math.sqrt(2*math.pi)*stdev))*exponent

# Calculate Class Probabilities
# (note: the class prior P(class) is not multiplied in here; the product of the
#  per-attribute Gaussian likelihoods alone is used to compare classes)
def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

# Make a Prediction
def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

# Make Predictions
def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

# Get Accuracy
def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct/float(len(testSet)))*100.0

Finally, we define our main function, where we call all the methods we have defined, one by one, to get the accuracy of the model we have created.

def main():
    filename = 'pima-indians-diabetes.data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train = {1} and test = {2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: {0}%'.format(accuracy))

main()

The output might look like this:

Split 768 rows into train = 514 and test = 254 rows
Accuracy: 68.11023622047244%

The accuracy is around 68%, which is pretty good for an implementation that does not use the scikit-learn library.
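
For comparison, roughly the same model can be built with scikit-learn's GaussianNB in a few lines. This is only a sketch, assuming the same CSV file with the class label in the last column; the accuracy will differ slightly because of the random split:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the same comma-separated file; the last column is the class label
data = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X, y = data[:, :-1], data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)

model = GaussianNB().fit(X_train, y_train)
print('Accuracy: {0}%'.format(accuracy_score(y_test, model.predict(X_test)) * 100))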

Here's the link to my Jupyter notebook where I have implemented the same code. You can refer to that as well. You can also find the dataset used in the same directory.

Applications of Naive Bayes Algorithms

  • Real-time prediction: Naive Bayes is an eager learning classifier and it is very fast, so it can be used for making predictions in real time.
  • Multi-class prediction: This algorithm is also well known for its multi-class prediction capability. Here we can predict the probability of multiple classes of the target variable.
  • Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are mostly used in text classification (due to good results in multi-class problems and the independence assumption) and have a high success rate compared to other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).
  • Recommendation systems: A Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.

What are the Pros and Cons of Naive Bayes?

Pros:

  • It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
  • When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression, and you need less training data.
  • It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

Cons:

  • If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the "zero frequency" problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (see the sketch after this list).
  • On the other side, Naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.
  • Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
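
As a rough illustration of Laplace smoothing (a sketch with hypothetical counts, not code from the implementation above), adding 1 to every category count keeps unseen categories from producing a zero probability:

# Hypothetical counts of a categorical feature within one class;
# 'Foggy' was never seen in the training data
counts = {'Sunny': 3, 'Overcast': 4, 'Rainy': 2}
categories = ['Sunny', 'Overcast', 'Rainy', 'Foggy']

def laplace_probability(value, counts, categories, alpha=1):
    # P(value | class) = (count(value) + alpha) / (total + alpha * number_of_categories)
    total = sum(counts.values())
    return (counts.get(value, 0) + alpha) / (total + alpha * len(categories))

print(laplace_probability('Foggy', counts, categories))  # small but non-zero (1/13)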

So this is all from my side. I tried to provide all the important information on the Naive Bayes algorithm along with its implementation. I hope you find something useful here. Thank you for reading till the end.
