## A Naive Bayes classifier is a supervised learning classifier that uses Bayes’ theorem to build the model

A classifier solves the problem of identifying sub-populations of individuals with certain features in a larger set, possibly with the help of a subset of individuals whose classes are already known (a training set).

- The underlying principle of a Bayesian classifier is that some individuals belong to a class of interest with a given probability, based on some observations.
- This probability rests on an assumption about whether the observed characteristics are dependent on or independent of one another. In the latter case, the Bayesian classifier is called *naive* because it assumes that the presence or absence of a particular characteristic in a given class of interest is unrelated to the presence or absence of the other characteristics, which greatly simplifies the calculation.
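Under the naive independence assumption, the posterior for a class C given features x1 and x2 factorizes as P(C | x1, x2) ∝ P(C) · P(x1 | C) · P(x2 | C). A minimal sketch of this calculation on a two-class, two-feature toy problem (all probabilities below are made-up numbers for illustration):

```python
# Hypothetical prior and per-feature conditional probabilities
priors = {'A': 0.6, 'B': 0.4}
p_feat1 = {'A': 0.8, 'B': 0.3}   # P(feature 1 present | class)
p_feat2 = {'A': 0.5, 'B': 0.9}   # P(feature 2 present | class)

# Naive assumption: features are conditionally independent given the class,
# so the joint likelihood is just the product of per-feature likelihoods
scores = {c: priors[c] * p_feat1[c] * p_feat2[c] for c in priors}

# Normalize the scores to obtain posterior probabilities
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
```

Without the independence assumption we would need the full joint distribution P(x1, x2 | C), which grows exponentially with the number of features; the factorized form needs only one table per feature.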

Let’s see how to build a Naive Bayes classifier:

1. Let's import some libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
```

2. You were provided with a data_multivar.txt file that contains the data we will use here, with comma-separated numerical values in each line. Let's load the data from this file:

```python
input_file = 'data_multivar.txt'
X = []
y = []

with open(input_file, 'r') as f:
    for line in f.readlines():
        data = [float(x) for x in line.split(',')]
        X.append(data[:-1])
        y.append(data[-1])

X = np.array(X)
y = np.array(y)
```

*We have now loaded the input data into X and the labels into y. There are four labels: 0, 1, 2, and 3.*
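If you don't have the data_multivar.txt file at hand, a similar four-class dataset could be generated and saved in the same comma-separated format. This is only a stand-in: the cluster parameters below are arbitrary placeholders, not the actual data shipped with the book.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Four 2-D Gaussian clusters labeled 0-3; centers are chosen at random
X_gen, y_gen = make_blobs(n_samples=400, centers=4, cluster_std=1.0,
                          random_state=0)

# Save one "x1,x2,label" record per line, matching the loader above
with open('data_multivar.txt', 'w') as f:
    for features, label in zip(X_gen, y_gen):
        f.write('%f,%f,%d\n' % (features[0], features[1], label))
```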

3. Let's build the Naive Bayes classifier:

```python
classifier_gaussiannb = GaussianNB()
classifier_gaussiannb.fit(X, y)
y_pred = classifier_gaussiannb.predict(X)
```

**The GaussianNB class implements the Gaussian Naive Bayes model.**
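Besides hard labels from predict, GaussianNB also exposes per-class posterior probabilities through predict_proba, which is useful for inspecting how confident the model is. A small sketch on a made-up two-class dataset (the points below are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up 2-D points: class 0 clusters near (0, 0), class 1 near (5, 5)
X_toy = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.9, 5.2]])
y_toy = np.array([0, 0, 1, 1])

clf = GaussianNB()
clf.fit(X_toy, y_toy)

# Each row sums to 1: the posterior probability of each class for that sample
proba = clf.predict_proba(np.array([[0.1, 0.0], [5.0, 5.0]]))
```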

4. Let's compute the accuracy measure of the classifier:

```python
accuracy = 100.0 * (y == y_pred).sum() / X.shape[0]
print("Accuracy of the classifier =", round(accuracy, 2), "%")
```

The following accuracy is returned:

*Accuracy of the classifier = 99.5 %*
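Note that this accuracy is measured on the same data the model was trained on, so it is an optimistic estimate. The same quantity (as a fraction rather than a percentage) can be computed with sklearn's accuracy_score; a sketch on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true and predicted labels for illustration
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_hat = np.array([0, 1, 2, 3, 0, 1, 2, 0])  # one mistake out of eight

# Equivalent to the manual (y == y_pred).sum() / n computation above
acc = accuracy_score(y_true, y_hat)
```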

5. Let's plot the data and the decision boundaries, as we did in the *Building a logistic regression classifier* recipe:

```python
x_min, x_max = min(X[:, 0]) - 1.0, max(X[:, 0]) + 1.0
y_min, y_max = min(X[:, 1]) - 1.0, max(X[:, 1]) + 1.0

# Denotes the step size that will be used in the mesh grid
step_size = 0.01

# Define the mesh grid
x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size),
                                 np.arange(y_min, y_max, step_size))

# Compute the classifier output
mesh_output = classifier_gaussiannb.predict(np.c_[x_values.ravel(),
                                                  y_values.ravel()])

# Reshape the array
mesh_output = mesh_output.reshape(x_values.shape)

# Plot the output using a colored plot
plt.figure()

# Choose a color scheme
plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.gray)

# Overlay the training points on the plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=80, edgecolors='black',
            linewidth=1, cmap=plt.cm.Paired)

# Specify the boundaries of the figure
plt.xlim(x_values.min(), x_values.max())
plt.ylim(y_values.min(), y_values.max())

# Specify the ticks on the X and Y axes
plt.xticks(np.arange(int(min(X[:, 0]) - 1), int(max(X[:, 0]) + 1), 1.0))
plt.yticks(np.arange(int(min(X[:, 1]) - 1), int(max(X[:, 1]) + 1), 1.0))

plt.show()
```
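As a follow-up, evaluating on a held-out split gives a less optimistic estimate than the training accuracy computed earlier. A minimal sketch using synthetic blobs as a stand-in for the book's dataset (the make_blobs parameters are placeholders):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic four-class data standing in for data_multivar.txt
X_s, y_s = make_blobs(n_samples=400, centers=4, cluster_std=1.0,
                      random_state=0)

# Hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_s, y_s, test_size=0.25, random_state=5)

clf = GaussianNB().fit(X_train, y_train)

# Mean accuracy on data the model has never seen
test_accuracy = clf.score(X_test, y_test)
```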