Exploring the distance metric functions used in the k-Nearest Neighbors (KNN) model

## Introduction

If you’re familiar with some of the basic machine learning algorithms used in the field, then you’ve probably heard of the k-nearest neighbors algorithm, or KNN. It is the kind of algorithm companies like Netflix or Spotify use to recommend movies to watch or songs to listen to.

The **concept of finding nearest neighbors** may be defined as “**the process of finding the closest points to the input point from the given data set**”. The algorithm stores all the available cases (the training data) and classifies each new case by a majority vote of its k nearest neighbors. When implementing KNN, the first step is to transform data points into their mathematical representations (vectors). The algorithm works by computing the distance between these vectors: it measures the distance between each stored data point and the new input point, and treats closer points as more similar. Classification is then based on the classes of the closest points. The distance function can be the Euclidean, Minkowski, or Hamming distance.
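As a minimal sketch of the metrics mentioned above, the Minkowski and Hamming distances might be written as follows (the function names and signatures here are illustrative, not from any particular library):

```python
def minkowski(p, q, r=2):
    """Minkowski distance of order r between two equal-length vectors.

    r=2 gives the Euclidean distance, r=1 the Manhattan distance.
    """
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)


def hamming(p, q):
    """Hamming distance: the number of positions where the vectors differ.

    Typically used for categorical or binary features.
    """
    return sum(a != b for a, b in zip(p, q))
```

For example, `minkowski((0, 0), (3, 4))` returns the familiar Euclidean distance 5.0, while `hamming((1, 0, 1), (1, 1, 1))` counts a single differing position.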

**Euclidean Distance Function**

Euclidean distance can simply be defined as the length of the shortest straight line between two points, irrespective of the number of dimensions. It is the most common way to measure the distance between points. According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by

dist((x, y), (a, b)) = √((x − a)² + (y − b)²)
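A direct translation of this formula into code, generalized from two coordinates to n-dimensional vectors, might look like this (the function name is our own):

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors.

    For 2-D points this is exactly sqrt((x - a)**2 + (y - b)**2).
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For instance, the distance between (1, 2) and (4, 6) is √(3² + 4²) = 5.0.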

Visualized in the plane, this is simply the length of the straight line segment connecting the two points.

For a given value of K, the algorithm finds the K nearest neighbors of the input point and then assigns it the class that appears most often among those K neighbors.

After computing the distances, the input x is assigned to the class with the largest probability, i.e., the majority class among its K nearest neighbors.
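Putting the two steps together (distance computation, then majority vote), a bare-bones KNN classifier could be sketched as below. This is a simplified illustration, not a production implementation; `knn_predict` and its parameters are hypothetical names:

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, x, k=3):
    """Classify x by a majority vote among its k nearest training points.

    Uses math.dist (Python 3.8+) for the Euclidean distance.
    """
    # Pair each training point's distance to x with its label, then sort.
    dists = sorted((math.dist(p, x), y) for p, y in zip(train_points, train_labels))
    # Keep only the labels of the k closest points.
    top_k = [y for _, y in dists[:k]]
    # Return the most common label among them.
    return Counter(top_k).most_common(1)[0][0]
```

For a quick check: with training points [(0, 0), (0, 1), (5, 5), (6, 5)] labeled ["a", "a", "b", "b"] and k=3, the query (0.2, 0.5) lands closest to the two "a" points, so the majority vote returns "a". In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would be used instead.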