## Logic behind Softmax regression

Ultimately, the algorithm is going to find a boundary line for each class, something like the image below (but not actually the image below).

Note: we as humans can easily eyeball the chart and categorize Sarah as waitlisted, but let’s let the machine figure it out via machine learning, yeah?

Just like in linear and logistic regression, we want the model’s output to be as close as possible to the actual label. Any difference between the label and the output contributes to the model’s “loss,” and the model learns by minimizing this loss.
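To make “loss” concrete: a common choice for this kind of classifier is cross-entropy, which penalizes the model for assigning low probability to the true class. The article hasn’t defined its loss yet, so treat this as an illustrative sketch rather than its exact definition:

```python
import math

def cross_entropy(label, output):
    """Cross-entropy between a one-hot label and a predicted
    probability distribution: -sum(label_i * log(output_i))."""
    return -sum(l * math.log(o) for l, o in zip(label, output) if l > 0)

# A confident, correct prediction gives a small loss...
low = cross_entropy([1, 0, 0], [0.9, 0.05, 0.05])
# ...while a confident, wrong prediction gives a large one.
high = cross_entropy([1, 0, 0], [0.05, 0.9, 0.05])
print(low, high)  # low < high
```

The closer the predicted probability of the true class is to 1, the closer the loss is to 0.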

There are 3 classes in this example, so both the label of each datapoint and the model’s output are going to be vectors of 3 values, one for each admission status.

If the label is such that:

```
admitted   = [1, 0, 0]
waitlisted = [0, 1, 0]
rejected   = [0, 0, 1]
```
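This encoding is usually called “one-hot.” Here is a small sketch of how you might build it; the class ordering is just the one used above:

```python
# Class order assumed from the label vectors above.
CLASSES = ["admitted", "waitlisted", "rejected"]

def one_hot(status):
    """Turn a class name into a one-hot label vector."""
    return [1 if c == status else 0 for c in CLASSES]

print(one_hot("waitlisted"))  # [0, 1, 0]
```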

then the output vector will mean:

```
[probability of being admitted,
 probability of being waitlisted,
 probability of being rejected]
```

Thus, in softmax regression, we want to find a probability distribution over all the classes for each datapoint.
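In other words, each datapoint’s output vector should have non-negative entries that sum to 1. A quick numeric sketch of what a valid output might look like (the values here are made up for illustration):

```python
# Hypothetical model output for one student: probabilities of being
# admitted, waitlisted, and rejected, respectively.
output = [0.2, 0.7, 0.1]

# A valid probability distribution: non-negative entries summing to 1.
assert all(p >= 0 for p in output)
assert abs(sum(output) - 1.0) < 1e-9
print(output)
```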

We use the softmax function to find this probability distribution: