## Training an Adaline network

The reasoning behind the training process is still the same as the perceptron's: each weight **w(i,j)** needs to be updated in such a way that it will increase the number of correctly predicted outputs on the next iteration. We call this update value **Δw(i,j)**. However, this update value is calculated in a different way, using an algorithm known as gradient descent.

## Gradient Descent
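As a minimal sketch of the idea, the Adaline update rule can be written in a few lines of NumPy. The function name and signature below are hypothetical, chosen for illustration; `eta` denotes the learning rate:

```python
import numpy as np

def adaline_update(w, X, y, eta=0.01):
    """One batch update of the Adaline weights (illustrative sketch).

    w: weight vector (n_features,)
    X: inputs (n_samples, n_features)
    y: target outputs (n_samples,)
    eta: learning rate (hypothetical default)
    """
    output = X @ w                 # Adaline uses the raw linear activation
    errors = y - output            # prediction error for each sample
    delta_w = eta * (X.T @ errors) # update value Δw, one entry per weight
    return w + delta_w
```

Each call nudges every weight in the direction that reduces the prediction error, which is exactly what the gradient descent procedure described next computes.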

Gradient descent is an algorithm used to find the set of weights **W** that minimises the overall error of the network's predictions. To achieve that, we define what is called an error function (also known as a loss function or cost function) **J(W)**, and we iteratively try to find the global minimum of this function.

Suppose there is a plot describing how the error function **J(W)** varies with a single weight value, something like this:

*[Figure: the error function J(w) plotted against a single weight w]*

In the figure above we currently have an error of roughly 20 when **w** is 9. However, one can see that the network's minimum error is achieved when **w** is 5. The challenge that gradient descent tackles is how to get from the current point to the weight value that produces the lowest error.

The way gradient descent finds its way to the minimum is (to use the proper mathematical jargon) by computing the partial derivative of the error function with respect to each weight. In layman's terms, the intuition behind gradient descent is illustrated in the animation below:
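The one-dimensional case from the figure can be sketched in a few lines. The error function below is an assumption chosen to match the figure (minimum at w = 5, starting point w = 9); the learning rate value is likewise illustrative:

```python
def gradient(w):
    # Derivative of the illustrative error function J(w) = (w - 5)**2,
    # i.e. dJ/dw = 2 * (w - 5). The slope is positive to the right of
    # the minimum and negative to the left of it.
    return 2 * (w - 5)

def gradient_descent(w=9.0, learning_rate=0.1, n_steps=100):
    for _ in range(n_steps):
        # Step *against* the slope: downhill towards the minimum.
        w -= learning_rate * gradient(w)
    return w
```

Starting from w = 9, each step moves a fraction of the slope towards the minimum, so the iterates converge on w = 5, the weight value with the lowest error in the figure.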