*Deep Learning, which is built on multilayer neural networks, has achieved state-of-the-art results in most domains today. In this post, we will look at the Universal Approximation Theorem, one of the fundamental theorems on which the entire concept of Deep Learning is based. We will use a Lego-blocks analogy and illustrations to understand it.*

“Neural Networks have an excellent representation power of functions, and a Feed Forward Neural Net with one hidden layer containing a finite number of neurons can represent any continuous function.”

In order to make sense of it, let us split the theorem into multiple chunks —

👉 Representation Power of Functions

👉 Feed Forward Neural Net

👉 Hidden layer with a finite number of neurons

👉 Represent any continuous function

- A function defines the relationship between a set of inputs and the corresponding outputs. Functions can be either continuous or discrete in nature.
- Functions are used throughout Machine Learning, for example as loss functions. In fact, an entire Machine Learning model can be viewed as a function that takes in inputs and provides outputs.
- Our objective in ML is to minimize the loss function, and we do this by computing the gradient, i.e. the first-order derivative of the loss function with respect to parameters such as the weights and biases.
- Hence differentiability is an important consideration in Machine Learning: we prefer continuous functions mainly because they tend to be differentiable everywhere, which is exactly what the gradient computation requires.
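The gradient-based minimization described above can be sketched in a few lines. This is a minimal illustration (not code from the post), using a single weight `w` and a squared-error loss whose derivative we write out by hand:

```python
# A minimal sketch: minimizing the differentiable loss L(w) = (w*x - y)^2
# by gradient descent. The example values x, y and the learning rate are
# chosen for illustration only.

def loss(w, x, y):
    return (w * x - y) ** 2

def grad(w, x, y):
    # dL/dw = 2 * x * (w*x - y); this derivative exists at every point
    # because the loss is continuous and smooth.
    return 2 * x * (w * x - y)

x, y = 2.0, 6.0          # one training example; the true weight is 3
w = 0.0                  # initial weight
lr = 0.1                 # learning rate
for _ in range(100):
    w -= lr * grad(w, x, y)

print(round(w, 3))       # converges to 3.0
```

If the loss had jumps (a discontinuous function), `grad` would be undefined at those points and the update rule would have nothing to follow, which is why continuity and differentiability matter so much here.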

Now let us move one step ahead and understand why we really need complex functions, and why we need something like a Neural Network to approximate them.

- We have several types of relationships in real life, which can be modeled by linear functions, quadratic functions, non-linear functions, etc.
- *Linear function:* models a linear relationship. For e.g, the amount of fuel used has a linear relationship with the distance traveled.
- *Quadratic function:* models a quadratic relationship. For e.g, the path traced by a golf ball during the shot exhibits a quadratic relationship.
- *Non-linear function:* models more general non-linear relationships. For e.g, a drug medication provided to a patient may not display desirable outcomes until a threshold level; triangulation of signals from the GPS satellites is also non-linear in nature.
- Hence, in order to model such complex relationships, a simple linear or quadratic function may not be the most appropriate one.
- Therefore, we require complex functions to model complex non-linear relationships.
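To connect this to the theorem, here is a minimal sketch (my own illustration, with hand-picked weights) of how a single hidden layer, a weighted sum of sigmoid neurons, can model a non-linear "threshold" response like the drug-dosage example above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, weights, biases, out_weights):
    # One hidden layer: each neuron computes sigmoid(w*x + b),
    # and the neuron outputs are linearly combined.
    return sum(v * sigmoid(w * x + b)
               for w, b, v in zip(weights, biases, out_weights))

# Hand-picked parameters for one neuron: the response stays near 0
# below the threshold (x = 5) and rises sharply above it.
weights, biases, out_weights = [10.0], [-50.0], [1.0]

print(round(hidden_layer(2.0, weights, biases, out_weights), 3))  # ~0.0
print(round(hidden_layer(8.0, weights, biases, out_weights), 3))  # ~1.0
```

Adding more neurons to the sum lets the layer stack several such "steps" at different positions, which is the intuition behind approximating arbitrary continuous functions with one hidden layer.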

This is where the need to approximate complex functions and model non-linearity comes into the picture. Now, let us look at how Neural Networks can help with this.

In a traditional programming scenario, the input and the guiding rules are programmed, and the program produces the output.

In Machine Learning, we instead feed the input and the desired output into the Machine Learning algorithm, which then ‘learns’ the set of rules mapping inputs to outputs (also known as the function, or model).
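The contrast can be made concrete with the fuel/distance relationship from earlier. Below is a minimal sketch (illustrative values and a least-squares fit of my own choosing): first the rule is written by hand, then the same rule is "learned" from input/output examples.

```python
# Traditional programming: we write the rule ourselves.
def fuel_used(distance, rate=0.08):
    return rate * distance

# Machine learning: we provide inputs and desired outputs,
# and fit the rule from the data.
distances = [10.0, 20.0, 30.0, 40.0]   # inputs
fuels = [0.8, 1.6, 2.4, 3.2]           # desired outputs

# Least-squares estimate of the rate: this is the "learned" rule.
learned_rate = (sum(d * f for d, f in zip(distances, fuels))
                / sum(d * d for d in distances))

print(round(learned_rate, 2))  # recovers 0.08 from the data
```

In the first case the rate 0.08 is hard-coded; in the second, the same number emerges from the examples, which is exactly the input-plus-desired-output setup described above.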