Gradient descent & its DNN relatives — A short story

February 19, 2021 by systems

Vigneshwaran D

Note: This is just a basic, high-level explanation; it might not cover the topics in detail.

Imagine you are in a car on top of a hill, and your objective is to go down (reach the plains, aka the global minimum). So what do you do?

You use your 360-degree camera and see which path is the steepest. Why? Because you want to get down as fast as possible (your GF/BF just called). So how do you find the steepest direction? By calculating the slope, aka the "gradient", and since you want to go down, it is "descent". Hence the name "gradient descent".
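In code terms, the hill is just a function of your position, and the "gradient" is its slope. Here is a minimal Python sketch (the altitude function below is a made-up example, not part of the original story):

# A made-up "hill": altitude as a function of position x
def altitude(x):
    return x ** 2  # a simple valley; the plains are at x = 0

# Estimate the slope (gradient) numerically at the current position
def gradient(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(gradient(altitude, x=3.0))  # ~6.0: the slope is positive, so drive the other way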

Okay, now you have chosen your path (the steepest one). So what next? You have to drive your car. Sadly, your car drives a fixed distance every time you push the accelerator pedal, but luckily you can set that number of km at the start.

However, there's a catch: if you set the number of km very high, each step will be large, and since your car goes straight along that path and won't stop in between, there's a chance you might climb uphill again. Hence the number of km your car covers per step should be optimal! (If it's too big, you may end up going uphill; if it's too small, it will take you forever to reach the plains.) This number of km is called the "learning rate" or "step size".
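In the update rule, the learning rate is the scale factor on each step. A quick sketch of "too small", "about right", and "too big", reusing the made-up altitude and gradient helpers from the sketch above:

# One push of the pedal: move against the slope, scaled by the learning rate
def step(x, lr):
    return x - lr * gradient(altitude, x)

x = 3.0                   # starting position on the hill
print(step(x, lr=0.01))   # ~2.94: a tiny step, this will take forever
print(step(x, lr=0.1))    # ~2.4:  a reasonable step downhill
print(step(x, lr=1.5))    # ~-6.0: overshot the plains and climbed the other side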

After each step, you recalculate the steepest direction (the gradient) and repeat the process.
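Put together, the whole drive is just a loop: recompute the slope, take a step, repeat. A minimal sketch with the same made-up helpers:

x = 3.0     # start somewhere on the hill
lr = 0.1    # learning rate / step size
for i in range(100):
    x = x - lr * gradient(altitude, x)  # recalculate the slope, take one step
print(x)    # close to 0.0: we have reached the plains (the minimum)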

You have a 360-degree camera? Yes! But you are on a hill, so there will be lots of bushes, trees, and rocks that might hinder your vision.

What do you ideally do in such a situation? You take a couple of steps in one random direction ("initialization") and check whether you are going downhill. If a particular path seems promising, you stick to it; otherwise, you change direction.

Now let's add some spice to the story: it's pitch dark and there are some dangerous animals out there, so you cannot step outside and check whether the path you picked makes sense. But no worries, this is not directed by Robert B. Weide. Your car has an altimeter that tells you your current altitude. If the altitude decreases as you move forward, you can stick to the current path; otherwise, you need to change it! Pretty simple. Here, the altimeter is the "cost function".
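In a real model, the "altitude" is a cost function: a single number that says how badly the model is doing right now, so you can tell whether the last step helped. A hedged sketch using mean squared error on made-up data (purely for illustration):

import numpy as np

# Made-up data: inputs and the targets we want to predict
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

# The "altimeter": mean squared error for a given weight w
def cost(w):
    predictions = w * X
    return np.mean((predictions - y) ** 2)

print(cost(0.5))  # ~16.9: high altitude, bad weight
print(cost(2.0))  # 0.0: the plains, for this toy problem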

Let's say you are on a path and you realize the altitude, aka the "cost function", isn't changing much, so you want to try a different path. How do you change your path? By changing direction (North, East, West, and South) using your steering wheel. I know, pretty obvious, right?

Here, North, East, West, and South are your "input variables".

But hey, you don't need to steer. Your car is much better than a Tesla; it will analyze the cost function and steer for you. You can probably open a can and chill.

However, for the sake of understanding the concept, let's proceed anyway.

Your car is autonomous, and it has split the input variables into multiple simpler layers (simpler for the car, of course; for us it's complex :/ ) so it can operate precisely. These layers are called "hidden layers".

E.g. NE, NW, SE, SW (1st layer)

Mostly N + slightly E, mostly E + slightly N, etc. (2nd layer)

Mostly N + slightly E, also curved, etc. (3rd layer)

And many more layers like that. The items in each layer are called "nodes".
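In network terms, each layer is just a set of nodes, and each node combines the outputs of the previous layer. A minimal NumPy sketch of a forward pass through a couple of hidden layers (the sizes and random weights here are arbitrary, purely for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=4)         # input variables, e.g. N, E, W, S as 4 numbers

W1 = rng.normal(size=(5, 4))   # 1st hidden layer: 5 nodes
W2 = rng.normal(size=(3, 5))   # 2nd hidden layer: 3 nodes
W3 = rng.normal(size=(1, 3))   # output: how much to steer

h1 = sigmoid(W1 @ x)           # each row of W1 holds one node's weights
h2 = sigmoid(W2 @ h1)
steering = W3 @ h2
print(steering)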

Now back to the story: the car passes the feedback from the altimeter (cost function) to the multiple layers (hidden layers) so that it can change direction based on that feedback. This is achieved by altering the importance ("weights") of each node.
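Concretely, "altering the importance" of a node means nudging its weight in whichever direction makes the cost drop. A minimal sketch, reusing the made-up cost function and data from the altimeter sketch above:

# Gradient of the cost w.r.t. the weight: the feedback the altimeter sends back
def cost_gradient(w):
    return np.mean(2 * (w * X - y) * X)

w, lr = 0.5, 0.05                  # a badly chosen weight and a learning rate
for _ in range(50):
    w = w - lr * cost_gradient(w)  # alter the weight based on the feedback
print(w)                           # close to 2.0, the weight with the lowest cost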

Since this feedback passes from the last layer to the front layer, by the time it reaches the front layer the other layers would have taken most of the feedback, leaving the front layer with very little.

Here, the process of going from the last layer to the first layer is called "backpropagation".

And the phenomenon of the front layers receiving very little feedback is called the "vanishing gradient problem". This mostly happens when the hidden layers are large in number.
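You can see why with a back-of-the-envelope calculation: backpropagation multiplies one derivative factor per layer, and if each factor is small (a sigmoid's derivative is at most 0.25), the product shrinks towards zero long before the feedback reaches the front layers. A rough sketch:

# Each sigmoid layer contributes a factor of at most 0.25 to the gradient
sigmoid_factor = 0.25

for depth in [2, 10, 30]:
    print(depth, sigmoid_factor ** depth)
# 2  -> 0.0625
# 10 -> ~9.5e-07
# 30 -> ~8.7e-19: the front layers receive almost no feedback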

This Vanishing gradient problem can be addressed through Residual networks & ReLU, which is a story for another day 🙂

PS: This is my first attempt at writing; let me know your thoughts/inputs in the comment section. Thanks!

Reference (source of inspiration):

Gradient Descent: Simply Explained? by Koo Ping Shung

Link: https://towardsdatascience.com/gradient-descent-simply-explained-1d2baa65c757

