I plan on deriving a solution to this problem by gradient descent.By, this method of solving, we will be able to find not just the shortest time path, rather, we can get any curve such that, the time taken will be as close as possible to any time value we want.
To elaborate, if we want to find a path in the vertical plane such that the time taken by a friction-less bead traveling on the required path is exactly 1.75 times the time taken by the same friction-less bead traveling on a straight line, we can do that. All this is made possible by using gradient descent, backbone and primary working principle of all types of Neural Networks.
This method of a traversing a function in a given domain is very powerful, and it allows us to arrive at a solution without using integration or the Euler-Lagrange equation (which itself is derived after some heavy mathematics), i.e. we can arrive at a numerical solution for any path, given the amount of time taken, without ever having to integrate.Another way this method is very advantageous is it can quite easily take care of finding optimal solutions upon functions which are not integrable or are quite hard to integrate.
What Can this Method accomplish?
Let me just show you just how powerful this method is, ahead of time.
The path of shortest time, exact cycloid curve joining start and end points.
Paths with customizable Times of descent:
Brachistochrone Problem Mathematical Solution:
Before I use the numerical gradient descent method to solve the Bracistochrone problem, I will first solve the problem analytically, i.e. using Integration and the Euler-Lagrange Equation.The theoretical solution thus obtained is the cycloidal curve connecting the 2 points as shown below.
The Euler-Lagrange Equation, can be referred to from here, in case you want to know more. If you have not understood about the Euler Lagrange equation, it is fine.I have just shown this short theoretical proof as an extra to show that the actual shortest path distance is going to be that of a cycloidal curve connecting the start and end point.
Visually: Note the times of descent for each type of path.
So, above, I have shown the parametric solution to the brachistochrone problem obtained from the conventional mathematics.
Brachistochrone Problem Solution with Gradient Descent:
The method that I present here, involves gradient descent, which is a way to navigate around in a differentiable function. In a nut shell, what gradient descent allows us to do is navigate around in a function in a intelligent manner, so that you are able to arrive at any value you want to on a function. If the value we want to go to is not available as an output of the function, i.e. the value we want is out of the range of the function, then we will arrive at a value of the function, that is closest to the given value.
In our case, the function we want to navigate around is the time taken by a ball to move along a path under the influence of just gravity with fixed initial and final points.
Any curve on the XY plane is characterized by the y-values it has for each x-value on x-axis.
In this method, I divide the X- axis between start point and end point into 50 or 100 points of equal distance amongst them, and then Y values for each point in x-axis is arrived at or calculated so that the time of travel is minimal.These (Xi,Yi)’s decide what the curve is on the X,Y plane.
So, we have 50 very small straight lines in a very small x axis gap of 1 unit and y axis gap of 0.5 units.i.e. my initial and final points are: (0,0.5),(1,0). Note, if we want to find the path between points that are very distant say between (100,100) and (200,0), then we will need to take more than 50 lines attaching in different angles between them, i.e. we will need to divide the x axis into more than 50 points, say 5000 or higher, to obtain a good resolution and approximation of the curve into many small lines. The point is that the curve will be better approximated only if we have a huge number of very small straight lines as compared to the distance to be traversed.
Now, that we have setup the optimization problem, it is now time to define the variables and function.The variables here are the yvalues for each value of x.The time taken by a body to go down a path is decided by these y values. The total time taken will be the sum of times taken by a body to go from x1,y1 to x2,y2 and x2,y2 to x3,y3 and x3,y3 to x4,y4 and so on.
X1,y1 to x2,y2 is a straight line and then x2,y2 to x3,y3 is a straight line and so on. Due to conservation of energy, the velocity of the body at a position y is given by :
The python implementation of the function is shown below.Basically, we have a set of y values for corresponding x values and based on these points, we calculate the total time of traversal for a friction-less body to pass all the said points in order in gravity.
The above method is used to calculate the total time taken by the body to go from initial fixed point to final fixed point via a curve, which is characterized by the yk’s.
The Working of Gradient Descent:
Thus, we now have a target function (i.e. the time taken for the whole journey) as a function of variables yk’s, where k goes from 0 to 50(in our case, sometimes, 1 unit length is divided into 100 points for better resolution). Now, all gradient descent does is find the derivative of the total time function wrt each yk, i.e. for each small change in yk, we have change in total time of traversal and the ratio of change in total time to change in that particular yk is taken as gradient of time wrt yk.
This calculation of gradient is done for all 50 of the yks and for each iteration, the gradient of time function wrt yk, multiplied by a learning rate, is subtracted from initial yk. This process is done for all yk’s within one iteration.
Now, the time function and gradients are all calculated wrt the new yks and the new gradients after multiplication with a learning rate is subtracted from yks (obtained at the end of the 1st iteration), to obtain yks after the end of second iteration. This keeps on happening till we stop.
After, each iteration, if you have set hyper parameters right, the time value should decrease till the minimum value is obtained.In our case, I had to go to 20000 iterations, to finally come about the minimal value of the time function and find the corresponding yks that create this value of total time of traversal.
The learning rate, number of iterations are all hyper parameters, which vary on the basis of the tolerable error, the range of X,Y values and the number of yks themselves.
This above procedure, known as gradient descent arrives successfully at the minimal value of the function.
The code implementation of gradient descent is shown below:
The final path this gradient descent converges upon after starting from a straight line connecting the fixed initial and final points is shown below:
So, till now, we have successfully, solved the brachistochrone problem, i.e. found the path of shortest time.
Now, let us not stop here, let us try to find paths that have time of descent as a given value.
Brachistochrone Problem with Variations:
This can be accomplished as follows:
Now, say we want to find out the path that has not minimum time of traversal, rather the path that takes a specific amount of time (set by us), then after calculating the total time of traversal, we need to find mean squared error between the found time and the time required. This mean squared error should then be used for gradient descent, i.e. this mean squared error should be used for calculating yk’s gradients and their subsequent updating.
This leads to the gradient descent minimizing the mean squared error between total time of traversal and the time required, rather than the total time of traversal as was done in the last case, this leads to the gradient descent minimization of the mean squared error between total time of traversal and required time.
The yks obtained at the minimal value of the error is the path that leads to a body descending with time of traversal as close to the time required as possible. This closeness is characterized by the minimal value of the mean squared error. The smaller the final mean squared error, the closer is the path thus obtained is nearer to the required path(i.e. the one that takes exactly treq amount of time).
Now, say, I want to find out the path for which the time of descent is say, 0.587 seconds.
The following code accomplishes it:
The time of descent and path obtained is shown below:
Let us try for path with time=0.7, note that the time of descent in case of straight path is 0.714, as shown in the diagram previously.
Now, a path worse than straight line in terms of time taken, time=1.17 seconds.
So, now, we have solved not only the age old brachistochrone problem with simple gradient descent, we have done much more on it.
Isn’t is fascinating how much we can do with simple Gradient Descent, that was not all that easy with pure conventional math.
Do leave a clap if you like this kind of work.
Thank You for your attention.Till Next time, stay safe and cheers.