WTH is Gradient descent?

In simpler terms, gradient descent is a method of finding a minimum value of a function (in the context of neural networks, the Cost Function) by repeatedly taking small steps along the steepest downward slope of the function. Ideally it reaches the global minimum, although on a bumpy (non-convex) function it may settle into a local one. It is commonly used in machine learning and artificial intelligence to optimize the parameters of a model to achieve better performance on a given task.


What does "gradient descent" literally mean?

Geometrically, the gradient of a function at a point points in the direction of steepest ascent, i.e. the direction in which the function increases at the highest rate at that point. The magnitude of the gradient tells you how steep that slope is; moving along the gradient takes you upwards (ascent), and moving against it takes you downwards (descent).

"Descent" refers to the process of moving downwards, or in the opposite direction of ascent.

Therefore, "gradient descent" literally means the process of moving downwards in the direction of the negative gradient of a function.

In the context of optimization, the goal of gradient descent is to find the minimum value of a function by iteratively adjusting its parameters in the direction of the negative gradient until the minimum is reached. This process is analogous to descending a hill by following the steepest slope downwards.
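
To make the hill analogy concrete, here is a minimal sketch in plain Python. The function f(x) = (x - 3)^2, the starting point and the learning rate are all made up for illustration; the only point is that each step moves a small amount against the gradient.

# Minimal gradient descent on f(x) = (x - 3)^2, whose minimum sits at x = 3.
def f(x):
    return (x - 3) ** 2

def gradient(x):
    # derivative of f: df/dx = 2 * (x - 3)
    return 2 * (x - 3)

x = 10.0             # arbitrary starting point on the "hill"
learning_rate = 0.1  # size of each small step

for step in range(50):
    x = x - learning_rate * gradient(x)  # move in the direction of the negative gradient

print(x, f(x))  # x ends up very close to 3 and f(x) very close to 0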


gradient == magnitude and direction (how much the function changes, and which way it increases fastest)

descent == step the opposite way, i.e. take a negative (-ve) step along that direction so the function's value goes down

In the context of a NN, the function we care about maps the weights to the loss, and we call this function the cost function. At every step we ask: for the current combination of weight values, in which direction does the loss change the fastest? The gradient of the cost function answers that, and stepping against it (descent) is what brings the loss down most quickly.


So the gradient of the cost function is not the loss value itself (the loss is just the cost function's output); it is the collection of partial derivatives of the loss with respect to each weight, telling us how much, and in which direction, a tiny change in every weight would move the loss. Gradient descent then subtracts a small fraction of that gradient from the weights, step after step, until the loss stops improving.
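
To see the difference in code, here is a hedged sketch with a toy one-weight model (the w * x model, the tiny dataset and the learning rate are invented for illustration). The cost function turns a weight into a loss value, while its gradient tells us which way, and how hard, to nudge that weight.

# A toy "network" with a single weight w and prediction y_hat = w * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # made-up data following y = 2 * x

def cost(w):
    # mean squared error: this output IS the loss value
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w):
    # d(cost)/dw: the derivative of the loss with respect to the weight,
    # NOT the loss value itself
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.05
for step in range(100):
    w = w - learning_rate * cost_gradient(w)  # descend along the negative gradient

print(w, cost(w))  # w converges towards 2 and the loss towards 0

A real network has millions of weights instead of one, so the gradient becomes a whole vector (one partial derivative per weight), but the update rule stays exactly the same.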