Gradient Descent
Gradient descent is the core optimization algorithm used to train neural networks. At each iteration, the weights are moved a small step in the direction opposite to the gradient of the loss function at the current point; the size of that step is controlled by a factor known as the "learning rate". When the training set is large, computing the loss over every sample at each iteration is impractical, so the loss is instead estimated on a small random subset of samples (whose size is the "batch size"). Gradient descent used this way is called "stochastic gradient descent" (SGD).
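
A minimal sketch of this update rule, assuming NumPy and a hypothetical loss_gradient(weights, batch) function that returns the gradient of the loss over a mini-batch (both are illustrative choices, not part of the definition above):

import numpy as np

def sgd_step(weights, batch, loss_gradient, learning_rate=0.01):
    # Gradient of the loss estimated on the current mini-batch.
    grad = loss_gradient(weights, batch)
    # Move the weights a small step opposite the gradient;
    # the learning rate controls the step size.
    return weights - learning_rate * grad

def train(weights, samples, loss_gradient, batch_size=32, epochs=10, learning_rate=0.01):
    samples = np.asarray(samples)
    for _ in range(epochs):
        np.random.shuffle(samples)  # draw different random batches each epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            weights = sgd_step(weights, batch, loss_gradient, learning_rate)
    return weights

With batch_size equal to the full training set, this reduces to plain (batch) gradient descent; with smaller batches it is the stochastic variant described above.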