Aiconomy

Gradient Descent

The primary optimization algorithm used to train neural networks, which iteratively adjusts model parameters in the direction opposite the gradient of the loss, the direction of steepest local decrease in prediction error.
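The update rule can be sketched on a toy one-dimensional objective. This is a minimal illustration, not the training loop of any real model: the function f(x) = (x - 3)**2, the starting point, the learning rate, and the iteration count are all illustrative assumptions.

```python
# Minimal gradient descent sketch on f(x) = (x - 3)**2,
# whose gradient is 2 * (x - 3). All constants are illustrative.

def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

x = 0.0    # initial parameter value
lr = 0.1   # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)   # step opposite the gradient, reducing f

print(round(x, 4))  # converges to the minimizer x = 3
```

Each step multiplies the distance to the minimum by (1 - 2 * lr), so too large a learning rate overshoots and diverges, while too small a one converges slowly; the same trade-off governs real training runs.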

Gradient descent and its variants (SGD, Adam, AdamW) are the workhorses of deep learning optimization. Stochastic gradient descent (SGD) processes random mini-batches of data rather than the entire dataset, making it practical for large-scale training. The Adam optimizer, introduced in 2015, adapts learning rates per parameter and is the default choice for training most modern neural networks. Training frontier models involves performing gradient descent across trillions of tokens using thousands of GPUs in parallel.
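The mini-batch idea behind SGD can be sketched as follows. The toy task (fitting y = w * x by least squares), the synthetic data, the batch size, and the learning rate are illustrative assumptions, not details from the text.

```python
import random

# Hedged sketch of stochastic gradient descent (SGD): each update uses
# the mean gradient of a random mini-batch rather than the full dataset.
# Toy task: fit y = w * x by least squares; the true weight is 2.0.

random.seed(0)
data = [(float(x), 2.0 * x) for x in range(100)]

w = 0.0           # model parameter
lr = 1e-4         # learning rate
batch_size = 10
for epoch in range(200):
    random.shuffle(data)                  # fresh random mini-batches each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # mean gradient of (w*x - y)**2 over the batch: 2 * (w*x - y) * x
        g = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * g                       # SGD step on the mini-batch estimate

print(round(w, 3))  # approaches the true weight 2.0
```

Because each mini-batch gradient is only a noisy estimate of the full-dataset gradient, individual steps are imprecise, but the updates are far cheaper, which is what makes training on very large datasets practical. Adam builds on this by additionally maintaining running estimates of the gradient's mean and variance to scale the step size per parameter.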
