Aiconomy

Gradient Descent

The primary optimization algorithm used to train neural networks, which iteratively adjusts model parameters in the direction opposite the gradient of the loss, the direction of steepest local decrease in prediction error.
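The update rule can be sketched on a toy one-dimensional objective. This is a minimal illustration, not the training loop of any real model: the function f(x) = (x - 3)**2, the starting point, the learning rate, and the iteration count are all illustrative assumptions.

```python
# Minimal gradient descent sketch on f(x) = (x - 3)**2,
# whose gradient is 2 * (x - 3). All constants are illustrative.

def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

x = 0.0    # initial parameter value
lr = 0.1   # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)   # step opposite the gradient, reducing f

print(round(x, 4))  # converges to the minimizer x = 3
```

Each step multiplies the distance to the minimum by (1 - 2 * lr), so too large a learning rate overshoots and diverges, while too small a one converges slowly; the same trade-off governs real training runs.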

Gradient descent and its variants (SGD, Adam, AdamW) are the workhorses of deep learning optimization. Stochastic gradient descent (SGD) processes random mini-batches of data rather than the entire dataset, making it practical for large-scale training. The Adam optimizer, introduced in 2015, adapts learning rates per parameter and is the default choice for training most modern neural networks. Training frontier models involves performing gradient descent across trillions of tokens using thousands of GPUs in parallel.
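The mini-batch idea behind SGD can be sketched as follows. The toy task (fitting y = w * x by least squares), the synthetic data, the batch size, and the learning rate are illustrative assumptions, not details from the text.

```python
import random

# Hedged sketch of stochastic gradient descent (SGD): each update uses
# the mean gradient of a random mini-batch rather than the full dataset.
# Toy task: fit y = w * x by least squares; the true weight is 2.0.

random.seed(0)
data = [(float(x), 2.0 * x) for x in range(100)]

w = 0.0           # model parameter
lr = 1e-4         # learning rate
batch_size = 10
for epoch in range(200):
    random.shuffle(data)                  # fresh random mini-batches each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # mean gradient of (w*x - y)**2 over the batch: 2 * (w*x - y) * x
        g = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * g                       # SGD step on the mini-batch estimate

print(round(w, 3))  # approaches the true weight 2.0
```

Because each mini-batch gradient is only a noisy estimate of the full-dataset gradient, individual steps are imprecise, but the updates are far cheaper, which is what makes training on very large datasets practical. Adam builds on this by additionally maintaining running estimates of the gradient's mean and variance to scale the step size per parameter.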
