
Gradient Descent

The primary optimization algorithm used to train neural networks. It iteratively adjusts model parameters in the direction of the negative gradient of the loss, the direction that most steeply reduces the prediction error.
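The update rule is simply: new parameters = old parameters minus the learning rate times the gradient of the loss. A minimal sketch, using a made-up quadratic loss chosen only so the gradient has a simple closed form:

```python
import numpy as np

# Hypothetical quadratic loss L(w) = ||w - target||^2 (illustrative only).
target = np.array([3.0, -1.0])

def grad(w):
    # Analytic gradient of the quadratic loss: 2 * (w - target)
    return 2.0 * (w - target)

w = np.zeros(2)           # initial parameters
lr = 0.1                  # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)     # step in the negative-gradient direction

print(np.round(w, 4))     # w converges toward the minimizer [3., -1.]
```

Each step shrinks the distance to the minimum by a constant factor here; in real neural networks the loss surface is non-convex, but the same update rule applies.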

Gradient descent and its variants (SGD, Adam, AdamW) are the workhorses of deep learning optimization. Stochastic gradient descent (SGD) estimates the gradient from random mini-batches of data rather than the entire dataset, making it practical for large-scale training. The Adam optimizer, introduced in 2015, maintains a separate adaptive learning rate for each parameter and is the default choice for training most modern neural networks. Training frontier models involves performing gradient descent across trillions of tokens using thousands of GPUs in parallel.
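The two ideas above, mini-batching and per-parameter adaptive rates, can be combined in a few lines. The sketch below implements the standard Adam update (exponential moving averages of the gradient and its square, with bias correction) on a synthetic linear-regression problem; the data, dimensions, and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = X @ w_true + small noise
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(1000, 2))
y = X @ w_true + 0.01 * rng.normal(size=1000)

def grad_batch(w, idx):
    # Gradient of mean-squared error on one mini-batch
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

# Adam state: first and second moment estimates per parameter
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8

for t in range(1, 2001):
    idx = rng.integers(0, len(X), size=32)    # random mini-batch (the "S" in SGD)
    g = grad_batch(w, idx)
    m = b1 * m + (1 - b1) * g                 # moving average of gradients
    v = b2 * v + (1 - b2) * g * g             # moving average of squared gradients
    m_hat = m / (1 - b1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step

print(np.round(w, 2))                         # close to w_true = [2., -3.]
```

Dividing by the running root-mean-square of the gradient gives parameters with small, noisy gradients a proportionally larger step, which is what "adapts learning rates per parameter" means in practice.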
