Aiconomy

Inference Cost

The computational expense of running a trained AI model to generate outputs for users, which determines the per-query economics and ultimately the pricing of AI services.

Inference costs have dropped roughly 280x in 18 months, from approximately $0.36 per GPT-4-equivalent query to under $0.01 for many competitive models. Despite falling per-unit costs, total inference spending is rising as query volumes explode. Techniques such as quantization (4-bit quantization cuts costs 4-8x), speculative decoding, and mixture-of-experts (MoE) architectures reduce inference compute. Inference accounts for approximately 60% of total AI compute demand, making it the majority of AI's energy footprint. Specialized inference chips from Groq and AWS (Inferentia) offer 2-5x cost advantages over general-purpose GPUs.
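The per-query arithmetic above can be sketched as a small cost model. This is an illustrative sketch only: the token count, the dollar rate per million tokens, and the treatment of quantization as a flat cost divisor are all assumptions for the example, not figures from this entry.

```python
def cost_per_query(tokens_per_query: int,
                   cost_per_million_tokens: float,
                   quantization_factor: float = 1.0) -> float:
    """Dollar cost of serving one query.

    quantization_factor: cost divisor from quantization, e.g. 4.0 for a
    4-bit model assumed to cut inference cost ~4x vs. an fp16 baseline.
    (Simplified: real savings vary by hardware and workload.)
    """
    return tokens_per_query * cost_per_million_tokens / 1e6 / quantization_factor


# Hypothetical baseline: a 1,000-token query at $10 per million tokens.
baseline = cost_per_query(1_000, 10.0)        # $0.0100 per query
quantized = cost_per_query(1_000, 10.0, 4.0)  # $0.0025 per query
```

The same model also illustrates why total spending can rise while unit costs fall: if the per-query cost drops 4x but query volume grows more than 4x, aggregate inference spend still increases.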

Built on data from Stanford HAI, IEA, OECD & IMF