Aiconomy

Inference Cost

The computational expense of running a trained AI model to generate outputs for users, which determines the per-query economics and ultimately the pricing of AI services.

Inference costs have dropped roughly 280x in 18 months, from approximately $0.36 per GPT-4-equivalent query to under $0.01 for many competitive models. Yet even as per-unit costs fall, total inference spending is rising, because query volumes are exploding. Techniques such as quantization (4-bit reduces costs 4-8x), speculative decoding, and mixture-of-experts (MoE) architectures reduce inference compute. Inference accounts for approximately 60% of total AI compute demand, making it the majority of AI's energy footprint. Specialized inference chips from Groq and AWS (Inferentia) offer 2-5x cost advantages over general-purpose GPUs.
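The per-query economics above can be sketched with a simple cost model: cost per query is roughly (tokens generated ÷ serving throughput) × GPU rental price per second, and quantization raises effective throughput. This is a minimal illustration; all numbers (token counts, GPU price, throughput, the 4x quantization speedup) are assumed for the example, not figures from the text.

```python
def cost_per_query(
    tokens_per_query: float,            # assumed avg output tokens per query
    gpu_cost_per_hour: float,           # assumed cloud GPU rental price (USD)
    tokens_per_second: float,           # assumed serving throughput on that GPU
    quantization_speedup: float = 1.0,  # e.g. ~4x assumed for 4-bit quantization
) -> float:
    """Estimate the USD cost of serving one query (illustrative model only)."""
    effective_tps = tokens_per_second * quantization_speedup
    seconds_of_gpu_time = tokens_per_query / effective_tps
    return gpu_cost_per_hour / 3600 * seconds_of_gpu_time

# FP16 baseline: 500 output tokens, $2/hr GPU, 50 tokens/s
baseline = cost_per_query(500, 2.0, 50)
# Same workload with 4-bit quantization (4x throughput assumed)
quantized = cost_per_query(500, 2.0, 50, quantization_speedup=4.0)
print(f"baseline ~${baseline:.4f}/query, quantized ~${quantized:.4f}/query")
```

Under these assumptions the baseline comes out near half a cent per query and the quantized case a quarter of that, which is the right order of magnitude for the sub-$0.01 figures cited above.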
