Transformer Architecture
The neural network architecture introduced by Google researchers in 2017 that uses self-attention to process entire sequences in parallel, enabling the large language models that power modern AI.
The transformer architecture is the foundation of virtually all modern large language models, including GPT, Claude, Gemini, and Llama. Its ability to process text in parallel (rather than sequentially) enabled scaling to trillion-parameter models. Transformers have also been adapted for image generation (for example, as the backbone of many diffusion models), protein structure prediction, and other domains. The architecture's appetite for compute is a key driver of the 4.2x annual growth in AI training compute.
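To make the parallelism concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the architecture. It uses NumPy, and the sequence length, dimensions, and variable names are illustrative rather than taken from any particular model; in a real transformer the queries, keys, and values come from learned projections and are split across multiple heads.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V each have shape (seq_len, d_k). Every token attends to every
    other token via one matrix multiply, so the whole sequence is processed
    in parallel rather than step by step.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

# Toy example: a 4-token sequence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# Reusing x for Q, K, and V keeps the sketch self-contained; a real model
# would apply separate learned linear projections first.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because the attention scores for all token pairs are produced in a single matrix product, the computation maps naturally onto GPUs and TPUs, which is what made the scaling described above practical.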
Related Terms
Artificial General Intelligence (AGI)
A hypothetical form of AI that can understand, learn, and apply knowledge across any intellectual task at or above human level, rather than being specialized for specific tasks.
AI Alignment
The research field focused on ensuring AI systems behave in accordance with human values and intentions, particularly as systems become more capable.
AI Compute
The computational resources — primarily GPU and TPU processing power — required to train and run AI models, typically measured in FLOP (floating-point operations) or GPU-hours.
Capex (Capital Expenditure)
Long-term investment spending by companies on physical assets like data centers, GPU clusters, and networking infrastructure — the backbone of AI deployment at scale.
ChatGPT
OpenAI's conversational AI assistant, launched in November 2022, which catalyzed the current generative AI boom by demonstrating the capabilities of large language models to a mainstream audience.
Data Center
A facility housing computer systems and infrastructure used to process, store, and distribute data — increasingly built specifically for AI training and inference workloads.