Pre-Training
The initial phase of training an AI model on a large, general-purpose dataset to learn broad knowledge and patterns before it is fine-tuned for specific tasks.
Pre-training is the most expensive phase of building large AI models. GPT-4's pre-training reportedly cost $78–191 million in compute alone, processing trillions of tokens from books, websites, and code. The pre-training paradigm, popularized by BERT (2018) and GPT-2 (2019), enables transfer learning: train once on broad data, then adapt cheaply to many downstream tasks. Pre-training datasets have grown from millions of documents to trillions of tokens, and the quality and composition of pre-training data are now recognized as being as important as model architecture and scale.
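The train-once, adapt-cheaply workflow can be sketched with a deliberately tiny toy: a bigram count model "pretrained" on a broad corpus, then adapted by updating the same statistics with a small task-specific corpus. This is only an illustration of the two-phase pattern; real pre-training uses neural networks, gradient descent, and trillions of tokens.

```python
from collections import Counter, defaultdict

# Toy sketch of the pretrain-then-adapt pattern (illustrative only).
# Phase 1 builds general statistics; phase 2 reuses them and adds
# a little task data instead of starting from scratch.

def train(counts, corpus):
    """Accumulate bigram counts from a whitespace-tokenized corpus."""
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    """Return the most likely next token after `prev`, or None if unseen."""
    if not counts[prev]:
        return None
    return counts[prev].most_common(1)[0][0]

# Phase 1: expensive, general-purpose "pre-training" on a broad corpus.
counts = defaultdict(Counter)
train(counts, "the cat sat on the mat the dog sat on the rug")

# Phase 2: cheap adaptation — reuse the pretrained counts, add task data.
train(counts, "the model sat on the gpu")

print(predict(counts, "sat"))  # "on" — learned in phase 1, still used after adaptation
```

The key point the sketch mirrors: phase 2 starts from the knowledge accumulated in phase 1 rather than from an empty model, which is why fine-tuning is orders of magnitude cheaper than pre-training.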
Related Terms
Artificial General Intelligence (AGI)
A hypothetical form of AI that can understand, learn, and apply knowledge across any intellectual task at or above human level, rather than being specialized for specific tasks.
AI Alignment
The research field focused on ensuring AI systems behave in accordance with human values and intentions, particularly as systems become more capable.
AI Compute
The computational resources — primarily GPU and TPU processing power — required to train and run AI models, typically measured in FLOP (floating-point operations) or GPU-hours.
Capex (Capital Expenditure)
Long-term investment spending by companies on physical assets like data centers, GPU clusters, and networking infrastructure — the backbone of AI deployment at scale.
ChatGPT
OpenAI's conversational AI assistant, launched in November 2022, which catalyzed the current generative AI boom by demonstrating the capabilities of large language models to a mainstream audience.
Data Center
A facility housing computer systems and infrastructure used to process, store, and distribute data — increasingly built specifically for AI training and inference workloads.