Reinforcement Learning from Human Feedback (RLHF)
A technique for aligning AI models with human preferences: human evaluators rank model outputs, those rankings are used to train a reward model, and the reward model's scores serve as the reward signal that improves the model's behavior.
RLHF is the primary technique used to make large language models helpful, harmless, and honest, and it was instrumental in making ChatGPT conversational and useful. The technique requires significant human labor for evaluation, creating new job categories such as AI trainers. As AI systems become more capable, scaling RLHF and developing more efficient alignment techniques are a major focus of AI safety research, a field that receives less than 1% of total AI R&D spending.
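To make the ranking-to-reward step concrete, here is a minimal sketch of the reward-modeling stage in PyTorch. It assumes human rankings have already been reduced to preference pairs and that responses have been pre-encoded into feature vectors; the RewardModel class, feature dimension, and batch size are illustrative placeholders, not any production RLHF pipeline. The trained reward model's scores would then drive a reinforcement-learning step (commonly PPO) on the language model itself.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical reward model: maps an encoded (prompt, response)
    feature vector to a single scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # scalar reward head

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(model, chosen_feats, rejected_feats):
    """Bradley-Terry pairwise loss: push the reward of the response
    humans preferred above the reward of the one they rejected."""
    r_chosen = model(chosen_feats)
    r_rejected = model(rejected_feats)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training step with random features standing in for encoded text.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen = torch.randn(16, 128)    # features of preferred responses
rejected = torch.randn(16, 128)  # features of rejected responses

optimizer.zero_grad()
loss = preference_loss(model, chosen, rejected)
loss.backward()
optimizer.step()
```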
Related Terms
Artificial General Intelligence (AGI)
A hypothetical form of AI that can understand, learn, and apply knowledge across any intellectual task at or above human level, rather than being specialized for specific tasks.
AI Alignment
The research field focused on ensuring AI systems behave in accordance with human values and intentions, particularly as systems become more capable.
AI Safety
The interdisciplinary field focused on preventing AI systems from causing harm, encompassing alignment, robustness, interpretability, and governance of AI technologies.
ChatGPT
OpenAI's conversational AI assistant, launched in November 2022, which catalyzed the current generative AI boom by demonstrating the capabilities of large language models to a mainstream audience.
Deepfake
AI-generated synthetic media — images, video, or audio — that realistically depict events or statements that never occurred, created using deep learning techniques.
Fine-Tuning
The process of further training a pre-trained AI model on a specific, smaller dataset to specialize it for a particular task or domain, requiring far less compute than training from scratch.
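As an illustration of the fine-tuning entry above, the sketch below specializes a pre-trained torchvision ResNet-18 for a hypothetical 10-class task by freezing the backbone and training only a new classifier head; the class count, batch, and hyperparameters are placeholder assumptions. Freezing the pre-trained weights is one common way fine-tuning saves compute relative to training from scratch.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pre-trained ResNet-18 and freeze its weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for a hypothetical 10-class task;
# only this new layer will be trained.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One training step on a toy batch standing in for the small,
# task-specific dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```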