
Reinforcement Learning from Human Feedback (RLHF)

A technique for aligning AI models with human preferences: human evaluators rank model outputs, a reward model is trained to predict those rankings, and the model is then fine-tuned with reinforcement learning to maximize the learned reward.
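
The reward-modeling step can be made concrete with a short sketch. The example below is a minimal illustration, assuming pairwise preference data and random toy embeddings in place of a real language model's representations; `RewardModel`, `chosen`, and `rejected` are hypothetical names for this sketch, not part of any library.

```python
# Minimal sketch of RLHF's reward-modeling step (assumption: pairwise
# human preferences; toy random embeddings stand in for a real LM).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)

dim = 16
# Each row stands for one (prompt, response) pair; in practice these
# would be embeddings of responses that human evaluators compared.
chosen = torch.randn(32, dim)    # human-preferred responses
rejected = torch.randn(32, dim)  # dispreferred responses

model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(100):
    # Bradley-Terry pairwise loss: push the reward of the preferred
    # response above the reward of the rejected one.
    loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a full pipeline, the trained reward model then scores new model outputs, and a policy-gradient method such as PPO fine-tunes the language model to maximize that score, usually with a KL penalty that keeps it close to its pretrained base.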

RLHF is the primary technique used to make large language models helpful, harmless, and honest, and it was instrumental in making ChatGPT conversational and useful. Because the technique depends on large amounts of human evaluation labor, it has created new job categories such as AI trainer. As AI systems become more capable, scaling RLHF and developing more efficient alignment techniques have become a major focus of AI safety research, which receives less than 1% of total AI R&D spending.
