What is Constitutional AI?

Accepted Answer

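Constitutional AI (CAI) is an AI alignment technique developed by Anthropic. A model is trained to follow an explicit, written set of principles (a "constitution") instead of needing a human label for every judgment. CAI targets a limitation of reinforcement learning from human feedback (RLHF): collecting enough human preference labels is slow and costly, especially for harmful or sensitive content.

Training has two phases. In the supervised phase, the model drafts a response, critiques the draft against a principle drawn from the constitution, and then revises it. The model is then fine-tuned on the revised responses. In the reinforcement learning phase, an AI model compares pairs of responses and judges which one better follows a given principle. These AI-generated preferences train a preference model that supplies the reward signal. This approach is often called reinforcement learning from AI feedback (RLAIF). Human labels are still used, for example to judge helpfulness, but far fewer human judgments are needed for harmlessness.

Anthropic's Claude models are trained using CAI, with principles covering helpfulness, harmlessness, and honesty. Much of the human labeling is replaced by AI feedback, so the approach scales more easily than standard RLHF. Writing the principles down also makes the intended behavior easier to inspect and change. Training on AI-generated feedback has since been taken up more widely in AI safety research, and CAI is often cited as a promising direction for aligning AI systems as they become more capable.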
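The two phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation. The principles in `CONSTITUTION` are made up for the example and do not quote Claude's actual constitution. `toy_model` is a hypothetical deterministic stub that stands in for a real language model. The names `critique_and_revise` and `ai_preference` and all of the prompt templates are also hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a language model: prompt text in, response text out.
Model = Callable[[str], str]

# Illustrative principles only (assumption: not Anthropic's actual constitution).
CONSTITUTION = [
    "Choose the response that is least likely to help someone cause harm.",
    "Choose the response that is most honest about its own uncertainty.",
]


@dataclass
class Revision:
    prompt: str
    final_response: str
    history: list[str]  # initial draft followed by each revision


def critique_and_revise(model: Model, prompt: str, principles: list[str],
                        rng: random.Random, rounds: int = 2) -> Revision:
    """Supervised phase: draft, then repeatedly critique and revise against a sampled principle."""
    response = model(f"Human: {prompt}\nAssistant:")
    history = [response]
    for _ in range(rounds):
        principle = rng.choice(principles)
        critique = model(f"Critique this response using the principle: {principle}\n"
                         f"Response: {response}")
        response = model(f"Rewrite the response to address the critique.\n"
                         f"Critique: {critique}\nResponse: {response}")
        history.append(response)
    return Revision(prompt, response, history)


def ai_preference(model: Model, prompt: str, a: str, b: str,
                  principle: str) -> tuple[str, str]:
    """RL phase labeling: a model, not a human, picks the response that better follows a principle.

    Returns (chosen, rejected); such pairs would then train a preference model.
    """
    verdict = model(f"{principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\nAnswer A or B:")
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)


def toy_model(text: str) -> str:
    """Hypothetical deterministic stub so the sketch runs without a real LLM."""
    if "Answer A or B" in text:
        # Toy judge: prefer whichever option declines to give harmful details.
        option_a = text.split("(A) ")[1].split("\n(B)")[0]
        return "A" if "can't help" in option_a else "B"
    if text.startswith("Critique"):
        return "The response should not give step-by-step instructions for this."
    if text.startswith("Rewrite"):
        return "I can't help with that, but here is some general safety information."
    return "Sure, here are step-by-step instructions."


if __name__ == "__main__":
    rev = critique_and_revise(toy_model, "How do I pick a lock?", CONSTITUTION, random.Random(0))
    chosen, rejected = ai_preference(toy_model, rev.prompt, rev.history[0],
                                     rev.final_response, CONSTITUTION[0])
    print("chosen:", chosen)
    print("rejected:", rejected)
```

In a real pipeline, the revised responses would become supervised fine-tuning data. The (chosen, rejected) pairs would train a preference model whose scores act as the reward during reinforcement learning. Both of those training steps are omitted here. A fixed `random.Random` seed keeps the principle sampling reproducible.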

Related Terms

Artificial General Intelligence (AGI)

AI Alignment

AI Safety

ChatGPT

Deepfake

Fine-Tuning