Jailbreaking (AI)
Techniques for bypassing an AI model's safety filters and restrictions to produce outputs the model was designed to refuse, such as harmful instructions or policy-violating content.
Jailbreaking techniques have evolved from simple prompt manipulation to sophisticated multi-step attacks. Common approaches include role-playing scenarios, hypothetical framing, encoding harmful instructions, and iterative refinement. AI labs are locked in a cat-and-mouse game with jailbreakers: each safety patch is met with new techniques. Red-teaming exercises at DEF CON 2023 found that most frontier models could be jailbroken within minutes. Preventing jailbreaks without sacrificing model usefulness is a fundamental tension in AI safety. Research into robust safety training aims to make jailbreaking progressively harder.
Related Terms
Artificial General Intelligence (AGI)
A hypothetical form of AI that can understand, learn, and apply knowledge across any intellectual task at or above human level, rather than being specialized for specific tasks.
AI Alignment
The research field focused on ensuring AI systems behave in accordance with human values and intentions, particularly as systems become more capable.
AI Safety
The interdisciplinary field focused on preventing AI systems from causing harm, encompassing alignment, robustness, interpretability, and governance of AI technologies.
ChatGPT
OpenAI's conversational AI assistant, launched in November 2022, which catalyzed the current generative AI boom by demonstrating the capabilities of large language models to a mainstream audience.
Deepfake
AI-generated synthetic media — images, video, or audio — that realistically depict events or statements that never occurred, created using deep learning techniques.
Fine-Tuning
The process of further training a pre-trained AI model on a specific, smaller dataset to specialize it for a particular task or domain, requiring far less compute than training from scratch.