Aiconomy

Jailbreaking (AI)

Techniques for bypassing an AI model's safety filters and restrictions to produce outputs the model was designed to refuse, such as harmful instructions or policy-violating content.

Jailbreaking techniques have evolved from simple prompt manipulation to sophisticated multi-step attacks. Common approaches include role-playing scenarios, hypothetical framing, encoding harmful instructions, and iterative refinement. AI labs play a constant cat-and-mouse game: each safety patch is met with new jailbreaking techniques. Red teaming exercises at DEF CON 2023 found that most frontier models could be jailbroken within minutes. The difficulty of preventing jailbreaks while maintaining model usefulness is a fundamental tension in AI safety. Research into robust safety training aims to make jailbreaking progressively harder.

Live Data

1,373AI Safety Incidents This Year

Explore the Data

AI Economy Pulse

Every Friday: the 3 AI data points that actually matter this week. Free, forever.

Built on data from Stanford HAI, IEA, OECD & IMF

Latest: “AI Investment Hits $42B in Q1 2026 — Here's Where It Went”

No spam, ever. Unsubscribe anytime.