Aiconomy

Model Collapse

A degradation phenomenon where AI models trained on data generated by other AI models progressively lose quality and diversity, potentially threatening the long-term viability of training on internet data.

Research published in Nature in 2024 demonstrated that training language models recursively on AI-generated text leads to progressive quality degradation, with outputs converging to a narrow distribution. As AI-generated content floods the internet — with estimates suggesting 50%+ of web content could be AI-generated by 2025 — the risk of model collapse increases for future models trained on web-scraped data. This has intensified the value of pre-AI training data and curated human-created datasets. Solutions under exploration include data provenance tracking, filtering AI-generated content from training data, and synthetic data quality controls.
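The core mechanism is easy to see in miniature. The sketch below (an illustrative toy, not the Nature study's setup) repeatedly fits a Gaussian to samples drawn from the previous generation's fitted Gaussian; because each finite-sample maximum-likelihood fit slightly underestimates variance, the distribution narrows generation after generation, mirroring how recursive training converges to a narrow distribution:

```python
import random

random.seed(0)

def fit_gaussian(samples):
    """Maximum-likelihood fit: mean and (biased) variance of the samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # MLE divides by n, biasing var low
    return mu, var

mu, var = 0.0, 1.0   # generation 0: the "real" human-data distribution
n = 20               # small sample size per generation exaggerates the effect
history = [var]

for generation in range(100):
    # Each generation "trains" only on the previous generation's output.
    samples = [random.gauss(mu, var ** 0.5) for _ in range(n)]
    mu, var = fit_gaussian(samples)
    history.append(var)

print(f"variance: gen 0 = {history[0]:.3f}, gen 100 = {history[-1]:.6f}")
```

The fitted variance shrinks toward zero, i.e. diversity is lost even though each individual fit looks reasonable. This is also why the mitigations listed above focus on keeping genuine human data in the loop: mixing even a fraction of original data into each generation's training set arrests the shrinkage.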
