CLIP

OpenAI's Contrastive Language-Image Pre-training model that learns to connect images and text descriptions, enabling zero-shot image classification and powering text-to-image generation systems.

CLIP, released in 2021, was trained on 400 million image-text pairs collected from the internet. It can classify images into any category described in natural language without task-specific training, and its zero-shot accuracy on ImageNet is competitive with fully supervised models while remaining far more flexible. CLIP also underpins text-to-image generation: its text encoder is used in Stable Diffusion, and its embeddings guide image generation in DALL-E 2 by translating prompts into representations the generator can follow. CLIP demonstrated that scaling the diversity of training data can be more powerful than scaling model size alone.
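As a minimal sketch of what zero-shot classification looks like in practice, the example below scores one image against free-form text labels using a public CLIP checkpoint from Hugging Face; the image URL and label set are placeholders, not part of the original model release.

```python
# Zero-shot image classification with a public CLIP checkpoint.
# Assumes: pip install torch transformers pillow requests
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate classes written as natural-language prompts; no retraining needed.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity logits, softmaxed into per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the labels are just text, swapping in a new category is a one-line change rather than a retraining run, which is what makes CLIP's zero-shot setup so flexible.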
