Masked Language Model

A pre-training objective in which the model learns to predict randomly hidden (masked) words in a sentence, building a deep understanding of language structure and semantics.

Masked language modeling is the training objective behind BERT (Bidirectional Encoder Representations from Transformers), introduced by Google in 2018. During training, 15% of tokens are randomly masked and the model predicts them from context. This bidirectional approach — reading both left and right context — gave BERT breakthrough performance on NLP benchmarks, improving state-of-the-art results on 11 tasks simultaneously. The technique remains the foundation of encoder-based models used for classification, search, and information extraction.
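As a concrete illustration, here is a minimal Python sketch of the masking step described above. The function name, example sentence, and whitespace tokenization are all illustrative; production BERT operates on WordPiece subwords, and of the selected tokens only 80% are replaced with [MASK] (10% get a random token and 10% are left unchanged), a detail this sketch skips.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT's special placeholder token

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Corrupt a token sequence for masked language modeling.

    Each position is independently selected with probability
    `mask_prob` (~15%, as in BERT); selected tokens are recorded
    as prediction targets and overwritten with [MASK]. The model
    is then trained to recover the targets from the surrounding
    bidirectional context.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = token          # what the model must predict
            corrupted[i] = MASK_TOKEN   # what the model actually sees
    return corrupted, targets

sentence = "the cat sat on the mat because it was tired".split()
corrupted, targets = mask_tokens(sentence)
print(corrupted)  # the sentence with some tokens replaced by [MASK]
print(targets)    # {position: original token} pairs used as labels
```

During pre-training, the model's loss is computed only on the masked positions, so it must use the unmasked left and right context to fill in each gap.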
