Mixture of Experts (MoE)

A neural network architecture that routes each input to only a subset of specialized 'expert' sub-networks, enabling much larger models without proportionally increasing compute costs.

MoE architectures enable models with trillions of total parameters while activating only a fraction of them for each input. GPT-4 is widely reported to use an MoE architecture with multiple expert networks. Mistral's Mixtral 8x7B model activates only 2 of its 8 experts per token, achieving performance comparable to dense models roughly 3x its active size. Google's Switch Transformer scaled to 1.6 trillion parameters using MoE. The approach is key to reducing inference costs, a critical factor as AI scales to billions of daily queries.
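To make the routing idea concrete, below is a minimal sketch of a top-2-of-8 MoE layer in PyTorch. The class names (SimpleExpert, TopKMoELayer) and all dimensions are illustrative assumptions; this is not the actual implementation used in GPT-4, Mixtral, or the Switch Transformer.

```python
# Minimal sketch of a top-k Mixture-of-Experts layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleExpert(nn.Module):
    """One feed-forward 'expert' sub-network."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class TopKMoELayer(nn.Module):
    """Routes each token to the top-k experts chosen by a learned gate."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [SimpleExpert(d_model, d_hidden) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_model, n_experts)  # the router
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.gate(x)                            # (n_tokens, n_experts)
        weights, indices = logits.topk(self.k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)             # renormalise their scores

        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example: 8 experts, 2 active per token -- only 2/8 of the expert compute runs.
layer = TopKMoELayer(d_model=64, d_hidden=256)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Because only the k selected experts run for each token, per-token compute stays roughly constant as more experts (and thus more total parameters) are added. Production MoE systems also add load-balancing losses and expert capacity limits, which this sketch omits.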
