AIAny - The Bitter Lesson

Every few years the same drama replays: a team hand-codes deep domain knowledge, wins the early benchmark, then gets buried by a cruder method that simply scales with compute. Rich Sutton's essay names that pattern and explains why it keeps catching researchers off guard — the win is "bitter" precisely because it beats the human-centric approach we are emotionally invested in.

Core Argument

Sutton reads 70 years of AI as one repeated experiment:

Search and learning are the two general methods that seem to scale arbitrarily with computation; approaches built on hand-coded knowledge plateau even as compute keeps growing.
Building in human knowledge pays off short-term, then caps progress — it complicates methods in ways that resist general scaling.
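The engine is Moore's law, or more precisely the continued exponential fall in the cost per unit of computation: researchers optimize as if available compute were fixed, but over a horizon only slightly longer than a typical project it grows massively, rewarding whoever bet on general methods.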
The evidence is a string of reversals — deep search over chess heuristics, AlphaGo's self-play over hand-built Go knowledge, statistical HMMs over phonetic models, CNNs over SIFT features.
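
A toy model can make the compute argument concrete. It is not from Sutton's essay. Every constant in it is a hypothetical assumption chosen only to show the shape of the crossover:

- compute doubles every two years;
- the hand-engineered method sits at a flat score of 70;
- the general method starts at 50 and gains 4 points per doubling of compute.

```python
import math

# Hypothetical toy model: every constant below is an assumption chosen to show
# the shape of the argument, not a measurement from Sutton's essay.
DOUBLING_YEARS = 2.0     # assumed doubling time of compute per dollar
KNOWLEDGE_CAP = 70.0     # assumed flat score of the hand-engineered method
GENERAL_BASE = 50.0      # assumed score of the general method at year 0
GAIN_PER_DOUBLING = 4.0  # assumed score gain per doubling of compute


def compute_multiplier(year: int) -> float:
    """Compute available in `year`, relative to year 0, under steady exponential growth."""
    return 2.0 ** (year / DOUBLING_YEARS)


def general_score(year: int) -> float:
    """General method: starts lower but gains a fixed amount per doubling of compute."""
    return GENERAL_BASE + GAIN_PER_DOUBLING * math.log2(compute_multiplier(year))


def crossover_year() -> int:
    """First whole year in which the general method matches or beats the plateau.

    Terminates because GAIN_PER_DOUBLING > 0 while the hand-engineered score is flat.
    """
    year = 0
    while general_score(year) < KNOWLEDGE_CAP:
        year += 1
    return year


if __name__ == "__main__":
    for year in range(0, 13, 2):
        leader = "general" if general_score(year) >= KNOWLEDGE_CAP else "hand-engineered"
        print(f"year {year:2d}: compute x{compute_multiplier(year):4.0f}  "
              f"general={general_score(year):5.1f}  plateau={KNOWLEDGE_CAP:.1f}  leader={leader}")
    print("crossover year:", crossover_year())
```

Under these assumed numbers, the general method ties the hand-engineered plateau in year 10, when compute has grown 32x. It leads from then on. Changing the constants moves the crossover year but not the outcome, as long as the general method keeps gaining with each doubling while the hand-built method stays flat. That asymmetry, not the specific numbers, is what the argument turns on.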

Why It Still Matters

Great fit if you want the intellectual backbone of the "scaling hypothesis": this short note is the root of debates that now shape LLM strategy and compute budgets. Look elsewhere if you want balance or rebuttal. It is a deliberately provocative thesis, and critics (notably Rodney Brooks' "A Better Lesson") argue it understates the data, architecture, and human design still baked into every supposedly "general" method.

More Items

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Cosmos 3: Omnimodal World Models for Physical AI

Video models are zero-shot learners and reasoners