Every few years the same drama replays: a team hand-codes deep domain knowledge, wins the early benchmark, then gets buried by a cruder method that simply scales with compute. Rich Sutton's essay names that pattern and explains why it keeps catching researchers off guard — the win is "bitter" precisely because it beats the human-centric approach we are emotionally invested in.
Core Argument
Sutton reads 70 years of AI as one repeated experiment:
- Search and learning are the two methods that seem to scale arbitrarily with computation; approaches built on hand-coded knowledge eventually plateau as compute grows.
- Building in human knowledge pays off in the short term, then plateaus and can even inhibit progress, because it complicates methods in ways that make them harder to scale with general computation.
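- The engine is Moore's law, or more broadly the continued exponential fall in the cost of computation: researchers optimize as if available compute were fixed, but over a span only slightly longer than a typical research project, vastly more becomes available, rewarding whoever bet on general methods.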
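- The evidence is a string of reversals: massive deep search over hand-tuned chess heuristics, search plus self-play learning over hand-built Go knowledge, statistical HMMs over phoneme and vocal-tract models in speech, and convolutional networks over edge detectors and SIFT features in vision.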
Why It Still Matters
Great fit if you want the intellectual backbone of the "scaling hypothesis": this short note sits at the root of debates that now shape LLM strategy and compute budgets. Look elsewhere if you want balance or rebuttal. It is a deliberately provocative thesis, and critics (notably Rodney Brooks' "A Better Lesson") argue it understates the data, architecture, and human design still baked into every supposedly "general" method.
