Daily Tech Feed: From the Labs

Deep dives into foundational AI and ML research papers

12: Kolmogorov Complexity — Sunday Greatest Hits

The only full textbook on Ilya Sutskever's famous reading list. Why did a deep learning pioneer tell John Carmack to study algorithmic randomness? Because compression is intelligence — and this book is the mathematical foundation for that claim.

Show Notes

Episode 0012: Kolmogorov Complexity and Algorithmic Randomness

Series: Greatest Hits — Ilya Sutskever's Reading List
Date: Sunday, February 22, 2026

Overview

The only full textbook on Ilya Sutskever's famous ~30-item reading list. Why did a deep learning pioneer tell John Carmack to study algorithmic randomness? Because compression is intelligence — and this book is the mathematical foundation for that claim.

We cover the key concepts: Kolmogorov complexity, the invariance theorem, incompressibility, algorithmic randomness, Berry's paradox, connections to Gödel and Turing, and why all of this matters for understanding neural network generalization.

The Textbook

Ilya Sutskever's Reading List

Key Concepts Discussed

Kolmogorov Complexity

  • The length of the shortest program that produces a given output
  • Independently discovered by Ray Solomonoff (1960), Andrei Kolmogorov (1965), and Gregory Chaitin (1966)
  • Wikipedia: Kolmogorov complexity
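Since K(x) itself is uncomputable (see below), any real compressor only gives an upper bound on it. A minimal sketch of that idea, using Python's zlib as a stand-in for "the shortest program" — an illustrative assumption, not the book's formal definition:

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length of zlib-compressed data: a crude upper bound on K(x)."""
    return len(zlib.compress(data, level=9))

structured = b"ab" * 500       # 1000 bytes with an obvious short description
random_ish = os.urandom(1000)  # 1000 bytes, incompressible with high probability

print(compressed_size(structured))  # tiny: the pattern is the "short program"
print(compressed_size(random_ish))  # near (or above) 1000: no pattern to exploit
```

The gap between the two outputs is the intuition behind incompressibility: most strings have no description much shorter than themselves.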

The Invariance Theorem

  • Kolmogorov complexity is independent of programming language, up to an additive constant
  • Makes the definition well-defined as a property of the string itself
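In symbols, the theorem can be sketched as follows (with U and V universal machines):

```latex
% For any two universal machines U and V there is a constant c_{UV},
% independent of x, such that
K_U(x) \le K_V(x) + c_{UV} \quad \text{for all strings } x.
% Intuitively, c_{UV} is the length of a V-interpreter written for U,
% so complexity is language-independent up to an additive constant.
```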

Algorithmic Randomness & Martin-Löf

Berry's Paradox

  • "The smallest positive integer not definable in fewer than twenty words"
  • Formalized via Kolmogorov complexity to prove uncomputability
  • Wikipedia: Berry paradox
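The formalized argument runs roughly as follows (a standard proof sketch, with c a machine-dependent constant):

```latex
% Suppose K were computable. Then for each n the program
%   "return the first string x with K(x) \ge n"
% is a valid description of some string x_n, and its length is only
% about \log_2 n + c bits (the bits needed to write down n). Hence
K(x_n) \le \log_2 n + c,
% while by construction K(x_n) \ge n: a contradiction for large n.
% So K is uncomputable -- Berry's paradox made precise.
```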

Connections to Gödel and Turing

  • Kolmogorov complexity is uncomputable (computing it would solve the halting problem)
  • Chaitin's Omega: halting probability with algorithmically random digits
  • Incompleteness, halting, and randomness as three faces of the same limit
  • Wikipedia: Chaitin's constant

Why This Matters for AI

Compression ↔ Prediction (Shannon Duality)

  • Good predictors are good compressors and vice versa
  • Neural networks trained to predict are implicitly compressing
  • Shannon's 1948 paper
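The duality can be made concrete: a predictor that assigns probability p to the next symbol implies an ideal code of -log2 p bits for it, so total codelength is just the model's negative log-likelihood. A minimal sketch with toy predictors (names and the example string are illustrative, not from the episode):

```python
import math
from collections import Counter

def codelength_bits(text: str, probs: dict) -> float:
    """Total ideal codelength: sum of -log2 p(symbol) over the text."""
    return sum(-math.log2(probs[ch]) for ch in text)

text = "abababababababab"  # 16 symbols, very predictable

# A predictor that has learned nothing: uniform over 26 letters.
uniform = {ch: 1 / 26 for ch in "abcdefghijklmnopqrstuvwxyz"}

# A predictor that has learned the symbol frequencies.
counts = Counter(text)
empirical = {ch: counts[ch] / len(text) for ch in counts}

print(codelength_bits(text, uniform))    # 16 * log2(26), about 75 bits
print(codelength_bits(text, empirical))  # 16 * 1 = 16.0 bits
```

The better predictor compresses the same text to a fraction of the bits: training a model to predict is training it to compress.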

Minimum Description Length (MDL)

  • Also on Ilya's reading list
  • Best model = shortest total description (model + data given model)
  • Occam's razor made mathematical
  • Wikipedia: Minimum description length
  • Jorma Rissanen's original work (1978)
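A two-part code can be sketched on binary strings: spend a few bits describing the model (here, a Bernoulli bias), then code the data under that model. The function names and coding scheme are illustrative assumptions, not Rissanen's exact formulation:

```python
import math

def entropy_bits(p: float) -> float:
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mdl_bernoulli(bits: str) -> float:
    """Two-part codelength: model (count of ones) + data given model."""
    n, k = len(bits), bits.count("1")
    model_cost = math.log2(n + 1)        # which of the n+1 possible counts
    data_cost = n * entropy_bits(k / n)  # ideal code under Bernoulli(k/n)
    return model_cost + data_cost

biased = "1" * 95 + "0" * 5   # compressible: strong regularity
balanced = "10" * 50          # balanced counts: the model buys nothing

print(mdl_bernoulli(biased))    # ~35 bits, far below the raw 100 bits
print(mdl_bernoulli(balanced))  # ~107 bits, no better than storing raw
```

The model only pays for itself when the data has structure — exactly the Occam's-razor trade-off MDL formalizes.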

Solomonoff Induction

Deep Learning Generalization

The Authors

  • Alexander Shen — LIRMM, CNRS & Université de Montpellier, France. Homepage
  • Vladimir A. Uspensky (1930–2018) — Lomonosov Moscow State University. Pioneer in Russian mathematical logic. Wikipedia
  • Nikolai Vereshchagin — Lomonosov Moscow State University. Google Scholar

Further Reading


This episode was researched and written with AI assistance.