AIAny - 30u30

Keeping NN Simple by Minimizing the Description Legnth of the Weights

1993

Geoffrey E. Hinton, Drew van Camp

This paper proposes minimizing the information content in neural network weights to enhance generalization, particularly when training data is scarce. It introduces a method where adaptable Gaussian noise is added to the weights, balancing the expected squared error against the amount of information the weights contain. Leveraging the Minimum Description Length (MDL) principle and a "bits back" argument for communicating these noisy weights, the approach enables efficient derivative computations, especially if output units are linear. The paper also explores using adaptive mixtures of Gaussians for more flexible prior distributions for weight coding. Preliminary results indicated a slight improvement over simple weight-decay on a high-dimensional task.

foundation 30u30 paper

A Tutorial Introduction to the Minimum Description Length Principle

2004

Peter Grunwald

This paper gives a concise tutorial on MDL, unifying its intuitive and formal foundations and inspiring widespread use of MDL in statistics and machine learning.

foundation 30u30 paper math

Machine Super Intelligence by Shane Legg

2011

Shane Legg

This book develops a formal theory of intelligence, defining it as an agent’s capacity to achieve goals across computable environments and grounding the concept in Kolmogorov complexity, Solomonoff induction and Hutter’s AIXI framework.It shows how these idealised constructs unify prediction, compression and reinforcement learning, yielding a universal intelligence measure while exposing the impracticality of truly optimal agents due to incomputable demands. Finally, it explores how approximate implementations could trigger an intelligence explosion and stresses the profound ethical and existential stakes posed by machines that surpass human capability.

foundation 30u30 book

The First Law of Complexodynamics

2011

Scott Aaronson

This post explores why physical systems’ “complexity” rises, peaks, then falls over time, unlike entropy, which always increases. Using Kolmogorov complexity and the notion of “sophistication,” the author proposes a formal way to capture this pattern, introducing the idea of “complextropy” — a complexity measure that’s low in both highly ordered and fully random states but peaks during intermediate, evolving phases. He suggests using computational resource bounds to make the measure meaningful and proposes both theoretical and empirical (e.g., using file compression) approaches to test this idea, acknowledging it as an open problem.

foundation blog 30u30 tutorial

ImageNet Classification with Deep Convolutional Neural Networks

2012

Alex Krizhevsky, Ilya Sutskever +1

The 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton introduced AlexNet, a deep CNN that dramatically improved image classification accuracy on ImageNet, halving the top-5 error rate from \~26% to \~15%. Its innovations — like ReLU activations, dropout, GPU training, and data augmentation — sparked the deep learning revolution, laying the foundation for modern computer vision and advancing AI across industries.

vision 30u30 paper foundation

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

2014

Scott Aaronson, Sean M. Carroll +1

This paper proposes a quantitative framework for the rise-and-fall trajectory of complexity in closed systems, showing that a coffee-and-cream cellular automaton exhibits a bell-curve of apparent complexity when particles interact, thereby linking information theory with thermodynamics and self-organization.

foundation 30u30 paper physics science

Neural Machine Translation by Jointly Learning to Align and Translate

2014

Dzmitry Bahdanau, Kyunghyun Cho +1

This paper introduces an attention-based encoder–decoder NMT architecture that learns soft alignments between source and target words while translating, eliminating the fixed-length bottleneck of earlier seq2seq models. The approach substantially improves BLEU, especially on long sentences, and matches phrase-based SMT on English-French without additional hand-engineered features. The attention mechanism it proposes became the foundation for virtually all subsequent NMT systems and inspired attention-centric models like the Transformer, reshaping machine translation and sequence modeling across NLP.

30u30 paper NLP translation

Recurrent Neural Network Regularization

2014

Wojciech Zaremba, Ilya Sutskever +1

This paper presents a method for applying dropout regularization to LSTMs by restricting it to non-recurrent connections, solving prior issues with overfitting in recurrent networks. It significantly improves generalization across diverse tasks including language modeling, speech recognition, machine translation, and image captioning. The technique allows larger RNNs to be effectively trained without compromising their ability to memorize long-term dependencies. This work helped establish dropout as a viable regularization strategy for RNNs and influenced widespread adoption in sequence modeling applications.

foundation 30u30 paper

Neural Turing Machines

2014

Alex Graves, Greg Wayne +1

This paper augments recurrent neural networks with a differentiable external memory addressed by content and location attention. Trained end-to-end, it learns algorithmic tasks like copying, sorting and associative recall from examples, proving that neural nets can induce simple programs. The idea sparked extensive work on memory-augmented models, differentiable computers, neural program synthesis and modern attention mechanisms.

foundation 30u30 paper

CS231n: Deep Learning for Computer Vision

2015

Fei-Fei Li

Stanford’s 10-week CS231n dives from first principles to state-of-the-art vision research, starting with image-classification basics, loss functions and optimization, then building from fully-connected nets to modern CNNs, residual and vision-transformer architectures. Lectures span training tricks, regularization, visualization, transfer learning, detection, segmentation, video, 3-D and generative models. Three hands-on PyTorch assignments guide students from k-NN/SVM through deep CNNs and network visualization, and a capstone project lets teams train large-scale models on a vision task of their choice, graduating with the skills to design, debug and deploy real-world deep-learning pipelines.

foundation vision 30u30 course tutorial

The Unreasonable Effectiveness of Recurrent Neural Networks

2015

Andrej Karpathy

This tutorial explores the surprising capabilities of Recurrent Neural Networks (RNNs), particularly in generating coherent text character by character. It delves into how RNNs, especially when implemented with Long Short-Term Memory (LSTM) units, can learn complex patterns and structures in data, enabling them to produce outputs that mimic the style and syntax of the training material. The discussion includes the architecture of RNNs, their ability to handle sequences of varying lengths, and the challenges associated with training them, such as the vanishing gradient problem. Through various examples, the tutorial illustrates the potential of RNNs in tasks like language modeling and sequence prediction.

30u30 foundation blog tutorial

Understanding LSTM Networks

2015

Christopher Olah

This tutorial explains how Long Short-Term Memory (LSTM) networks address the limitations of traditional Recurrent Neural Networks (RNNs), particularly their difficulty in learning long-term dependencies due to issues like vanishing gradients. LSTMs introduce a cell state that acts as a conveyor belt, allowing information to flow unchanged, and utilize gates (input, forget, and output) to regulate the addition, removal, and output of information. This architecture enables LSTMs to effectively capture and maintain long-term dependencies in sequential data

foundation blog 30u30 tutorial

Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

alibaba

amazon

anthropic

audio

blog

book

bytedance

chatbot

chemistry

claude

course

deepmind

deepseek

engineering

foundation

foundation-model

gemini

github

google

gradient-booting

grok

huggingface

LLM

llm

math

mcp

mcp-client

mcp-server

meta-ai

microsoft

mlops

NLP

nvidia

ocr

ollama

openai

paper

physics

plugin

pytorch

RL

science

sora

translation

tutorial

vibe-coding

video

vision

xAI

xai

Keeping NN Simple by Minimizing the Description Legnth of the Weights