Recurrent Neural Network Regularization

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
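Concretely, the modification can be sketched as a small change to the standard stacked-LSTM update. The notation below is assumed for illustration: D denotes the dropout operator, h_t^l the hidden state of layer l at time t, and T an affine transformation (weights and biases). Dropout acts only on the input arriving from the layer below; the recurrent hidden and cell states are left untouched.

```latex
% Sketch of the modified LSTM update. D (dropout) is applied only to the
% non-recurrent input h_t^{l-1}; the recurrent state h_{t-1}^l and the
% cell c_{t-1}^l pass through unmodified.
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
  = \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
    T \begin{pmatrix} \mathbf{D}\!\left(h_t^{l-1}\right) \\ h_{t-1}^{l} \end{pmatrix},
\qquad
c_t^{l} = f \odot c_{t-1}^{l} + i \odot g,
\qquad
h_t^{l} = o \odot \tanh\!\left(c_t^{l}\right)
```

Because D never touches h_{t-1}^l or c_{t-1}^l, information can still flow across arbitrarily many time steps while the layer-to-layer connections are regularized.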
This paper presents a method for applying dropout regularization to LSTMs by restricting it to the non-recurrent connections, addressing the earlier finding that dropping out recurrent connections disrupts the network's memory and makes standard dropout ineffective for RNNs. The technique significantly improves generalization across diverse tasks, including language modeling, speech recognition, machine translation, and image captioning, and it allows larger RNNs to be trained effectively without compromising their ability to retain information over long time spans. This work helped establish dropout as a viable regularization strategy for RNNs and contributed to its widespread adoption in sequence modeling.
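A minimal PyTorch sketch of this idea follows. It is an illustration rather than the authors' implementation; the class name DropoutLSTM and all hyperparameter values are made up for the example. Dropout is applied to the inputs flowing between stacked layers (and to the input of the first layer), while the hidden and cell states carried across time steps are never dropped.

```python
import torch
import torch.nn as nn


class DropoutLSTM(nn.Module):
    """Stacked LSTM where dropout touches only the non-recurrent connections."""

    def __init__(self, input_size, hidden_size, num_layers=2, dropout=0.5):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size if i == 0 else hidden_size, hidden_size)
             for i in range(num_layers)]
        )
        self.drop = nn.Dropout(dropout)  # used only between layers, never across time

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        states = [
            (x.new_zeros(batch, cell.hidden_size), x.new_zeros(batch, cell.hidden_size))
            for cell in self.cells
        ]
        outputs = []
        for x_t in x:                          # loop over time steps
            inp = self.drop(x_t)               # dropout on the layer input ("vertical")
            for i, cell in enumerate(self.cells):
                h, c = cell(inp, states[i])    # recurrent h, c are reused undropped
                states[i] = (h, c)
                inp = self.drop(h)             # dropout only between stacked layers
            outputs.append(inp)
        return torch.stack(outputs)            # (seq_len, batch, hidden_size)


# quick check: 20 time steps, batch of 4, 10 input features
model = DropoutLSTM(input_size=10, hidden_size=32)
out = model(torch.randn(20, 4, 10))
print(out.shape)  # torch.Size([20, 4, 32])
```

In a typical language-modeling setup this block would sit between an embedding layer and a softmax classifier; the essential design point is simply which tensors the dropout mask is allowed to touch.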
Information
- Website: arxiv.org
- Authors: Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
- Published date: 2014/09/08