This paper introduces the generalized delta rule, a learning procedure for multi-layer networks with hidden units, enabling them to learn internal representations. This rule implements a gradient descent method to minimize the error between the network's output and a target output by propagating error signals backward through the network. The authors demonstrate through simulations on various problems, such as XOR and parity, that this method, often called backpropagation, can discover complex internal representations and solutions. They show it overcomes previous limitations in training such networks and rarely encounters debilitating local minima.
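To make the procedure concrete, the following is a minimal sketch of the generalized delta rule applied to the XOR problem, using logistic (sigmoid) units and the sum-squared error measure described in the paper. The network size (2-2-1), learning rate, weight initialization, and number of training sweeps are illustrative assumptions, not values taken from the text.

```python
# A minimal sketch of the generalized delta rule (backpropagation) on XOR.
# Architecture, learning rate, and epoch count are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: two input units, two hidden units, one output unit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 2))   # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 1))   # hidden -> output weights
b2 = np.zeros(1)

eta = 0.5  # learning rate (assumed)

for epoch in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)          # hidden-unit activations
    y = sigmoid(h @ W2 + b2)          # output-unit activations

    # Error signals (deltas), propagated backward through the network
    delta_out = (y - T) * y * (1 - y)             # output-layer delta
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden-layer delta

    # Gradient-descent weight changes proportional to delta * sending activation
    W2 -= eta * h.T @ delta_out
    b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * X.T @ delta_hid
    b1 -= eta * delta_hid.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
# Typically converges to outputs near [0, 1, 1, 0]
```

In this sketch the hidden units develop the internal representation that linearly separates XOR, which a network with no hidden layer cannot do.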
We now have a rather good understanding of simple two-layer associative networks in which a set of input patterns arriving at an input layer are mapped directly to a set of output patterns at an output layer. Such networks have no hidden units. They involve only input and output units. In these cases there is no internal representation. The coding provided by the external world must suffice. These networks have proved useful in a wide variety of applications (cf. Chapters 2, 17, and 18). Perhaps the essential character of such networks is that they map similar input patterns to similar output patterns. This is what allows these networks to make reasonable generalizations and perform reasonably on patterns that have never before been presented. The similarity of patterns in a PDP system is determined by their overlap. The overlap in such networks is determined outside the learning system itself, by whatever produces …