Chain rule

Early modern · Household · 1676

TL;DR

Leibniz's 1676 method for differentiating composite functions became the mathematical foundation of backpropagation in modern neural network training.

The chain rule emerged in 1676 from Gottfried Wilhelm Leibniz's development of calculus, providing the essential technique for differentiating composite functions—functions built from other functions. This seemingly abstract mathematical tool would eventually become the computational engine behind modern machine learning, demonstrating how fundamental mathematical insights can wait centuries for their most powerful applications.

The adjacent possible for the chain rule was calculus itself. Leibniz's new mathematics of infinitesimals provided the conceptual framework for analyzing how functions change. Once one could differentiate simple functions, the question naturally arose: how does one differentiate a function that depends on another function? If y depends on u, and u depends on x, how does y ultimately change with x?

Leibniz's answer, first mentioned in a 1676 memoir, was elegantly expressed in his notation: dy/dx = dy/du · du/dx. The derivatives "chain" together multiplicatively. If you know how y changes with u, and how u changes with x, you can compute how y changes with x by multiplying these rates. The notation itself suggested the rule—the du terms appear to "cancel" symbolically, though this is more mnemonic than rigorous justification.

The geographical pattern of calculus's development concentrated in a few European centers. Leibniz worked in Hanover and corresponded extensively with mathematicians across the continent. Isaac Newton developed his own version of calculus in England, though with different notation and somewhat different emphasis. The priority dispute between Newton and Leibniz's supporters consumed considerable mathematical energy, but both traditions contributed to the chain rule's elaboration and application.

For three centuries, the chain rule remained a standard tool of mathematical analysis, essential for physics, engineering, and any field requiring calculus but not particularly celebrated. Engineers used it to analyze mechanical systems; physicists applied it to thermodynamics and mechanics; mathematicians proved theorems about differentiable functions. The rule was powerful, fundamental, and unremarkable.

The transformation came with the development of artificial neural networks. When researchers sought algorithms to train multi-layer networks, they discovered that the chain rule provided exactly what they needed. Backpropagation—the algorithm that computes how to adjust network weights to reduce errors—is essentially systematic application of the chain rule through layers of mathematical operations. Each layer of a neural network computes a function of the previous layer; the chain rule propagates error gradients backwards through all these compositions.

This application was not merely useful but essential. Without the chain rule, modern deep learning would be computationally infeasible. Networks with millions of parameters can be trained because the chain rule allows gradients to be computed efficiently layer by layer rather than requiring separate calculation for each parameter. The same mathematical insight that Leibniz articulated in 1676 to handle composite functions now enables systems that recognize faces, translate languages, and generate text.

By 2026, the chain rule has become perhaps the most economically consequential piece of seventeenth-century mathematics. Leibniz could not have anticipated neural networks, but his notation and method created the computational framework that machine learning required. The adjacent possible that Leibniz opened in 1676 included possibilities—the training of artificial intelligence systems—that would not be explored for more than three hundred years.

What Had To Exist First

Preceding Inventions

Required Knowledge

  • differential-calculus
  • function-composition
  • mathematical-analysis

Enabling Materials

  • mathematical-notation

What This Enabled

Inventions that became possible because of Chain rule:

Independent Emergence

Evidence of inevitability—this invention emerged independently in multiple locations:

Hanover

Parallel development

Biological Patterns

Mechanisms that explain how this invention emerged and spread:

Related Inventions

Tags