Speech recognition software

Digital · Computation · 1990

TL;DR

Speech recognition reached the market in 1990, when hidden Markov models met sufficient computing power. Dragon Dictate proved the market at $9,000; Moore's Law eventually delivered continuous-speech recognition at $150.

Speech recognition emerged from the collision of statistical mathematics and sufficient computing power, a collision that took three decades to complete. In 1990, Dragon Systems launched Dragon Dictate, the first consumer speech recognition product. It cost $9,000, captured 30 to 40 words per minute (people speak at roughly 150), required users to pause after each word, and demanded hours of "training" to adapt the software to an individual voice. Even so, it worked well enough to find customers in healthcare and law, where the value of hands-free dictation justified the cost and friction.

The mathematical foundation had been laid decades earlier. Leonard Baum and colleagues worked out the core estimation mathematics for hidden Markov models at the Institute for Defense Analyses in the late 1960s. In the mid-1970s, James Baker and Janet Baker at Carnegie Mellon applied hidden Markov models (HMMs) to speech, representing language as sequences of probable states rather than fixed patterns. This shift from template matching to statistical inference transformed accuracy. By the mid-1980s, Fred Jelinek's team at IBM had built Tangora, a voice-activated typewriter with a 20,000-word vocabulary.
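
To make the shift to statistical inference concrete, the sketch below runs Viterbi decoding over a toy HMM: it finds the most probable sequence of hidden states given a sequence of observations. The two states, three observation symbols, and all probabilities here are invented for illustration; real recognizers of the era searched far larger models over continuous acoustic features.

```python
# Minimal Viterbi decoding over a toy hidden Markov model.
# States, symbols, and probabilities are invented for illustration.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}                 # initial state probabilities
trans_p = {"S1": {"S1": 0.7, "S2": 0.3},         # transition probabilities
           "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.5, "b": 0.4, "c": 0.1},  # emission probabilities
          "S2": {"a": 0.1, "b": 0.3, "c": 0.6}}

def viterbi(obs_seq):
    """Return the most probable hidden-state path for an observation sequence."""
    # trellis[t][s] = (probability of the best path ending in s at time t, predecessor)
    trellis = [{s: (start_p[s] * emit_p[s][obs_seq[0]], None) for s in states}]
    for obs in obs_seq[1:]:
        prev = trellis[-1]
        trellis.append({
            s: max((prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states)
            for s in states
        })
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: trellis[-1][s][0])
    path = [state]
    for col in reversed(trellis[1:]):
        state = col[state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi(["a", "b", "c", "c"]))  # -> ['S1', 'S1', 'S2', 'S2']
```

The same dynamic-programming search, run over phoneme-level states and acoustic feature vectors instead of toy symbols, is what made real-time recognition so computationally demanding.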

But vocabulary was not the barrier; continuous speech was. Early systems took discrete dictation because processing the acoustic signal and the language model simultaneously in real time exceeded available computing power. Users had to pause between words, an unnatural speaking rhythm that few could sustain. Ray Kurzweil released speech recognition software in 1984; IBM and Kurzweil Applied Intelligence competed throughout the 1980s. The technology remained expensive, frustrating, and confined to specialists.
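
The statistical formulation these systems shared (often called the noisy-channel view, associated with Jelinek's IBM group) shows where the cost lay. The recognizer searches over candidate word sequences W for the one that best explains the acoustic observations O:

```latex
\hat{W} = \operatorname*{arg\,max}_{W}\;
  \underbrace{P(O \mid W)}_{\text{acoustic model}}\,
  \underbrace{P(W)}_{\text{language model}}
```

Discrete dictation shrinks this search dramatically: pausing fixes the word boundaries, so each word can be scored in isolation rather than searching over every segmentation of a continuous utterance.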

Dragon's 1997 breakthrough, Dragon NaturallySpeaking, changed the economics dramatically. At $150, the software captured words at conversational speed; Moore's Law had finally delivered processors fast enough to handle continuous speech in real time. The user base broadened from medical transcription to ordinary consumers.

The convergent evolution continued. IBM, Dragon, and Kurzweil Applied Intelligence pursued parallel paths with similar methods, HMMs combined with statistical language models, and reached similar capabilities within a few years of one another. Lernout & Hauspie acquired both Kurzweil Applied Intelligence (1997) and Dragon Systems (2000), consolidating the expertise that would later enable Siri, Alexa, and Google Assistant. The mathematics developed for defense applications became the foundation for virtual assistants two generations later.

What Had To Exist First

Required Knowledge

  • hidden-markov-models
  • statistical-language-modeling
  • acoustic-phonetics
  • signal-processing

Enabling Materials

  • fast-processors
  • digital-signal-processors
  • memory-chips
