Biology of Business

Protein sequencing

Modern · Medicine · 1955

TL;DR

Protein sequencing became real when Frederick Sanger solved insulin's amino-acid order in 1955, proving that proteins are exact linear sequences and giving molecular biology a direct bridge from chemistry to genetic information.

Before 1955, biochemists could purify proteins, weigh them, crystallize some of them, and argue endlessly about their structure, but they could not read them. Many still treated proteins as chemically fuzzy aggregates rather than precise linear molecules. Protein sequencing changed that by turning a protein from a physiological effect into a text that could be deciphered residue by residue.

Frederick Sanger's route into the problem ran through insulin. The hormone had already become medically crucial after its 1920s isolation, which made it an unusually attractive target: small enough to be tractable, important enough to justify years of work, and chemically stable enough to survive repeated analysis. At the Medical Research Council unit in Cambridge, Sanger took a protein that medicine had made famous and asked a question that molecular biology still could not answer cleanly: did a protein possess one exact amino-acid order, or were proteins statistical mixtures assembled on the fly?

The adjacent possible opened because several techniques had matured at once. Chromatography had made it possible to separate amino acids and small peptides with far more precision than earlier chemistry allowed. X-ray crystallography had already persuaded researchers that biological macromolecules could have exact structure, even if their sequences remained unreadable. And Sanger's own chemical innovation, labeling with fluorodinitrobenzene (FDNB), gave chemists a way to tag the N-terminal amino acid of a chain or peptide fragment and identify it after hydrolysis. Protein sequencing did not appear because one genius stared harder than everyone else. It appeared because biochemical separation, labeling chemistry, and a medically important target had finally converged.

Sanger spent much of the late 1940s and early 1950s cutting insulin into manageable peptide fragments, separating those fragments by chromatography and electrophoresis, and then reconstructing how the pieces fit together. The work was punishingly incremental. Insulin turned out to contain two polypeptide chains linked by disulfide bonds, with 51 amino acids in all. When Sanger published the full sequence in 1955, the achievement was not just a first answer about insulin. It was proof that a protein could have a unique, exact sequence. That single result cut against the idea that proteins were amorphous colloids and gave the emerging central dogma of molecular biology a firmer chemical footing. If proteins had specific sequences, then genes could plausibly specify those sequences.
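The reassembly step can be pictured as a small overlap-merging puzzle: if the same chain is cut in two different ways, fragments from one cut pattern overlap fragments from the other, and the overlaps fix their order. The sketch below is a hypothetical illustration of that logic, not Sanger's actual procedure. It assumes each fragment's internal residue order is already known (in practice that itself took end-group labeling and further hydrolysis), writes residues as one-letter amino-acid codes, and uses an invented ten-residue peptide rather than real insulin data. The function names `overlap` and `assemble` are illustrative only.

```python
def overlap(a: str, b: str, min_len: int = 2) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least min_len residues), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str], min_len: int = 2) -> list[str]:
    """Greedily merge fragments by their longest overlaps.

    Returns a single-element list if the fragments chain together;
    otherwise returns the pieces that could not be ordered.
    """
    # Remove exact duplicates, keeping first-seen order.
    frags = list(dict.fromkeys(fragments))
    # Fragments wholly contained in another add no ordering information.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the data cannot order these pieces
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags


# Hypothetical peptide "GAVLKSTEWP" cut two different ways.
digest_1 = ["GAVL", "KSTE", "WP"]
digest_2 = ["GA", "VLKS", "TEWP"]
print(assemble(digest_1 + digest_2))  # → ['GAVLKSTEWP']
```

Greedy merging is the simplest choice, not the only one: it can join pieces wrongly when a short overlap happens to recur elsewhere in the chain. Raising `min_len` reduces such false joins at the cost of leaving more pieces unordered.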

Convergent evolution was already under way. In Sweden, Pehr Edman was developing a different route to the same goal at nearly the same time, releasing amino acids one at a time from the N-terminal end of a peptide instead of relying on Sanger's fragment-reassembly strategy. The methods were not identical, but their near-simultaneous appearance matters. Once peptide chemistry, separation methods, and molecular biology questions had all ripened, multiple researchers began moving toward readable proteins. That is usually the sign that an adjacent possible has opened for good.

Path dependence shaped what came next. Sanger himself moved from proteins to nucleic acids and later developed the chain-termination strategy that made DNA sequencing practical in 1977. The conceptual move was the same one protein sequencing had made respectable: large biological molecules were not inscrutable goo, but ordered information that could be broken into interpretable fragments and read systematically. Once that assumption took hold, molecular biology stopped asking only what molecules did and began asking what messages they encoded.

The trophic cascades ran through the whole field. Protein sequencing helped anchor the central dogma by giving researchers concrete products of genetic information to compare across species, tissues, and mutations. It enabled later work on enzyme defects, antibodies, peptide hormones, and recombinant therapeutics because scientists could now define a protein by sequence rather than only by behavior. It also created a standard of exactness that biochemistry never gave up. Modern mass spectrometry and automated sequencers are descendants of that demand.

Protein sequencing never became a mass-market consumer technology, yet it quietly changed the ontology of life science. After Sanger, a protein was no longer merely something extracted from tissue or inferred from function. It was an ordered chain that could be read, compared, altered, and eventually engineered. That change in what biologists believed a protein to be is why the invention mattered so much.

What Had To Exist First

Required Knowledge

  • Amino-acid chemistry
  • Peptide bond cleavage and fragment mapping
  • Separation science for small biomolecules
  • Disulfide-bond analysis in proteins

Enabling Materials

  • Purified insulin preparations
  • Fluorodinitrobenzene labeling chemistry
  • Paper chromatography and electrophoresis media
  • Hydrolysis and peptide-fragment handling apparatus

Independent Emergence

Evidence of inevitability—this invention emerged independently in multiple locations:

Sweden 1949

Edman independently developed stepwise peptide sequencing chemistry that attacked the same protein-reading problem by a different route
