On July 20, 1969, the Apollo 11 lunar module Eagle descended toward the Sea of Tranquility with Neil Armstrong and Buzz Aldrin inside. Several minutes before touchdown, with the lunar surface coming up to meet them, the cabin filled with the sound of a program alarm the astronauts had not been specifically prepared for in training. The alarm was 1202. Over the next several minutes it repeated, alongside an additional 1201 alarm, for a total of five program alarms during the final descent. At touchdown, Mission Control was reading the fuel gauge as showing less than 30 seconds of fuel remaining.

The landing succeeded. The story has mostly been absorbed, in the decades since, as a triumph of Armstrong’s piloting and Aldrin’s discipline, with supporting acknowledgments of the Mission Control infrastructure. The acknowledgments are real. They are also missing one of the more substantive contributors to why the landing actually worked. The contributor was a 32-year-old software engineer at MIT’s Instrumentation Laboratory named Margaret Hamilton, who had led the team that built the on-board flight software for the Apollo Guidance Computer, and who had insisted, against institutional pressure, that the software be able to ignore tasks it could not complete.

What the alarms meant

It is worth being precise about what the alarms meant, because they tend to get described in vaguer terms than the underlying mechanism warrants.

The Apollo Guidance Computer running the Eagle’s descent had, by the standards of contemporary computing, almost no processing capacity. It operated with roughly 72 kilobytes of fixed program memory and about 4 kilobytes of erasable memory. Its core programs were woven into core rope memory by hand, with copper wires physically threaded through tiny magnetic cores to encode the ones and zeros.

In an interview with CBS Boston, Hamilton described what happened: things were going perfectly in their minds, the team was almost at landing, and all of a sudden the priority alarms came on. 1201 and 1202. She knew right away those alarms came on when there was an emergency, and they had no business going on right then.

The cause, as engineers determined in the post-flight analysis, was that the rendezvous radar system had been left in the wrong configuration during the descent. The radar was sending the guidance computer a continuous stream of signals the computer did not need for the landing but could not, in its standard processing model, ignore. Servicing that stream consumed processing time the landing programs needed. Scheduled jobs began to pile up faster than they could finish, until the computer had no room left to hold new ones. The 1201 and 1202 codes, both executive overflow alarms, were the computer’s way of signaling that it had run out of room.

What happened next is the reason the mission succeeded.

What Hamilton’s software did

Hamilton’s software did not treat the overload as a fatal error. It had been built around a principle Hamilton had been advocating across the entire Apollo program: the system should be able to recognize when it could not complete all the tasks being asked of it, and should shed the lower-priority tasks in favor of the higher-priority ones, rather than crashing or producing corrupted output.
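The principle can be illustrated with a minimal Python sketch. It is not the Apollo code, which was written in assembly language for very different hardware. The job names, priority numbers, and cost units below are hypothetical, chosen only to show lower-priority work being shed when capacity runs short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    priority: int  # higher number = more important (hypothetical scale)
    cost: int      # hypothetical units of processor time per cycle


def run_cycle(jobs, capacity):
    """Run jobs in priority order and shed whatever does not fit."""
    ran, shed = [], []
    remaining = capacity
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.cost <= remaining:
            remaining -= job.cost
            ran.append(job.name)
        else:
            # Not enough capacity left: skip this job instead of failing.
            shed.append(job.name)
    return ran, shed


# Hypothetical workload that needs 115 units when only 100 are available.
jobs = [
    Job("radar_data", priority=10, cost=30),
    Job("landing_guidance", priority=30, cost=60),
    Job("display_update", priority=20, cost=25),
]
ran, shed = run_cycle(jobs, capacity=100)
print("ran:", ran)
print("shed:", shed)
```

In this toy workload the guidance and display jobs run and the radar job is dropped, and the cycle still completes rather than halting.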

This is now standard practice in real-time systems design. It was not standard in the mid-1960s, when Hamilton was advocating for it. Most programmers operated on the assumption that the right approach to a complex system was to ensure that every task the system was asked to perform could be completed within the available resources. Hamilton’s approach was different. She assumed the available resources would, on some occasions, prove inadequate, and that the system should be designed to fail gracefully when that happened.

Discover Magazine’s documentation of how the system operated describes what happened during the descent. The 1201 and 1202 alarms triggered a software reboot. The reboot cancelled all currently running jobs, then restarted them from a table in order of priority, quickly enough that no guidance or navigation data was lost. The radar data, being low priority, was shed. The landing computation, being high priority, was preserved. The system continued to function. The reboots happened five separate times during the descent. Each one was, by design, recoverable.
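The restart behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not the flight software: the `Executive` class, the slot count, and the job names are assumptions made for the example, and the 1202 code is used only as a label for the overflow condition.

```python
class Executive:
    """Toy job scheduler with a fixed number of job slots (hypothetical)."""

    def __init__(self, slots, restart_table):
        self.slots = slots
        # (priority, name) pairs for the jobs that must survive a restart.
        self.restart_table = restart_table
        self.active = []
        self.alarms = []

    def schedule(self, priority, name):
        """Add a job, or raise an alarm and restart if no slot is free."""
        if len(self.active) >= self.slots:
            self.alarms.append(1202)  # overflow: no room for the new job
            self.restart()
            return False
        self.active.append((priority, name))
        return True

    def restart(self):
        """Cancel every job, then reload only the essential ones by priority."""
        self.active = sorted(self.restart_table, reverse=True)[: self.slots]


# Hypothetical scenario: unneeded radar jobs fill the slots during descent.
agc = Executive(slots=3, restart_table=[(20, "display"), (30, "guidance")])
agc.schedule(30, "guidance")
agc.schedule(20, "display")
agc.schedule(10, "radar")
agc.schedule(10, "radar")  # no free slot: alarm, then restart

print(agc.alarms)
print(agc.active)
```

After the overflow, only the jobs listed in the restart table remain, highest priority first, and the radar jobs are gone. The design choice worth noting is that recovery does not depend on diagnosing what went wrong. It depends only on knowing in advance which jobs matter.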

Hamilton’s pushback against the culture

Hamilton’s insistence on this kind of error handling had not been entirely welcome at the time she was advocating for it. The engineering culture of the period had operated on an implicit assumption that astronauts were sufficiently well-trained that they would not make procedural errors, and that the software did not need to be designed to handle errors the astronauts would not make.

Hamilton pushed back against this assumption for some time. Her concern had been informed in part by an incident involving her young daughter, who had been playing with a simulator prototype and had inadvertently produced a sequence of inputs that crashed the system. Smithsonian Magazine’s account of the incident notes that Hamilton recognized her daughter’s mistake was exactly the kind of error an astronaut could make in flight, but when she recommended adjusting the software to address it, she was told: “Astronauts are trained never to make a mistake.” During Apollo 8’s moon-orbiting flight, astronaut Jim Lovell made the exact same error her daughter had. Hamilton’s team was able to correct the problem within hours, and protection was built into the software for all future Apollo flights to make sure it never happened again.

The capability Hamilton had pushed for was eventually built into the system in part because she was the lead developer of the on-board flight software and was positioned to insist on its inclusion. The capability is what saved the landing. The radar configuration error that produced the 1201 and 1202 alarms was exactly the kind of procedural mistake the engineering culture had assumed would not occur. It occurred anyway. The software handled it. The mission proceeded.

What the engineers later said

The immediate post-flight reporting did not recognize Hamilton’s contribution. That recognition has accumulated gradually in the decades since. Don Eyles, who wrote much of the LUMINARY code that ran the descent, has been quoted as saying the computer was smarter than the engineers had been. Eyles’s own paper, hosted in the Apollo Lunar Surface Journal, documents the technical detail directly: an uncorrected problem in the rendezvous radar interface stole approximately 13 percent of the computer’s duty cycle, resulting in five program alarms and software restarts, and the system handled each one. The framing of the computer being smarter than its engineers is generous, but accurate. The computer was running on Hamilton’s framework. The framework anticipated failure modes the engineers had not, in their conscious planning, fully accounted for.

Hamilton was awarded the Presidential Medal of Freedom in 2016 for her work on the Apollo program. The recognition was, by the standards of how such awards usually arrive, overdue. The iconic photograph of her standing next to a stack of printed Apollo code as tall as she was had, in the intervening decades, become one of the more widely circulated images in the history of software engineering. Hamilton also coined the term “software engineering” during her work on the Apollo program, in part to legitimize the discipline against the engineering establishment’s tendency to dismiss software as less serious than hardware.

Final words

The Apollo 11 lunar landing succeeded in considerable part because the on-board flight software, designed under the leadership of a 32-year-old engineer at MIT’s Instrumentation Laboratory, had been built to gracefully handle exactly the kind of resource-overload failure that occurred during the descent. The handling was not accidental. It was the result of Hamilton’s insistence that the software be designed for failure modes the engineering culture had assumed would not occur.

The culture had been wrong. The radar configuration error that produced the 1201 and 1202 alarms was exactly the kind of failure the culture had assumed would not happen. It happened. The software handled it. The landing proceeded with what the available fuel readings, at the moment of touchdown, suggested was less than 30 seconds of fuel to spare.

The recognition of Hamilton’s contribution has been slower than the work warranted. The lag reflects, in part, the conditions that women in technical fields have been working against for longer than the Apollo program itself. Hamilton, on the available evidence of how she conducted the work, has been more interested in the engineering than in the recognition. The engineering is what the landing rested on. The recognition is what the broader culture is still, modestly, in the process of providing.