This series looks at software development life cycles from the inception of Waterfall to HCDAgile practiced at 1904labs.

Originally published at http://niarcas.com

It all started with a big bang

In August 1949 the Soviet Union tested its first atomic bomb. Now that a single plane from the Soviet Union could drop a nuclear bomb on the U.S., the U.S. needed an early warning system.

There were four guidelines for the early warning system:

Little to no downtime, as downtime meant you were blind to the skies
Extremely reliable as false alerts would be as catastrophic as real alerts
Centralized command to prevent renegade attacks against the Soviet Union
Information needed to be reported in real time so the response was immediate

This meant that the manual process that relied on humans was a non-starter and the only option was a computerized system. Thus the idea for SAGE was born, the computer-based air defense system.

At the core were two computers that shared data, so if one computer went down the other would still be operational. Those two computers received data from radar stations all over the U.S. and had to calculate tracking data for reported enemy aircraft in real time.
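To make the redundancy idea concrete, here is a minimal sketch of an active/standby pair. It is an illustration only and not how SAGE was actually programmed: the class names `Computer` and `DuplexedPair`, the `healthy` flag and the aircraft ID are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    """Hypothetical stand-in for one of the two machines in the pair."""
    name: str
    healthy: bool = True
    tracks: dict = field(default_factory=dict)  # aircraft id -> last reported position

    def ingest(self, aircraft_id, position):
        self.tracks[aircraft_id] = position


class DuplexedPair:
    """Illustrative active/standby pair; both machines receive every radar report."""

    def __init__(self):
        self.active = Computer("A")
        self.standby = Computer("B")

    def report(self, aircraft_id, position):
        # Shared data: every healthy machine sees the same radar report.
        for machine in (self.active, self.standby):
            if machine.healthy:
                machine.ingest(aircraft_id, position)

    def current_tracks(self):
        # If the active machine is down, promote the standby.
        if not self.active.healthy:
            if not self.standby.healthy:
                raise RuntimeError("both computers are down")
            self.active, self.standby = self.standby, self.active
        return dict(self.active.tracks)


pair = DuplexedPair()
pair.report("bogey-1", (40.7, -74.0))
pair.active.healthy = False  # simulate a failure of the active machine
tracks_after_failure = pair.current_tracks()
```

Because both machines ingested the same report, the standby can answer with the same tracking data the moment the active machine fails.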

At its peak, 800 programmers were writing around a quarter million lines of machine language code to coordinate all the different activities the computer needed to perform.

How do you manage that level of communication complexity in a nascent field where errors are a matter of life and death?

Managing complexity

In 1956 Herbert D. Benington presented a paper at the Symposium on Advanced Programming Methods for Digital Computers describing a model used at MIT’s Lincoln Laboratory to produce software in a very structured manner.

Benington reasoned that the current decentralized model to build software didn’t work because no previous software had a real-time reporting requirement.

In the decentralized model, the output of one program would be used as the input to other programs, with humans acting as the intermediaries, physically moving data stored on magnetic tape or punch cards. You could say the decentralized model was one of the earliest examples of sneakernet.

Plan or Die

Today, society puts innate trust in computer systems, but in the 1950s computers enjoyed no such privilege. The norm was for computers to be down for weeks or months, so having a computer with 100% up-time seemed like mission: impossible.

With the cost of failure so high, everything had to be planned out given the complexity of the system. Information from roughly 200 different radar systems would need to be computed in real time.

Benington created this workflow to show the gated process: when specifications would be written, when the system would be coded and when testing would occur.

Plan->Code->Test->Release

The operational plan defines the broad design requirements of the system. This is used as the basis for the machine specification and operational specification. The operational specification is from the user’s point of view and treats the entire system as a black box with the only concern being how the user interacts with the system.

The program and machine specifications open the black box and describe the sub-programs, data storage and program intercommunication requirements. These specifications specify time and storage sharing for the sub-programs on the main computer.

The coding specification takes the program and machine specifications and adds more detail needed to actually code each sub-program.

After the sub-program is coded, it’s separately tested on the system with simulated data representing other sub-programs.

The sub-program is then connected to the main system alongside the other running sub-programs and tested at each integration step, followed by one final “shake down” test.

The sub-program is then ready for operation and evaluation.

As you can see, there were many tests each sub-program had to pass in order to finally get integrated into the system. This made sure the system as a whole was predictable and reliable.
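The gated testing described above can be sketched in a few lines. This is a hedged illustration, not Benington's actual procedure: the sub-program `track_speed`, the simulated feed and the gate functions are all hypothetical names made up for the example.

```python
def track_speed(previous, current, seconds):
    """Hypothetical sub-program: speed between two radar fixes (units per second)."""
    dx = current[0] - previous[0]
    dy = current[1] - previous[1]
    return ((dx ** 2 + dy ** 2) ** 0.5) / seconds


def simulated_radar_feed():
    """Simulated data standing in for the sub-program that would supply radar fixes."""
    return [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def gate_isolated_test():
    # Gate 1: exercise the sub-program alone against simulated input.
    fixes = simulated_radar_feed()
    return track_speed(fixes[0], fixes[1], seconds=1.0) == 5.0


def gate_integration_test(feed):
    # Gate 2: connect the sub-program to a feed and check every step.
    speeds = [track_speed(a, b, seconds=1.0) for a, b in zip(feed, feed[1:])]
    return all(s >= 0 for s in speeds)


def release_ready():
    # Gates run in order; all() stops at the first gate that fails.
    gates = [
        gate_isolated_test,
        lambda: gate_integration_test(simulated_radar_feed()),
    ]
    return all(gate() for gate in gates)
```

The point of the sketch is the ordering: a sub-program that fails in isolation never reaches the integration gate, which keeps failures cheap to find.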

Documentation: an important by-product of software

In addition to testing, Benington noted that “documentation of the system program is an immense, expensive job” and even simple changes have ripple effects, as many different documents may need to have their changes coordinated.

These problems continue to plague waterfall, and really all software projects, as documentation is often an afterthought. It’s something we’ll revisit in future posts.

The process gets a name

The waterfall method solidified further in 1970 through a paper by Dr. Winston Royce, in which he described his personal views about managing large software projects and gave the first example of the process we’re more familiar with today.

The process was given even more credence, and the name “Waterfall” was applied to this method, in the 1976 paper “Software requirements: Are they really a problem?” by T.E. Bell and T.A. Thayer.

Software development goes mainstream

The waterfall method was doctrine for developing software in the 20th century. It works well in a world of clearly defined, stable goals. Since development moves through defined stages, it’s easy to communicate across a large enterprise which stage everyone is in.

If you include video games as software projects, there have been thousands of successful products released through the waterfall method. There have also been massive failures using the waterfall method.

Failures using the Waterfall Model

Probably the biggest early failure was the E.T. video game for the Atari 2600, released in 1982. This failure was so bad it’s cited as a key reason for the video game crash of 1983, which saw industry revenues fall about 97%.

This game was so disastrous that it created an urban legend that Atari buried hundreds of cartridges of the E.T. game in Alamogordo, New Mexico. The legend turned out to be true: the cartridges were dug up in 2014, and the excavation was filmed for the documentary Atari: Game Over.

While the problem affecting the E.T. game had some unique elements, it’s not alone. In 1993 Greyhound had a revenue loss of $61 million due to its bus reservation system, called “Trips”, repeatedly crashing upon introduction. This was a system expected to save the company at a time when it was just emerging from Chapter 11.

There are many more examples of spectacular failures with the waterfall process, but let’s talk about why these projects fail.

According to Robert Charette, these failures are due to a combination of poor technical and business decisions.

Here’s a list of common factors he notes lead to failure:

Unrealistic or unarticulated project goals
Inaccurate estimates of needed resources
Badly defined system requirements
Poor reporting of the project’s status
Unmanaged risks
Poor communication among customers, developers, and users
Use of immature technology
Inability to handle the project’s complexity
Sloppy development practices
Poor project management
Stakeholder politics
Commercial pressures

Personally, I believe poor communication is at the heart of most project failures. People just want to bury the bad news or hope everything magically starts to get better. On the development side there’s also the fear of the conflict that comes with saying no, and there’s the ego side of wanting to be the hero who saves the project at the 11th hour.

However, there are times when the consequences of implementation details may lead to problems only caught later in the development cycle.

Think of the problems of the waterfall process in terms of building a house. If you find out early that the sewer lines aren’t properly connected, you can easily remedy the situation.

However, if the problem goes undetected and people move in, only to find strong sewage smells and ever-increasing wall cracks, then you’ll need to get people out of the house, remove the furniture and tear down walls.

In addition to the all-out failures, there were some other problems that became apparent with waterfall:

Working software isn’t produced until late in the lifecycle
It doesn’t easily allow for new requirements or scope adjustments
People are bad at estimating large projects
Technical or business bottlenecks aren’t identified early

While these problems were painful, they weren’t deal breakers as long as you could keep your requirements solid and your communication high.

Building software was similar to building any other physical object, especially given that software was packaged and sold on physical media, first as floppy disks and then as CD-ROMs.

Things take a long time to build and software was no different. Work on Windows 95 started in March 1992 and it wasn’t released until August 24, 1995, ushering in the personal computer era.

The Internet Arrives: Move Fast and Break Things

While competition is always fierce and moving fast is always an advantage over your competitors, speed was never the deciding factor in gaining market share. In fact, Microsoft was late to the browser wars but was able to use its OS dominance to crush Netscape Navigator.

“I see little commercial potential for the Internet for at least 10 years.”
Bill Gates, 1994

That all changed with the Internet arriving and growing at 2,300 percent per year.

Now, speed mattered. Now, speed was a key feature. Now, waterfall was in trouble.

In the next post of the series I’ll talk about how various software process experiments by different people led to a sort-of “meeting of the families” in the Wasatch Mountains of Utah, which produced the software industry’s version of the Tablets of Stone: the Agile Manifesto.