Why Are We Putting Errors into Our Design Early in the Lifecycle?

“Information systems projects frequently fail. Depending upon which academic study you read, the failure rate of large projects is reported as being between 50%-80%. Because of the natural human tendency to hide bad news, the real statistic may be even higher. This is a catastrophe. As an industry we are failing at our jobs.”

Opening paragraph from “Top 10 Reasons Why Systems Projects Fail,” by Dr. Paul Dorsey, Dulcian, Inc.[1]

Although Dr. Dorsey’s paper focuses on software design and development, its findings hold true for all design and development. He goes on to list the 10 reasons, which we can all recognize as common failures of proper program planning. Lack of methodology is his number one reason. In systems engineering, we use a variety of methodologies, often based on a specific technique (IDEF, SysML, BPMN, etc.). Most of these techniques are actually drawing specifications – i.e., ways to draw diagrams consistently. But is just drawing enough? Are we putting errors in the drawings that are not detected until much later in the lifecycle?

For example, if we introduce an error during the early concept design phase, when do we test for meeting those requirements? The answer: at the end of the lifecycle, usually in operational test and evaluation, or worse, after the product has been fielded. The result: major systems failures, often resulting in loss of life along with massive loss of property, damage to company reputation, and other negative consequences.

Prime examples of systems failures can be seen every day. Some of the big headline grabbers were:[2]

  • Piper Alpha was once Britain’s biggest single oil and gas producing platform, bringing more than 300,000 barrels of crude a day – 10% of the country’s total – from below the seabed 125 miles north-east of Aberdeen. The Piper Alpha disaster, which killed 167 workers on 6 July 1988, remains the world’s deadliest oil rig accident. (Note: Many more died in this accident than in Deepwater Horizon, where 11 workers died.)
  • The AT&T network collapse – In 1990, 75 million phone calls across the US went unanswered after a single switch at one of AT&T’s 114 switching centers suffered a minor mechanical problem, which shut down the center.
  • DC Metro Red Line Accident – In June 2009, a DC Metro Red Line subway train traveling at a high rate of speed rear-ended a stationary subway train near Ft. Totten station. Nine people, including the train operator, were killed. Some 52 people were sent to local hospitals. Damage to train equipment was estimated to be $12 million.

Many of these accidents are traced to a specific problem, such as the “O” ring in the Space Shuttle Challenger disaster, but all are really due to systemic problems. The “O” ring failure was a failure of the system to detect the problem before the disaster occurred. Millions of dollars were spent to determine the exact cause, and then the system was “patched” to fix it. The same system failed later with the Space Shuttle Columbia disaster, which had a different specific cause, but the system did not detect it either.

This is not a knock on NASA. They do much better than most organizations in design, test, and all the other activities necessary to make a very difficult job even possible. NASA has demonstrated how resourceful and responsive it can be, but it has been limited by the existing methodologies, or rather the methodologies of the timeframes in which these incidents occurred.

The systems engineering methodologies of those times are still in use today in many organizations and industries. We need to do better. We need methodologies with more advanced techniques and languages that help identify errors much earlier. Such a systems engineering language now exists: the Lifecycle Modeling Language (LML).[3]

LML focuses first on an actual language (ontology) for use in systems engineering across the entire lifecycle, which includes requirements, modeling, verification, validation, operations, support, and program management. It then provides a more elegant set of visualizations (diagrams, tables, etc.) as a means to communicate the information to stakeholders. In fact, it encourages the use of common forms of visualization, such as a risk matrix, to make sure anyone can quickly and easily understand the information. It is also designed to be translatable into other “languages,” such as IDEF, SysML, DM2, etc.
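To make the “ontology first, diagrams second” idea concrete, here is a minimal sketch in Python. The entity classes named (Action, Asset, Requirement) are drawn from LML, but the code structure, relationship names, and example entities are purely illustrative assumptions, not part of the LML specification or any tool:

```python
from dataclasses import dataclass, field

# Illustrative sketch: in an ontology-first approach, the model is data
# (entities plus named relationships); any diagram is just a view of it.
@dataclass
class Entity:
    entity_class: str          # e.g., "Action", "Asset", "Requirement" (LML classes)
    name: str
    description: str = ""
    relations: list = field(default_factory=list)  # (relation name, target Entity)

    def relate(self, relation: str, target: "Entity") -> None:
        self.relations.append((relation, target))

# Hypothetical example entities (names and relationships are assumptions)
pump = Entity("Asset", "Backup Pump")
shutdown = Entity("Action", "Shut Down Platform")
req = Entity("Requirement", "Emergency Shutdown",
             description="The system shall shut down within 30 seconds.")

shutdown.relate("performed by", pump)
req.relate("traced to", shutdown)

# Because the model is data, traceability becomes a query, not a drawing review.
for relation, target in req.relations:
    print(f"{req.name} is {relation} {target.name}")
```

The point of the sketch is that errors (an untraced requirement, an action with no performer) become machine-checkable queries long before any drawing is reviewed by hand.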

But a language is not enough. We need tools to test the design in the earliest phases. These tools must include both discrete event and Monte Carlo simulation capabilities to ensure that the models execute (i.e., work!). They must also have quality checkers to make sure that the requirements being created meet the criteria for a good requirement and that the models are mature enough to move to the next phase of the lifecycle. They must be collaborative, since we now work in teams across the world. Ideally, all these capabilities would live in a single tool, avoiding the errors that can occur when moving data between tools.
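As a rough illustration of the Monte Carlo capability described above (not the engine of any particular tool), the sketch below executes a simple three-step action model many times with uncertain durations and reports how often the design meets a hypothetical 30-second shutdown requirement. The action names, duration ranges, and threshold are all invented for illustration:

```python
import random

# Minimal Monte Carlo sketch: each action's duration is uncertain,
# modeled here with a triangular distribution (min, most likely, max, seconds).
# All numbers are illustrative assumptions.
actions = [
    ("Detect fault",  (1.0, 2.0, 5.0)),
    ("Isolate valve", (3.0, 6.0, 15.0)),
    ("Vent pressure", (5.0, 10.0, 25.0)),
]
REQUIREMENT_S = 30.0    # hypothetical "shall complete within 30 seconds"
TRIALS = 10_000

random.seed(1)          # fixed seed so the run is repeatable
met = 0
for _ in range(TRIALS):
    # Execute the sequential model once with sampled durations
    total = sum(random.triangular(lo, hi, mode)
                for _, (lo, mode, hi) in actions)
    if total <= REQUIREMENT_S:
        met += 1

print(f"Requirement met in {met / TRIALS:.1%} of {TRIALS} trials")
```

Even a toy run like this surfaces a design question at the concept phase: if the requirement is met in only, say, 90% of trials, is that acceptable, or must the design change now, while change is cheap?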

Fortunately, just as LML is a quantum leap from SysML, so is a tool that implements it: Innoslate®. Innoslate has all these features and more. It is user friendly and available free to students, limited only in the number of entities available for a project.

So, as systems engineers, we have a duty to do a better job, as we are the ones introducing the errors into the early phases of the design. We don’t mean to … we work hard not to, but ultimately, it’s our responsibility to avoid these errors. We need to adopt the new languages and tools to do a better job. Lives and livelihoods depend on us.

[1]  From https://www.hks.harvard.edu/m-rcbg/ethiopia/Publications/Top%2010%20Reasons%20Why%20Systems%20Projects%20Fail.pdf accessed 6/16/16

[2] Derived from Dr. Peggy Brouse, GMU Course on Systems Engineering Fundamentals, Private Communications

[3]  See www.lifecyclemodeling.org for the specification.