Let’s work through a few hypothetical “what if” scenarios to illustrate some common engineering themes related to quality control and the inherent stresses between those who build, those who test, and those who sell. Every engineer is deeply familiar with these patterns, but I believe even the general reader will understand the dynamics better by reading these scenarios.
Let us start by imagining that a new bridge is being built in your area. The company that is building the bridge is very eager to have it open by a particular date. In fact, their contract calls for monetary penalties for every day the opening is delayed beyond that date. However, before it can be opened to traffic, it must be inspected to ensure that the welds conform to the applicable standard. For the sake of argument, let’s say the standard is the AASHTO/AWS D1.5M/D1.5:2002 Bridge Welding Code.
The inspectors may inspect all of the welds and find that they are all acceptable. What do you think of this, as someone who will soon ride over that bridge? Is this good news? Yes, if you trust the expertise and independence of the inspectors, and their testing process and equipment. If the inspectors do their job properly, and they find no defects, then this indeed is cause for celebration.
But what if the inspectors found a handful of defects, perhaps some welds that failed fatigue testing? If indeed the defects are few, and are localized, then they can be fixed and retested and we can still open the bridge on time. But it is critical that the changes are localized, that there are no far-reaching changes. A bridge is not just a collection of independent pieces of metal. The pieces all work together, and as a whole have static and dynamic mechanical properties that relate to load capacity, stresses, thermal characteristics, resonance, etc. Although some fixes may be only localized in their impact, meaning only the area changed needs to be retested, other fixes may have a larger impact and require that everything be retested.
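For the software-minded reader, the difference between a localized fix and a far-reaching one can be sketched as a walk over a dependency graph. This is only an illustration: the component names and the `affects` map below are invented for the example, and real structural analysis is of course far more involved.

```python
from collections import deque


def retest_scope(changed, affects):
    """Return every component that must be retested after `changed` is modified.

    `affects` maps a component to the components whose behavior depends on it
    (a hypothetical, hand-written map for this sketch).
    """
    to_retest = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in affects.get(current, ()):
            if dependent not in to_retest:
                to_retest.add(dependent)
                queue.append(dependent)
    return to_retest


# Hypothetical bridge: a single weld affects nothing else,
# while the pillar diameter affects the suspension and the deck.
affects = {
    "weld_17": [],
    "pillar_diameter": ["suspension", "deck"],
    "suspension": ["deck"],
    "deck": [],
}

local_fix = retest_scope("weld_17", affects)
design_change = retest_scope("pillar_diameter", affects)
print(sorted(local_fix))
print(sorted(design_change))
```

Repairing the single weld leaves only that weld to retest, while changing the pillar diameter pulls in everything that depends on it.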
In any complex system, some defects are expected. A sign of a good engineering process is that larger, structural defects are detected or prevented at the earliest possible moment, when they are easiest and least expensive to fix. Where this is not accomplished, large design defects may be first detected at final inspection time, and costly and pervasive rework and retesting may be required, or in the extreme, the bridge may need to be torn down.
The engineering maxim is “fail early”. Now this may seem like an odd thing to say. Shouldn’t we always try to prevent failure or at least delay it as long as possible? Certainly, if you can prevent failure, then do so. But it is rarely the case that all defects can be prevented. As engineers, we can design systems and testing procedures so that flaws become evident as early in the process as possible, when they can be fixed in architecture and design documents rather than in built structures, or at least be found as early in the construction process as possible. This is a frequent source of stress between those who build and those who sell. The important thing for all to understand is that failing early is actually a form of risk reduction. The sooner you fail, the sooner you can fix the defect and start again.
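In software terms, “fail early” often means running the checks in the order the work is done and stopping at the first one that fails, so that a design flaw is caught on paper rather than in steel. The following is a minimal sketch under that assumption; the `Stage` type, the stage names and the project figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    check: Callable[[dict], bool]  # returns True when the stage passes


def first_failure(stages: Sequence[Stage], project: dict) -> Optional[Stage]:
    """Run the stages in order and return the first one that fails, or None."""
    for stage in stages:
        if not stage.check(project):
            return stage  # stop here: later stages are not attempted
    return None


# Hypothetical project data: the design supports less load than required.
project = {"design_load_tons": 40, "required_load_tons": 60, "welds_passed": True}

# Earliest stage first; the design review happens before anything is built.
stages = [
    Stage("design review", lambda p: p["design_load_tons"] >= p["required_load_tons"]),
    Stage("weld inspection", lambda p: p["welds_passed"]),
]

failed = first_failure(stages, project)
print(failed.name if failed else "all stages passed")
```

Here the flaw surfaces at the design review, before any construction or weld inspection takes place, which is the point of the maxim.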
Back to the analogy.
Let’s build another bridge. Along comes MegaCorp, who wants to build a bigger bridge, a much bigger bridge than any attempted previously, a MegaBridge. There is nothing wrong with that per se. The history of engineering is the history of making bigger pyramids, wider vaulted ceilings, taller skyscrapers and longer bridges.
Of course, the fact that MegaBridge is right down the street from the new bridge that just opened last week is a bit odd. But MegaCorp tells us that is OK. We’re not required to use their bridge if we don’t want to.
Further suppose MegaCorp also wants to construct this MegaBridge in record time, faster than others have constructed bridges even a fraction of its size. This is certainly ambitious, but there is no law against ambition. Progress is made by those who are ambitious. We learn from their successes as well as their failures. The important thing is that an ambitious MegaBridge is held to the same standards as any other bridge, that proper inspections are carried out and that quality criteria are satisfied.
Months later, the construction of MegaBridge is complete. Time for inspection. But there is one problem: the MegaBridge is so large that it is impossible to carry out an inspection in the scheduled time. There are simply not enough inspectors available to carry out the task and complete it by the targeted opening time.
What should we do?
It is useful at this time to consider another engineering maxim, “fail safe”. If a system is overloaded, or detects an error condition, it should fail to a safe state, a state least likely to cause damage. We see this applied in many of the systems we use every day. Traffic lights fail safe to flashing red, GFCI circuits fail safe by switching off current if a ground fault is detected, and train air brakes fail safe by applying the brakes if air pressure is lost.
The concept of a “fail safe” applies to processes as well as mechanical systems. A committee, by having a quorum requirement, ensures that it fails to a harmless, inactive state if a snowstorm prevents a representative portion of the committee from attending a meeting. A criminal trial, by presuming innocence and requiring a unanimous verdict to convict, ensures that in case of deadlock, the defendant goes free. Similarly, a bridge quality inspection protocol should include a fail safe provision: if the inspection cannot be completed, the bridge should not be certified as fit for use. The inspection process should fail safe to non-certification. Ordinarily, engineering practice would be to take whatever time is necessary to inspect the bridge fully, or fail the inspection.
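A fail safe inspection rule is easy to state in code: certification is granted only when every weld has been inspected and has passed, and anything else, including an inspection that ran out of time, falls back to non-certification. This is a sketch with made-up names (`Verdict`, `certify`), not a rendering of any real inspection protocol.

```python
from enum import Enum
from typing import Optional, Sequence


class Verdict(Enum):
    CERTIFIED = "certified"
    NOT_CERTIFIED = "not certified"


def certify(weld_results: Sequence[Optional[bool]]) -> Verdict:
    """Decide certification from per-weld results.

    Each entry is True (passed), False (failed) or None (not inspected).
    """
    if not weld_results:
        # No evidence at all: fail safe to non-certification.
        return Verdict.NOT_CERTIFIED
    for result in weld_results:
        if result is not True:
            # A failed weld and an uninspected weld are treated alike.
            return Verdict.NOT_CERTIFIED
    return Verdict.CERTIFIED


print(certify([True, True, True]).value)   # every weld inspected and passed
print(certify([True, None, True]).value)   # one weld never inspected
```

Note that the uninspected weld is not given the benefit of the doubt. The safe state is the default, and certification has to be earned.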
(Here our tale diverges from standard engineering practice and starts to relay, by analogy, the increasingly bizarre tale of OOXML’s exploits in and of ISO.)
But MegaCorp wants the MegaBridge to open on time. They force the inspection to continue, even though the inspectors claim there is not enough time. In order to “help” the inspection and despite the obvious conflict of interest, MegaCorp instructs a large number of its own employees, qualified or unqualified, to volunteer as bridge inspectors. They further recruit employees from subsidiaries and suppliers to become inspectors as well. In at least one case, MegaCorp tells a supplier, newly-minted as an inspector, “Don’t worry if you know nothing about bridges. We’ll tell you what to say. All you need to do is say that the bridge is safe. You’ll be rewarded later for helping us here.”
So the bridge inspectors go out, old and new, qualified and unqualified and come back with their individual preliminary reports. The older, more experienced inspectors are critical in their evaluation:
The bridge is full of defects. Although, as we mentioned earlier, the mandated schedule did not permit us to test all of the critical welds, of the ones we did test, we found numerous defects. In fact, the number of defects we report is artificially low, since it was limited by our available inspection time. If we had been able to complete a full inspection, we would have detected and reported many more problems.
We further found pervasive structural problems. This bridge is unsound. We cannot certify it. We further question why it is necessary to open up a new toll bridge at all, when we just opened up a new free bridge down the street.
The newly-minted inspectors, who for the most part are economically dependent on MegaCorp, were more supportive:
Although some minor problems were indicated, we believe these can all be fixed during routine maintenance. We are not concerned about the time permitted for inspection. We did what the process required. And when you count all the new inspectors that MegaCorp has brought to the process, no bridge has been more inspected. Considering the number of defects reported, this is the most-inspected bridge in history. We recommend that MegaBridge be certified and opened as scheduled.
Of course, from a quality control perspective, this is seriously flawed. The checks and balances between those who build, those who test and those who sell have been eliminated. Although it would not be unusual for some MegaCorp inspectors to be involved in the inspection process, the late arrival of so many unqualified, newly-minted inspectors, and the shift of balance to MegaCorp’s hand-picked inspectors, calls into question the independence and technical sufficiency of the entire inspection process.
The inspectors are polled to see whether the bridge can be certified. The vote is close, but the answer is no, the MegaBridge cannot be certified in its current condition. The inspectors, mainly the older, more experienced ones, record a report of 3,522 specific defects in the MegaBridge, far more defects than have ever been found in any other bridge.
MegaCorp is irate. They blast the experienced inspectors in the press, while simultaneously reassuring their stockholders that this setback is just the next step forward to success. They give their engineers the inspection report and demand a quick response. “We must open the bridge on time!” they yell. The MegaCorp engineers work day and night, over weekends, over the holidays even, in order to develop written proposals to address each of the reported flaws in the bridge.
The inspectors are given the proposals and asked whether they believe the proposals are sufficient to allow the MegaBridge to be certified. Although the newly-minted inspectors are quick to affirm the adequacy of the proposal, the old-timers just shake their heads in disbelief, with one stating to the press:
You could fix every last defect in that report and the MegaBridge would still not be sound. Since we never inspected all of the critical welds in the first place, fixing only the defects we reported is insufficient. It is not enough for us to merely retest the ones we reported as defective. We need to test all of them.
Also, because you are making pervasive changes to the road surface, the suspension materials and the pillar diameters, far-reaching design changes which were clearly rushed and have not gone through normal review procedures, I’m afraid that all of our previous tests are now invalidated as well.
Additionally, many of your proposals either avoid addressing the flaws, paper around the flaws, or even introduce new flaws. We need to re-certify the new design before we can even think about retesting the bridge.
However, considering the huge number of defects reported, the even larger number of defects undetected because of lack of inspection time, the questionable competency of the newly-minted inspectors, and the overt corruption of the process by MegaCorp, my recommendation would be to tear this thing down before it falls over and hurts someone.
Thus ends the tale of what every engineer knows.