Thursday, October 26, 2006
The Chernobyl Design Pattern
In 1994, the Intel Pentium chip was found to have a bug: in certain cases, it gave the wrong answer for floating-point division. These cases were rare, only 1 in 9 billion divisions, and typically produced errors only past the 8th decimal place.
What did Intel do about this? Well, there was denial at first, and then dismissal of the problem as being trivial and unimportant. But eventually they saw the light and offered a no-questions-asked replacement policy for defective processors. No doubt this was expensive for Intel, but this restored their good name and reputation.
It could have been different. For example, they could have simply kept the bug. They could have preserved it in future versions of the Pentium for backwards compatibility, arguing that some software may have worked around the original defect and that fixing the bug now would only break those workarounds. What bug can't be excused by that argument?
Intel could have further decided to turn their bug into a standard, and get it blessed by a standards development organization and maybe even ISO. “It's not a bug, it's a standard”.
But Intel is not Microsoft, so they don't have quite the audacity to turn a bug into a standard. That is what Microsoft is attempting to do by declaring in Office Open XML (OOXML) that the year 1900 should be treated as a leap year, in contradiction of the Gregorian Calendar, which has been in use for more than 400 years. (Years divisible by 100 are leap years only if they are also divisible by 400.)
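The Gregorian rule in the parenthetical above is short enough to state as code. Here is a minimal Python sketch; the function name `is_gregorian_leap` is ours, for illustration only, and each result is cross-checked against the standard library's `calendar.isleap`.

```python
import calendar

def is_gregorian_leap(year: int) -> bool:
    # Divisible by 4 -> leap year, except century years,
    # which are leap years only if also divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for year in (1896, 1900, 1904, 2000, 2100):
    # Python's calendar module applies the same Gregorian rule.
    assert is_gregorian_leap(year) == calendar.isleap(year)
    print(year, is_gregorian_leap(year))
```

Like 2100, the year 1900 fails the divisible-by-400 test, so it has no February 29.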
By mandating the perpetuation of this bug, we are asking for trouble. Date libraries in modern programming languages such as C, C++, Java, Python, and Ruby all calculate dates correctly according to the Gregorian Calendar. So any interpretation of dates in OOXML files in these languages will be off by one day unless the software's author adds a workaround for Excel's bug. Certainly some will make the “correction” properly, at their own expense. But many will not, perhaps because they did not notice it deep within the 6,000-page specification.
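To make the off-by-one concrete, here is a minimal Python sketch. It assumes the spreadsheet "1900 date system", in which serial number 1 means 1900-01-01 and serial 60 stands for the nonexistent 1900-02-29. The function names are hypothetical. A naive conversion that trusts the standard library's (correct) calendar drifts by a day for every date after February 1900; the corrected version is the kind of workaround each implementer has to know to write.

```python
from datetime import date, timedelta

# Day 0 of the assumed 1900 date system, so that serial 1 == 1900-01-01.
EPOCH = date(1899, 12, 31)
PHANTOM_LEAP_DAY = 60  # serial reserved for the fictitious 1900-02-29

def naive_serial_to_date(serial: int) -> date:
    # Assumes every serial corresponds to a real calendar day (it does not).
    return EPOCH + timedelta(days=serial)

def corrected_serial_to_date(serial: int) -> date:
    # Skips the phantom leap day so that later serials land on the right date.
    if serial == PHANTOM_LEAP_DAY:
        raise ValueError("serial 60 denotes 1900-02-29, which never existed")
    if serial > PHANTOM_LEAP_DAY:
        serial -= 1
    return EPOCH + timedelta(days=serial)

for serial in (59, 61, 36526):
    print(serial, naive_serial_to_date(serial), corrected_serial_to_date(serial))
```

Serials up to 59 agree; from 61 on, the naive result is one day late, for example 2000-01-02 instead of 2000-01-01 for serial 36526.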
There is something I call the “Chernobyl Design Pattern”: you take your worst bug, the ugliest part of your code, the part that is so bad, so radioactive that no one can touch it without getting killed, and you make it private and inaccessible and put a new interface around it, essentially entombing it in concrete so that no one can get close to it. In other words, if you can't fix it, at least contain the damage.
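As a rough illustration of the pattern applied to the date bug itself: in this Python sketch, the legacy serial-number arithmetic (with its phantom 1900-02-29) lives in underscore-prefixed, module-private helpers, and the public surface is a class that accepts and returns ordinary `datetime.date` values. `SpreadsheetDate` and the helper names are hypothetical, and the serial convention is an assumption: serial 1 is 1900-01-01 and serial 60 is the fictitious leap day.

```python
from datetime import date, timedelta

# --- The "reactor core": legacy serial arithmetic, private to this module. ---
_EPOCH = date(1899, 12, 31)  # serial 1 == 1900-01-01
_PHANTOM = 60                # serial of the nonexistent 1900-02-29

def _serial_from_date(d: date) -> int:
    serial = (d - _EPOCH).days
    # From 1900-03-01 on, shift up by one to leave room for the phantom day.
    return serial + 1 if serial >= _PHANTOM else serial

def _date_from_serial(serial: int) -> date:
    if serial == _PHANTOM:
        raise ValueError("serial 60 denotes 1900-02-29, which never existed")
    return _EPOCH + timedelta(days=serial - 1 if serial > _PHANTOM else serial)

# --- The "sarcophagus": callers deal only in real calendar dates. ---
class SpreadsheetDate:
    __slots__ = ("_serial",)

    def __init__(self, d: date) -> None:
        self._serial = _serial_from_date(d)

    @classmethod
    def from_legacy_serial(cls, serial: int) -> "SpreadsheetDate":
        # The single, explicit door through which legacy data enters.
        return cls(_date_from_serial(serial))

    def to_legacy_serial(self) -> int:
        # The single door out, for writing the legacy format.
        return self._serial

    def to_date(self) -> date:
        return _date_from_serial(self._serial)
```

For example, `SpreadsheetDate.from_legacy_serial(36526).to_date()` returns `date(2000, 1, 1)`. The point is less the arithmetic than its location: the quirk is touched in exactly two private functions, and nothing outside them ever sees a phantom date.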
Microsoft has taken another approach here. Instead of containing the bug, they are propagating it further. We need to think beyond Excel, to the other applications that work with OOXML data, the applications that work with those applications, and so on: the entire network of data dependencies. The mere existence of this bug in a standard will lead to buggy implementations, poor interoperability, and general chaos around dates. The contamination should have been contained within Excel's source code. Letting the fallout leak into a specification, then a standard, and then other implementations, contradicting both the civil calendar and every other tool that deals with dates, will pollute the entire ecosystem.
This is bad news. Just say no.
Labels: OOXML
Comments:
I really like the term "Chernobyl Design Pattern". I've done exactly that quite a few times to hide ugly unmanageable code. It's the perfect name for it and deserves to be in the Jargon File.
Due to problems like this, the standard will not be accepted by the software community, and Microsoft will fail again. That is what it deserves for doing such things.
To be fair, this is not a bug in Excel. It was an intentional design decision to ensure compatibility with Lotus 1-2-3 files.
The goal of this standard is to ensure that an accurate representation of pre-existing documents is possible in a format that non-MS companies can work with. It is not to create an idealized format free of past mistakes.
As for the Intel bug, it wasn't backwards compatible with previous versions.
Jonathan, type =WEEKDAY("1/1/1900") into Excel. What do you get? It returns 1, meaning Sunday. Now look at any reputable calendar created since Pope Gregory XIII. What day of the week was January 1st, 1900? The correct answer is Monday. So yes, Excel has a bug, and yes Microsoft has pushed to include this bug in an International Standard. To say this is not a bug is pure denial.
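The weekday check above can be reproduced outside Excel. In this Python sketch, the standard library supplies the true weekday, while `excel_style_weekday` is a hypothetical model (an assumption, not Excel's actual code) that counts weekdays straight through every serial number, including the phantom 1900-02-29, with serial 1 mapped to 1 (Sunday) as reported above. Both use WEEKDAY's default numbering, 1 = Sunday through 7 = Saturday.

```python
from datetime import date

# WEEKDAY's default numbering: 1 = Sunday ... 7 = Saturday.
DAY_NAMES = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

def excel_style_weekday(serial: int) -> int:
    # Hypothetical model: counts straight through every serial number,
    # including the phantom 1900-02-29 (serial 60), with serial 1 -> 1 (Sunday).
    return (serial - 1) % 7 + 1

def true_weekday(d: date) -> int:
    # Convert isoweekday() (1 = Monday ... 7 = Sunday) to 1 = Sunday numbering.
    return d.isoweekday() % 7 + 1

print(DAY_NAMES[true_weekday(date(1900, 1, 1)) - 1])  # Monday
print(DAY_NAMES[excel_style_weekday(1) - 1])          # Sunday
print(DAY_NAMES[true_weekday(date(1900, 3, 1)) - 1])  # Thursday
print(DAY_NAMES[excel_style_weekday(61) - 1])         # Thursday
```

Under this model the two agree again from serial 61 (1900-03-01) onward, which is why the discrepancy only shows up for dates in January and February 1900.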
Why maintain that buggy behavior?
They could simply put a workaround in their new office suite so that, when loading or storing a document in an older format, the date is set correctly for that version.
Maybe the people in charge of those kinds of decisions aren't aware of this.
Whatever happened to the wonderful idea of patching broken code?
And why Microsoft believes that all these legacy artefacts must be included in a 'new' format is beyond me. It ought to be the task of the converter to handle these idiosyncrasies in the first place. Old versions of the MS Office package can't read OOXML anyway.
Isn't the more serious problem the fact that you can't use a date before 1900 within a formula? It's pretty bad to include a bug in the spec, but to make it impossible to use dates before 1900 while allowing dates up to 9999 seems strange, to say the least....
How many Microsoft engineers does it take to change a light bulb?
None, they wait for one year and then propose darkness to be the new standard.

