A curious FAQ put up by an unnamed ISO staffer on MS-OOXML. Question #1 expresses concerns about Fast Tracking a 6,000-page specification, a concern which a large number of NBs also expressed during the DIS process. Rather than deal honestly with this question, the ISO FAQ says:
The number of pages of a document is not a criterion cited in the JTC 1 Directives for refusal. It should be noted that it is not unusual for IT standards to run to several hundred, or even several thousand pages.
Now certainly there are standards that are several thousand pages long. For example, Microsoft likes to bring up the example of ISO 14496, MPEG 4, at over 4,000 pages in length. But that wasn’t a Fast Track. And as Arnaud Lehors reminded us earlier, MPEG 4 was standardized in 17 parts over 6 years.
So any answer in the FAQ which attempts to consider what is usual and what is unusual must take account of past practice with JTC1 Fast Track submissions. That, after all, was the question the FAQ purports to address.
Ecma claims (PowerPoint presentation here) that there have been around 300 Fast Tracked standards since 1987 and Ecma has done around 80% of them. So looking at Ecma Fast Tracks is a reasonable sample. Luckily Ecma has posted all of their standards, from 1991 at least, in a nice table that allows us to examine this question more closely. Since we’re only concerned with JTC1 Fast Tracks, not ISO Fast Tracks or standards that received no approval beyond Ecma, we should look at only those which have ISO/IEC designations. “ISO/IEC” indicates that the standard was approved by JTC1.
So where did things stand on the eve of Microsoft’s submission of OOXML to Ecma?
At that point there had been 187 JTC1 Fast Tracks from Ecma since 1991, with basic descriptive statistics as follows:
- mean = 103 pages
- median = 82 pages
- min = 12 pages
- max = 767 pages
- standard deviation = 102 pages
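For anyone who wants to reproduce these figures, here is a minimal sketch of how such descriptive statistics can be computed, assuming the page counts for the 187 JTC1 Fast Tracks have been copied out of the Ecma table into a Python list (the list itself is not reproduced here):

```python
# Sketch only: page_counts is assumed to hold the 187 page counts
# taken from the Ecma standards table referenced above.
import statistics

def describe(page_counts):
    """Return the summary statistics quoted in the post."""
    return {
        "mean": round(statistics.mean(page_counts)),
        "median": statistics.median(page_counts),
        "min": min(page_counts),
        "max": max(page_counts),
        "stdev": round(statistics.stdev(page_counts)),
    }
```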
A histogram of the page lengths looks like this:
So the ISO statement that “it is not unusual for IT standards to run to several hundred, or even several thousand pages” does not seem to ring true in the case of JTC1 Fast Tracks. A good question to ask anyone who says otherwise is, “In the time since JTC1 was founded, how many JTC1 Fast Tracks have been submitted that were greater than 1,000 pages in length?” Let me know if you get a straight answer.
Let’s look at one more chart. This shows the length of Ecma Fast Tracks over time, from the 28-page Ecma-6 in 1991 to the 6,045 page Ecma-376 in 2006.
Let’s consider the question of usual and unusual again, the question that ISO is trying to inform the public on. Do you see anything unusual in the above chart? Take a few minutes. It is a little tricky to spot at first, but with some study you will see that one of the standards plotted in the above chart is atypical. Keep looking for it. Focus on the center of the chart, let your eyes relax, clear your mind of extraneous thoughts.
If you don’t see it after 10 minutes or so, don’t feel bad. Some people and even whole companies are not capable of seeing this anomaly. As best as I can tell it is a novel cognitive disorder caused by taking money from Microsoft. I call it “Sinclair’s Syndrome” after Upton Sinclair who gave an early description of the condition, writing in 1935: “It is difficult to get a man to understand something when his salary depends upon his not understanding it.”
To put it in more approachable terms, observe that Ecma-376, OOXML, at 6,045 pages in length, was 58 standard deviations above the mean for Ecma Fast Tracks. Consider also that the average adult American male is 5′ 9″ (175 cm) tall, with a standard deviation of 3″ (8 cm). For a man to be as tall, relative to the average height, as OOXML is to the average Fast Track, he would need to be 20′ 3″ (6.2 m) tall !
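For anyone who wants to check that arithmetic, here is a minimal Python sketch using the summary statistics quoted above (heights rounded to whole inches):

```python
# Summary statistics for Ecma JTC1 Fast Tracks, as reported above.
mean_pages, stdev_pages = 103, 102
ooxml_pages = 6045

z = (ooxml_pages - mean_pages) / stdev_pages
print(f"OOXML is {z:.0f} standard deviations above the mean")  # ~58

# The same z-score mapped onto adult male height:
# mean 69 inches (5'9"), standard deviation 3 inches.
mean_height_in, stdev_height_in = 69, 3
equivalent_in = mean_height_in + z * stdev_height_in
print(f"equivalent height: {equivalent_in / 12:.1f} ft "
      f"({equivalent_in * 2.54 / 100:.1f} m)")  # ~20.3 ft, ~6.2 m
```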
For ISO, in a public relations pitch, to blithely suggest that several thousand page Fast Tracks are “not unusual” shows an audacious disregard for the truth and a lack of respect for a public that is looking for ISO to correct its errors, not blow smoke at them in a revisionist attempt to portray the DIS 29500 approval process as normal, acceptable or even legitimate. We should expect better from ISO and we should express disappointment in them when they let us down in our reasonable expectations of honesty. We don’t expect this from Ecma. We don’t expect this from Microsoft. But we should expect this from ISO.
Very interesting post; as always, you bring hard evidence, unlike those from the OOXML camp.
Yet one question arises when I read your blog post. You wrote:
>Since we’re only concerned with JTC1 Fast Tracks, not ISO Fast Tracks or standards that received no approval beyond Ecma, we should look at only those which have ISO/IEC designations. “ISO/IEC” indicates that the standard was approved by JTC1.
I think you should give the values for ordinary ISO Fast Track also, or give an explanation of why ISO Fast Track and JTC1 Fast Track are so different that it is pointless to look at the former as a reference.
This addition might not seem that essential, but I am fairly certain that Microsoft people, in private communication with people who have read this, will dodge the issue with half-truths claiming that you deliberately avoided including all Fast Track submissions in order to make your point.
BTW, I think these are the same tactics they used to downplay the importance of the findings of the OOXML and ODF converter group. By not publicly challenging those findings that were incorrect, Microsoft personnel could, in private communication, present solutions for some of the issues to give the impression that OOXML can indeed handle everything in ODF.
ISO Fast Track differs from JTC1 Fast Track in that ISO Fast Tracks are FDISs, not DISs. This means that if the FDIS ballot fails, an ISO Fast Track must go back to committee, where additional work can be performed on it. This can lead to another ballot, or to several iterations of ballots and more editing. The committee also has the ability to publish the proposal as a Technical Specification, or even to cancel the project.
So the net is that ISO Fast Track gives the technical committee more options, as well as more time to deal with defective proposals.
Lies, damn lies, and statistics!
Nice cogent analysis!
Given your description of the differences between ISO Fast Track and JTC1 Fast Track, I would think the suggestion to show numbers for all of the Fast Tracks, rather than just the one, is still pertinent – but probably as a separate graph.
I also think it would be useful to have a graph of the ISO standard ratification times for standards over some large value, such as 2k pages. That is, from the time that such standards are submitted to ISO, how long does it normally take for them to be accepted? I only personally know of two such standards: the MPEG 4 standard you mentioned, which took 6 years, and the SQL standard, which apparently took 20 years.
Maybe OOXML is in fact the 20 ft. man?
I mean, isn’t Microsoft one of a kind?
Has there ever been a company as powerful as them? Has anyone in history ever been able to avoid authorities as well as them? Has there ever been a company that writes as much good and flawless software as them? Has there ever been a man as wealthy as Bill Gates?
More importantly, has anyone ever done a better job at defining interoperable standards as them? In fact, their OOXML work was so good that people have been left in shock.
BUT
That still doesn’t excuse the deceptions/lies.
Perhaps Microsoft would help itself by looking at things from the point of view of their clients. What is more likely? ..that OOXML is a 6 ft. man on stilts covered by a blanket or that we are looking at a real 20ft man?
Clearly the 6ft man pretending to be 20 ft is the more common case, most would agree.
And this is why I think Microsoft should start being honest and not nearly as misleading as they have been. This way, when the 20 ft OOXML wolf really comes, they will be believed.
[Great post btw. I always wanted to see a 20ft man. It’s a once in a lifetime opportunity. .. hmm, but now what do I have to live for? .. wait, I have heard that lightning sometimes strikes twice… like on the top of those skyscrapers where it strikes over and over and over during heavy electrical storms. I guess, when we think about it, 20 ft men are probably a common phenomenon. In fact, with enough greasing, maybe 20 ft men will become the norm from now on! Pity on those that will trust their cherished valuables to OOXML though because while 20ft men may be the wave of the future, they tend to do a great many more obscene things with your cherished goods under those large sheets. Hey, don’t mess with a 20 ft man’s sheets.]
On a more serious note. Rob, I heard that OOXML stores files a bit like a set of assembly instructions. Would you have the resources/time/etc to analyze, in a statistical fashion, the effect that file corruption errors (eg, from malware or from an unclean computer shutdown) would have on various OOXML documents vs ODF documents (which I believe tend to store something closer to the end state of the files). In other words, if I were following a set of instructions on how to reach the south pole, how much more likely would I be to end up instead at the north pole if the instructions were written in a slightly corrupted OOXML document rather than in a slightly corrupted ODF document, perhaps because I’d turn left at the equator instead of right? My guess is that this sort of OOXML brittleness really makes it easier for a market-controlling player to thwart competitors, since the slightest artifact in the file can lead to very different end products, making it that much more difficult to reverse engineer the proprietary extension bits of the market leader.
And again, thanks for this current analysis and all others. Surely, you are helping to prevent many a gross mistake being committed on the part of customers that have been deceived in the past. When really large and dirty players go down, whatever will the customers heavily invested in them do? Fortunately, no one will end up in such a position after all the work you and others have done to make it easier for non technical individuals to appreciate what is obvious to most engineers (for example, with this piece http://www.robweir.com/blog/2008/01/what-every-engineer-knows.html ).
This is one person who is not surrendering his valuable data to Microsoft’s OOXML without a fight.
Jose_X,
Thanks for the note. Your question on document corruption is an interesting one. But is this really a problem people are facing? I haven’t suffered from corrupt files since I moved off of XMODEM file transfers back in the 1980’s.
But it is an interesting question. There are structures in the ZIP format, like the central directory, that could cause failure if corrupted. Ditto for the DEFLATE dictionaries for each file.
Once uncompressed, the markup structure of the document is vulnerable. A missing quote or less-than sign would typically cause the entire document to fail to parse, at least with XML parsers expecting well-formed input.
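To illustrate that point, a tiny Python sketch using the standard library parser (the element and attribute names here are made up, not taken from either format):

```python
# One corrupted byte, a lost closing quote, is enough to make a strict
# XML parser reject the whole document rather than just the damaged run.
import xml.etree.ElementTree as ET

good = '<doc><p style="Normal">hello</p></doc>'
bad  = '<doc><p style="Normal>hello</p></doc>'   # closing quote corrupted away

ET.fromstring(good)              # parses fine
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    print("parse failed:", err)  # the entire document is now unreadable
```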
Then you have the ODF or OOXML level internal structures, the links between content and style definitions, between styles and inherited styles, etc. Any of those, if corrupted, will cause large problems.
In practical use, I think the primary need is to be able to detect if corruption exists and let the user know it has occurred. This could be done with simple checksums in the case of non-malicious threats, or with digital signatures in the case of malicious threats. If you find that your document has been compromised, then ask the originator to re-send.
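For the non-malicious case, something along these lines would do. This is only a sketch in Python; since both ODF and OOXML documents are ZIP packages, zipfile’s built-in per-entry CRC check already catches most damage, and a SHA-256 digest recorded separately covers the whole file (the file name in the usage comment is hypothetical):

```python
import hashlib
import zipfile

def package_ok(path):
    """True if every entry in the ZIP package passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as z:
            return z.testzip() is None   # testzip() names the first bad entry, or None
    except zipfile.BadZipFile:
        return False                     # the central directory itself is damaged

def digest(path):
    """SHA-256 of the whole file, to compare against a digest sent separately."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (hypothetical file name):
# if not package_ok("report.odt"):
#     print("document is corrupted; ask the originator to re-send")
```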
File corruption isn’t too common, but it does occur. I had a flaky SATA controller last year and that showed up as file corruption.
Rob,
What percentage of “typical” OOXML file corruption errors would “totally” destroy an OOXML document, versus the ODF case, is not something I am capable of or willing to tackle at this point in time. That is why I was sheepishly trying to pass the task onward (I confess, and sorry), or at least bring it up in discussion.
One reason I am not willing to tackle that problem is because I don’t believe standards (especially when it comes to something as intricate as software) have very much value to most users when important market participants don’t want the standard to work. It would be like trying to speak English with someone who is constantly twisting the possible interpretations of the conversation with an intent to put you out of business, except that computers are less forgiving than (not as “smart” as) people speaking English. To make matters worse in this case, there is a player with a virtual monopoly that naturally has no interest in the standard working for others. So now imagine living in a city where virtually everyone is twisting your words and collaborating behind your back.. and now pretend you are limited to a computer program without even an IQ of 10 to resolve the issues and survive. Want yet more obstacles? OK, how about if the language will officially change at a moment’s notice (as far as you are aware) in a way that is directed by those working behind your back. Still not enough of a challenge? OK, what if the *de facto* language will be changing as frequently as necessary and you won’t be clued in to the new meanings of words until you find yourself in court (if then). Even worse will be when history can effectively be rewritten through unfair tactics (such as various key conspiring clerks being authorized to change any document “as they deem necessary”, or the statutes being rewritten to have retroactive effects negative to your business).
That is what life is like working within the context of a monopoly controlling the key platforms. Standards are a charade from the pov of the “citizens”, though like most such charades they do serve a purpose for those in power: helping to forestall a revolution.
This also explains why ISO did not have as great a focus on building airtight rules as one might expect. ISO would only be useful to those willing to work together in good faith in the first place.
Microsoft’s attraction to standards is likely mainly as an attack weapon to disrupt those trying to organize without Microsoft’s blessing, as well as a tool to fool developers and others who otherwise might not work for Microsoft’s cause, under the false assumption of measured safety (ie, another deceptive marketing gimmick). If the competition uses standards, Microsoft has to compete in that department to win back the hearts and minds that it will necessarily lose.
OK, with that off my shoulders, and hopefully having made the point that finding ways in which OOXML can be used against customers is an endless task, consider also this wrt the brittleness discussion. Say OOXML is much more brittle; the dominant market player may have a secret redundancy extension that can reduce the brittleness of such files significantly. This proprietary improvement in brittleness could end up becoming a significant product advantage.
Besides “brittleness”, many other such defects in OOXML could be fixed by a proprietary MS-OOXML. Their large market share would then ensure that most documents out there would appear robust [or … pick some other quality] only when handled by that vendor’s applications, since these improvements would be proprietary. And let’s not forget that this company will be advancing the standard in lockstep with their application. Thus that company will continue to have vested interests in keeping OOXML strategically weak and broken into the future.
As for “corruption”, that can happen through any bug within any application handling a document file. But it is not so much “corruption” as that implementation differences in handling across apps, introduced at an early enough stage in a stream of instructions, can lead to greatly magnified differences in the end product, for a long enough and intricately complex enough stream (something like the butterfly effect, the game of telephone, chaos, etc).
Slightly off topic, you may also be interested in the posting “Authored by: Jose on Wednesday, April 23 2008 @ 01:43 PM EDT” found here http://www.groklaw.net/article.php?story=20080421091129596#comments . Keep in mind that I have paid careful attention to some standards before (eg, W3C) but have not delved deeply into most as far as implementing them or working at a high level within standards bodies.
PS: And I don’t mean to appear to be brown-nosing, but I have to say that this blog deserves special recognition when the history of OOXML gets written. Many of the (perhaps implicit) lessons carry over to many more contexts, for those who want to pay attention.
Well, you just have to excuse OOXML, right? I mean, it isn’t ONE standard, it’s more like a dozen different Microsoft-invented “standards” all glommed into one lousy product.
So, clearly, that excuses it from being of a sensible length… right? :)
But remember that the ISO FAQ was not attempting to justify OOXML as a product. The FAQ was trying to justify ISO’s treatment of the OOXML submission within the Fast Track process. I would argue that any standard, even one of above average quality, at 6,000 pages in length would be impossible to adequately review in a 5-month Fast Track ballot.
I don’t think anyone can disagree with this in principle. If not 6,000 pages, then why not allow 100,000-page standards, or 1,000,000 pages? There is obviously a length at which any fixed-period 5-month review becomes a mockery of the process and the participants. We may disagree on exactly where that limit is, but I don’t think anyone can honestly argue that it does not exist.
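To put a rough number on that review burden, a back-of-the-envelope sketch (assuming the 5-month ballot amounts to roughly 150 calendar days):

```python
# Rough arithmetic only: 6,045 pages reviewed over a ~150-day ballot.
pages = 6045
ballot_days = 150
print(f"{pages / ballot_days:.0f} pages per day")  # about 40, weekends included
```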