I wish to discuss a recent blog post, a vigorous defense of Microsoft’s Office Open XML and XAML from Novell’s Miguel de Icaza. His post is so wrong, on so many levels, that I am somewhat at a loss for words. Miguel is not stupid, and I find it hard to believe that he is a Microsoft shill, so I must assume that he was imperfectly informed on this issue. “Everyone is entitled to their own opinions but they are not entitled to their own facts,” as Pat Moynihan was fond of saying. I’ll try hard not to make this personal, but there are so many errors in his post that he may very well feel the sting of correction in my words, and for that I apologize in advance.
I suggest you read through Miguel’s post in its entirety, and then return here for my response.
After an attack against lawyers, we come to some technical comments:
Unlike the XML Schema vs Relax NG discussion where the advantages of one system over the other are very clear, the quality differences between the OOXML and ODF markup are hard to articulate.
The high-level comparisons so far have focused on tiny details (encoding, model used for the XML). There is nothing fundamentally better or worse in those standards like there is between XML Schema and Relax NG.
ODF grew out of OpenOffice.org and is influenced by its internal design. OOXML grew out of Microsoft Office and it is influenced by its internal design. No real surprises there.
Maybe I can be of some assistance here, helping to articulate the difference in quality between ODF and OOXML. ODF, starting from its roots in the OpenOffice.org specification, spent a further 2 1/2 years being improved and reviewed in OASIS, then went through further work preparing for submission to ISO, then a further year in ISO, receiving more comments and corrections, before it was published as an ISO standard. So this is a combined 4 years in technical committees being refined by standards bodies. During this time ODF has been implemented in dozens of applications, including full suites like OpenOffice.org, KOffice and Lotus Workplace, as well as individual applications like AbiWord, Gnumeric and Google Docs and Spreadsheets.
In comparison, OOXML went from a proprietary Microsoft specification to an Ecma standard in record time. If you make something 8 times longer than ODF, and do it 4 times faster than ODF, then you are going to have a quality problem. The list of problems on Groklaw is one list of known problems in OOXML. Note that this particular list was generated in only 3 or 4 days by volunteers. I recently did a sampled survey of OOXML specification quality and predicted that it contains thousands of errors.
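Taking those rough ratios at face value (8 times the length, reviewed in a quarter of the time), the implied review pace is easy to work out. Here $L$ stands for page count and $T$ for time spent in technical committee; the symbols are shorthand introduced only for this illustration, and the ratios are approximations rather than measured figures:

```latex
r = \frac{L}{T}, \qquad
\frac{r_{\text{OOXML}}}{r_{\text{ODF}}}
  = \frac{8L_{\text{ODF}} \,/\, (T_{\text{ODF}}/4)}{L_{\text{ODF}} \,/\, T_{\text{ODF}}}
  = 8 \times 4 = 32
```

In other words, each page of OOXML received roughly one thirty-second of the committee attention that a page of ODF received.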
And where are the OOXML implementations? OOXML was approved by Ecma and submitted to ISO without a single available implementation. Certainly, Office 2007 later shipped with support, but is that it? A single implementation? Until you have at least two independent implementations of a standard you will have a very imperfect understanding of the standard’s quality.
So the question to ask is this: Why should JTC1 NB volunteers deal with the mess that Microsoft dropped in their lap through its overhasty review of OOXML in Ecma? Why should they spend the next 6 months reviewing this specification when even a cursory review shows it is defective in so many ways? And considering the observed low level of quality, why should it be reviewed and approved via a Fast Track process, and all in one big chunk of 6,000 pages? Isn’t this the last thing you want to do, following up a rushed review in Ecma with a rushed review in ISO? Instead this should go back to Ecma to let them do a proper review, one they can be proud of.
Miguel correctly points out that OOXML derives from Microsoft Office’s formats, and ODF derives from OpenOffice.org’s formats. But then he leaps to an assertion that they both reflect their parent application’s internals. This is not true. Only a poorly-designed file format reflects the internals of the application. Maybe that is how we did it back in the 1980s, but best practices for portable file formats have been known for years now. That is why we have data formats like XML, so the format can be independent of the application internals. ODF was designed, even in the OpenOffice days, from the ground up to be an application- and platform-neutral document format. While it was further developed in OASIS, it continued to take on such good qualities as the reuse of existing relevant W3C standards such as XForms, MathML and SVG. So certainly, the platform-independence and open nature of OpenOffice.org rubbed off on ODF, but isn’t that an extremely good thing?
OOXML, on the other hand, matches to an inane degree the internals of a single vendor’s legacy application, with no concessions to platform-neutrality. For example, OOXML encodes data in non-XML formats such as binary blobs, bitmasks and other encodings that defy XML schema validation or processing by XML tools. As I’ve said before, this is not a specification, this is a DNA sequence.
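Here is a small illustration of why packed encodings resist generic XML processing. The element and attribute names below, and the bit positions in FLAGS, are hypothetical and chosen only for the example; they are not quoted from the OOXML specification. Both snippets carry the same three options, but in the packed form an XML parser or schema validator sees only the opaque string "04A0", and only code that already knows the application's bit layout can recover the meaning:

```python
import xml.etree.ElementTree as ET

# Hypothetical element: several boolean options packed into one hex string.
PACKED = '<tblLook val="04A0"/>'

# The same options spelled out as separate, self-describing attributes.
EXPLICIT = '<tblLook firstRow="true" firstColumn="true" noVBand="true"/>'

# Illustrative bit positions (an assumption for this sketch, not taken from any spec).
FLAGS = {
    "firstRow": 0x0020,
    "lastRow": 0x0040,
    "firstColumn": 0x0080,
    "lastColumn": 0x0100,
    "noHBand": 0x0200,
    "noVBand": 0x0400,
}


def decode_packed(xml_text):
    """Recover the options; only possible with out-of-band knowledge of FLAGS."""
    value = int(ET.fromstring(xml_text).get("val"), 16)
    return {name for name, bit in FLAGS.items() if value & bit}


def decode_explicit(xml_text):
    """Generic XML tooling reads each option directly by attribute name."""
    attrs = ET.fromstring(xml_text).attrib
    return {name for name, v in attrs.items() if v == "true"}


print(ET.fromstring(PACKED).attrib)      # {'val': '04A0'}  (opaque to XML tools)
print(sorted(decode_packed(PACKED)))     # ['firstColumn', 'firstRow', 'noVBand']
print(sorted(decode_explicit(EXPLICIT))) # ['firstColumn', 'firstRow', 'noVBand']
```

A schema can require that val be a four-digit hex string, but it cannot say what each bit means, and it has no practical way to express which combinations are legal; the explicit form can be validated and queried attribute by attribute with ordinary XML tools.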
Does that help articulate the difference?
Miguel then takes on the size question:
A common objection to OOXML is that the specification is “too big”, that 6,000 pages is a bit too much for a specification and that this would prevent third parties from implementing support for the standard.
Considering that for years we, the open source community, have been trying to extract as much information about protocols and file formats from Microsoft, this is actually a good thing.
This is a good thing, I agree, that Microsoft has produced this specification. I’d like even more for them to make the specification for the Office binary formats public, since that is the format that the billions of legacy documents are actually in. I hope you’ll join with me in calling for Microsoft to release the specification for these formats under their Open Specification Promise, so that users will truly be able to choose which format they want to remain in or move to.
However, just because it is useful from a disclosure perspective does not necessarily mean it will make a good standard. Simply because it is better than nothing does not mean it is sufficient for an ISO standard. There is an important difference between a descriptive specification and a prescriptive standard. Writing down file formats is a small virtue, and one that other companies have practiced for years. Do they all deserve to be ISO standards?
For example, many years ago, when I was working on Gnumeric, one of the issues that we ran into was that the actual descriptions for functions and formulas in Excel was not entirely accurate from the public books you could buy.
OOXML devotes 324 pages of the standard to document the formulas and functions.
Depending on how you count, ODF has 4 to 10 pages devoted to it. There is no way you could build a spreadsheet software based on this specification.
This is a rather bold misstatement, considering that implementations such as OpenOffice.org, KSpread, Gnumeric, Google Spreadsheets, Lotus Workplace, etc., already exist. Go back even earlier and we had 1-2-3, Quattro Pro and OpenOffice all supporting Excel’s formulas even though there was no formal specification for them. Sure, having a good specification helps, but the extreme rhetoric that says this is unimplementable is patently absurd. Just look around.
Some folks have been using a Wiki to keep track of the issues with OOXML. The motivation for tracking these issues seems to be politically inclined, but it manages to pack some important technical issues.
Hmm… The open source community helps test a purported open standard, reports the defects it finds, and this is called “politically inclined”? Isn’t this what open source is all about, “given sufficient eyeballs, all bugs are shallow”? Shouldn’t open standards be subject to scrutiny? As I said in my blog, I am so impressed by the quality and productivity of this type of wiki-enabled public review that I am going to investigate how we can do this to solicit public comments on ODF 1.2. This isn’t for political reasons. This is because it works.
Some of the objections over OOXML are based around the fact that it does not use existing ISO standards for some of the bits in it. They list 7 ISO standards that OOXML does not use: 8601 dates and times; 639 names and languages; 8632 computer graphics and metafiles; 10118-3 cryptography as well as a handful of W3C standards.
By comparison, ODF only references three ISO standards: Relax NG (OOXML also references this one), 639 (language codes) and 3166 (country codes).
Not only it is demanded that OOXML abide by more standards than ISO’s own ODF does, but also that the format used for metafiles from 1999 be used. It seems like it would prevent some nice features developed in the last 8 years for no other reason than “there was a standard for it”.
Miguel has inexplicably omitted all of the W3C standards that ODF uses, such as XForms, MathML, SVG, XLink, SMIL, XSLT and CSS2, as well as IETF standards such as RFC 2045, RFC 2048, RFC 2616, RFC 2898, RFC 3066 and RFC 3987. To imply that OOXML follows more standards than ODF is a foolish statement, unsupported by facts.
On the WMF, Miguel has it all wrong. What is a Windows Metafile? It is simply a recording of the graphical function calls made by Windows as it renders a drawing. It maps 1-to-1 into Windows API calls. It maps so closely to Windows that when the WMF format was found to be vulnerable to a security flaw, even the Wine Windows compatibility layer for Linux was susceptible to the same security hole. WMF (and VML, another legacy format in OOXML with a history of security problems) are flawed formats. One security vendor said: “Turns out this is not really a bug, it’s just bad design. Design from another era.” and “The WMF vulnerability probably affects more computers than any other security vulnerability, ever.”
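To make the "1-to-1" point concrete, here is a simplified Python sketch of metafile playback. It assumes the basic WMF record layout (a 32-bit record size counted in 16-bit words, a 16-bit function number, then 16-bit parameters), skips the file header entirely, and uses a small table of function numbers that should be treated as illustrative assumptions rather than values quoted from any specification. The point is that each record is little more than the name of a Windows graphics call plus its arguments, so a faithful reader ends up re-issuing those calls:

```python
import struct

# Assumed subset of record function numbers, each named after the GDI call it replays.
GDI_CALLS = {0x0000: "EOF", 0x0213: "LineTo", 0x0214: "MoveTo"}


def build_record(function, *params):
    """Pack one record: 32-bit size in 16-bit words, 16-bit function, 16-bit params."""
    size_words = 3 + len(params)  # 6-byte record header = 3 words
    header = struct.pack("<IH", size_words, function)
    return header + struct.pack(f"<{len(params)}h", *params)


def replay(records):
    """Walk the record stream and list the GDI call each record stands for."""
    calls, offset = [], 0
    while offset < len(records):
        size_words, function = struct.unpack_from("<IH", records, offset)
        if size_words < 3:
            raise ValueError("malformed record size")
        n_params = size_words - 3
        params = struct.unpack_from(f"<{n_params}h", records, offset + 6)
        calls.append((GDI_CALLS.get(function, hex(function)), params))
        offset += size_words * 2  # advance by the record size in bytes
        if function == 0x0000:    # end-of-file record terminates playback
            break
    return calls


stream = (build_record(0x0214, 0, 0)     # MoveTo with parameters (0, 0)
          + build_record(0x0213, 20, 10)  # LineTo with parameters (20, 10)
          + build_record(0x0000))         # end-of-file record
print(replay(stream))
# [('MoveTo', (0, 0)), ('LineTo', (20, 10)), ('EOF', ())]
```

Nothing in such a record stream describes a drawing in platform-neutral terms; it is a transcript of calls into one vendor's graphics layer, which is why a compatibility layer that replays it faithfully can also inherit its flaws.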
Although Miguel is pleased to note that the proposed cross-platform ISO standard, Computer Graphics Metafile (CGM), dates to 1999, he fails to mention that WMF is even older, dating back to Windows 3.0 (1990).
So which one should be preferred in an ISO standard? The Windows Metafile format, which is not documented in an open standard, is tied to the graphical layer of a single vendor, and has design flaws with serious security implications? Is this what we really want? Or do we want an open standard, one designed to be platform neutral, that has been in use for eight years, and that has a community continuing its development and promotion, such as CGM Open and WebCGM? Where is the WMF community? A Google search for WMF comes up with security problems; a search for CGM comes up with communities, initiatives and test suites.
There is an important-sounding “Ecma 376 relies on undisclosed information” section, but it is a weak case: The case is that Windows Metafiles are not specified.
It is weak because the complaint is that Windows Metafiles are not specified. It is certainly not in the standard, but the information is publicly available and is hardly “undisclosed information”. I would vote to add the information to the standard.
Did you really read the Groklaw issues list? WMF is not the only, or even the most troublesome of the undisclosed information in OOXML. Start here, then go back and read the Groklaw list of issues, and let me know if it makes more sense then. I am not that good at explaining these things, so please ask questions and I will try harder.
I have obviously not read the entire specification, and am biased towards what I have seen in the spreadsheet angle. But considering that it is impossible to implement a spreadsheet program based on ODF, am convinced that the analysis done by those opposing OOXML is incredibly shallow, the burden is on them to prove that ODF is “enough” to implement from scratch alternative applications.
There is that claim again, that it is impossible to implement an ODF spreadsheet. Miguel, surely you are aware of OpenOffice, KSpread, Lotus Workplace, Gnumeric, Google Docs? How can you persist in such obvious error? How could you actually write the above when you know, I know, and everyone reading it knows that it is patently false? Please tell me it was just a typographical error.
Here’s a challenge: Give me a list of four spreadsheet applications from four different vendors that today are as interoperable with OOXML as the four leading ODF spreadsheets are with ODF.
There is a good case to be made for OOXML to be further fine-tuned before it becomes an ISO standard. But considering that Office 2007 has shipped, I doubt that any significant changes to the file format would be implemented in the short or medium term.
The best possible outcome in delaying the stamp of approval for OOXML would be to get further clarifications on the standard. Delaying it on the grounds of technical limitations is not going to help much.
This is quite a revealing statement. Why should the shipment of Office 2007 factor in the appropriateness and the quality of a proposed International Standard? Should standards of quality be relaxed for Microsoft’s convenience? Do technical limitations not matter because Microsoft has sales targets to meet? Is this what ISO is for? If so, I suggest their hard-working volunteers be given Microsoft salaries and stock options, since clearly they would be working only for Microsoft’s benefit at this point.
Miguel has a good point at the end:
To make ODF successful, we need to make OpenOffice.org a better product, and we need to keep improving it. It is very easy to nitpick a standard, specially one that is as big as OOXML. But it is a lot harder to actually improve OpenOffice.org.
If everyone complaining about OOXML was actually hacking on improving OpenOffice.org to make it a technically superior product in every sense we would not have to resort, as a community, to play a political case on weak grounds.
OpenOffice.org is one application of ODF, but not the only one. It is the most prominent one in the traditional heavyweight office suite model, but I’m not certain that this is the only way forward. We need good implementations, several of them, since one size does not fit all.
In any case, I’d say in return that if Microsoft and Microsoft boosters spent some of their time investigating exactly how easy it would be to encode Office’s legacy features on top of the extensible ODF specification, and worked together with the ODF community to address their common concerns, then we could easily have a single interoperable format that we all could use. The resulting standard, OOXML on top of ODF, would be smaller, simpler, higher quality and more interoperable than the mess we’ll end up with by having OOXML as a standard in addition to ODF.
2/1/2007 — fixed spelling errors reported by a reader via email
2/2/2007 — another spelling error