It is the type of response that was crafted to end all debate and justify all sins: “Backward compatibility with billions of documents produced over decades”. Variations of this occur everywhere. Rather than cite them all, a simple Google query will bring up a representative sample.
Let’s take a deeper look at this argument.
There is a game called Zendo, where a player, called the “Master”, forms in his mind a secret rule which governs the selection and arrangement of objects (often small colored blocks). Arrangements which conform to the secret rule are said to have “Buddha nature”. The other players take turns selecting and arranging their own blocks to conform to what they think the secret rule is, to which the Master will acknowledge success or failure. The winner is the one who first guesses the secret rule, which might be something “an odd number of blocks, at least one of which must be red”.
Microsoft is playing Zendo with the OOXML specification. The Master has formed a secret rule. He calls it, “backwards compatibility with billions of office documents”. But since the file format documentation for the proprietary legacy binary formats has not been made public, the rule might as well just been called “Buddha nature”. It is just as opaque. We have no way of judging whether any specific requirement of OOXML is there to support backwards compatibility, or whether it is just there for the convenience of the Office development team. Or in fact whether it is there to raise barriers to non-Microsoft implementers. How could we know, since the solitary constraint on the creation of OOXML dependent on information that isn’t public? Does Ecma TC45 itself even have access to the binary format specifications? How are they able to properly judge what is done in the name of compatibility? Do we all just take Microsoft’s word for it?
The key point (in my opinion) is that legacy compatibility may be a constraining factor, but it need not be the sole determining factor. There are many, perhaps an infinite number of possible markups which would be compatible with the legacy formats, meaning the legacy documents can be unambiguously transformed into the new XML format. The constraint should be that they are mappable, not that they must be identical. Among the set of such possible XML formats, some will be elegant, some sloppy, some bloated, some sparse, some which will be easy for others to implement, some designed to minimize conversion work for just one vendor, etc. In other words, this can be done well, or it can be done poorly. The constraint of compatibility does not justify everything. Compatibility is one requirement, but it is not the only requirement.
An example may make things clear. Word has a feature called Art Page Borders. If you are like me, you’ve gone 15 years without seeing or using this feature. But it is there, under the Format/Borders and Shading menu, on the Page Border tab.
The markup needed to define these borders is covered in section 2.18.4 “ST_Border (Border Styles)” of the OOXML specification. Here we see descriptions and images of 200 hundred or so Art Page Borders. The images are heavily weighted to Western European, even Anglo-American celebratory icons, things like gingerbread men for Christmas, pumpkins for Halloween, or images of Cupid for St. Valentine’s day, or globes which are neatly centered on the United States. I think it is a legitimate concern that a document format with such obvious cultural biases is moving forward toward an international standard.
Further, I am concerned that the specification includes what can only be considered a clipart collection. What legal rights does the implementer have to reproduce this clipart? Keep in mind that Microsoft’s “Covenant Not to Sue” covers patents, not copyrights. I haven’t seen anything that would grant implementers of OOXML the rights to reproduce this clipart in their application. Is the specification hard-coded to use clipart which we cannot copy?
All of these problems (spec bloat, cultural bias, non-extensibility, copyright concerns) can be solved by one simple mechanism. Instead of having ST_Border be a fixed enumerated set of values, have it include only a small number of trivial values like the basic line styles, and have everything else (all of the Art Borders) be stored as a separate image file in the document archive.
So, if you load a Word XP document that uses the “candyCorn” Page Border, then when you write it out to OOXML, you would include a single frame of that art in the zip file and have the XML document reference that image for the border, tiling as necessary. This solution has several advantages:
- It removes some bloat from the spec. No need to document 100’s of page border clip art
- It lowers the barrier to implement. No one is required to implement 100’s of border styles. They are all generated on-the-fly based on images stored in the document.
- Copyright concerns are eliminated.
- Is an extensible approach. An implementation can include different or additional border styles according to their business and cultural requirements.
- It is compatible with legacy documents. Any existing Word binary or XML document can unambiguously be mapped into this scheme
Of course, this approach would require some minimal code changes in Microsoft Word to support this extensible mechanism. But remaining backwards compatible with the Microsoft Word product was never a stated constraint on OOXML. No one ever said that the goal of Ecma OOXML was to reduce the cost for Microsoft to implement it. It is all about the legacy documents, right?
So there it is, one example to illustrate a point that can be repeated over and over again. Among the potential universe of compatible XML formats for Office are those which are flexible, easy to use, easy to implement, as well as those which simply perpetuate the status quo and vendor lock in.