I sometimes hear it said that formats like OOXML, or ODF for that matter, are simply XML serializations of a particular application’s native data representation. This is said, seemingly, in an attempt to justify quirky or outright infelicitous representations. “We had no choice. Office 95 represents line widths in units of 1/5th of a barleycorn, so OOXML must as well”. This technological determinism indicates poor engineering judgment, laziness, or both.
An easy counter-example is HTML. Does HTML reflect the internals of NCSA Mosaic? Does it represent the internals of Netscape Navigator? Firefox? Opera? Safari? Are any faults in HTML properly justified by what a single browser does internally? Applications should follow standards, not the other way around.
The question we should be asking is not whether a standard is similar to an application’s internal representation. That in itself is not necessarily a fault. We need to go a step further, and ask if this encoding represents reasonable engineering decisions, not just for that one application, but for general use? Or in ISO terms, does it represent the “consolidated results of science, technology and experience”? If it is a good, reasonable engineering choice with general applicability, and the original application already found that solution, then this is a good thing. We should be encouraging standards to encode the best practices of industry.
Take colors for example. There are only so many ways one can represent colors in markup. You can have an RGB value encoded as triplets where red= (255,0,0). Or you can have a hexadecimal integer encoded as RRGGBB, where red=FF0000. You can do it W3C style, like CSS or XSL-FO with a hash mark in front, like #FF0000″. There are variations on this, adding an alpha channel, using a different color model, etc. These are all reasonable engineering choices, and no one would fault a standard for choosing any one of them, even if the choice happens to match what a particular application does. They are all reasonable choices.
The final arbiter is engineering judgment. Making a capricious choice, merely because a particular application made that same choice, in spite of contrary engineering judgment, this would be a bad thing.
With this in mind, let’s take a look at how OOXML and ODF represent a staple of document formats: text color and alignment. I created six documents: word processor, spreadsheet and presentation graphics, in OOXML and ODF formats. In each case I entered one simple string “This is red text”. In each case I made the word “red” red, and right aligned the entire string. The following table shows the representation of this formatting instruction in OOXML and ODF, for each of the three application types:
Format | Text Color | Text Alignment |
---|---|---|
OOXML Text | <w:color w:val=”FF0000″/> | <w:jc w:val=”right”/> |
OOXML Sheet | <color rgb=”FFFF0000″/> | <alignment horizontal=”right”/> |
OOXML Presentation | <a:srgbClr val=”FF0000″/> | <a:pPr algn=”r”/> |
ODF Text | <style:text-properties fo:color=”#FF0000″/> | <style:paragraph-properties fo:text-align=”end” /> |
ODF Sheet | <style:text-properties fo:color=”#FF0000″/> | <style:paragraph-properties fo:text-align=”end”/> |
ODF Presentation | <style:text-properties fo:color=”#FF0000″/> | <style:paragraph-properties fo:text-align=”end”/> |
The results speak for themselves.
What is the engineering justification for this horror? I have no doubt that this accurately reflects the internals of Microsoft Office, and shows how these three applications have been developed by three different, isolated teams. But is this a suitable foundation for an International Standard? Does this represent a reasonable engineering judgment? ODF uses the W3C’s XSL-FO vocabulary for text styling, and uses this vocabulary consistently. OOXML’s representation, on the other hand, appear incompatible with any deliberate design methodology.
I fear that before we can tackle harmonization of ODF and OOXML, we will first need to harmonize OOXML with itself!