My post of 10 days ago, How to hire Guillaume Portes, received quite a bit of attention, with over 50,000 page views, links from 25 blogs, and around 300 comments left by visitors to this blog, Slashdot and the Joel on Software discussion group. I’d like to thank everyone who took the time to read and to comment.
It is good to continually tell the story and make the case. Having two standard file formats for office documents would be a bad thing for commerce, for end users and for the industry. With two formats, end users will be confused and costs will be higher for those who sell and buy software that works with documents. This will essentially cause a frictional drag on the document processing market. Sure, there will be those who will benefit from the chaos, just as there are those who benefit from the friction of currency exchanges. But over the years we’ve learned the value of things like uniform commercial codes, currency unions and uniform trade regulations.
I’ve heard no one complain about having lost their freedom to choose among the Mark, the Lira and the Franc. We simply use the Euro and then concentrate on what we are buying or selling, not on the currency. In a similar way we should agree on a single document format and then concentrate on application features and user needs and what we are trying to communicate, and stop worrying about file formats. When done well, file formats are invisible. They are not seen by end users, are not discussed by the press, and are not thought about by (most) engineers. The fact that I’m writing about OOXML at all and not about my wine making exploits is an aberration caused by the failure of the dominant market player to provide an open document standard that allows users to own their documents.
But I digress…
Now that I’ve finished reading all of the comments, I’d like to review with you some of the better ones, pro and con, along with my commentary.
I don’t know how many of you noticed: The fictional name “Guillaume Portes” is actually a literal translation of “Bill Gates” in French.
If you noticed this as well, give yourself 3 extra points. Many of my posts have a secret joke, and I hope these will bring a smile to those who find them.
Here’s a comment questioning whether there is a problem with OOXML:
I haven’t looked at the spec, so I don’t know how good or bad it is. But the examples he cites don’t strike me as such a problem. They’re all just to maintain backward compatibility with documents from old versions of Word and other apps. You would be free to ignore them if you don’t need that compatibility. I’m not sure how else they could have done it.
A similar view was expressed by another reader:
I don’t know if it has been stated here, but you do know that supporting these the compatibility options is not required for OpenXML compliance? Developers are free to leave these out of they want.
By that same argument developers can also leave out text alignment, images and tables, since these features are not required for compliance either. In fact, everything in OOXML is optional. If you read the compliance definition in the OOXML specification, it comes down to this statement in Section 2.5:
Application conformance is purely syntactic…A conforming consumer shall not reject any conforming documents of the document type expected by that application. A conforming producer shall be able to produce conforming documents.
Given this definition of conformance, a fully conformant OOXML application can be as simple as:
cp foo.docx bar.docx (Linux) or
copy foo.docx bar.docx (Windows)
In the end, regardless of whether a feature is optional, or even deprecated, if that feature occurs in real OOXML documents, then an OOXML application that aspires to be used and have viability, either commercially or as open source, will need to support it. It is that simple.
There is only one OOXML specification and to an end user all OOXML files are equivalent and interchangeable. The user who receives a document via email, from a government web site, from a colleague, friend, teacher, etc., doesn’t know whether the document was created in Word, created in OpenOffice, created from scratch in Office 2007, saved from Office 2000 with the Compatibility Pack or whether the document was originally authored in WordPerfect and made it into OOXML format only after migration over years via various Office upgrades. It is a DOCX file and users will expect that applications that claim OOXML support will work with their DOCX file, period. Anything else is a support nightmare. That is the entire point of a standard — interoperability — so we must judge OOXML by how well it facilitates that function.
Here is another comment with a view expressed by others as well:
Someone needs to tell every developer of word processing and page layout software on the planet to abandon the ‘must look the same’ obsession described by the above. Why worry about making content in application B look like content in Application A? I create books out of Word files submitted by several people. The last thing I want is all the inconsistent formatting from each of them to control a book’s look.
Named styles is the answer. If a paragraph is body text, call it that. If it’s an inset quote, call it a quote. If a term is in italics, label it as italicized style not Times Italic 12 point. But don’t get all hung up in the distinctions between Times Roman and Times New Roman. The purpose of XML is to define what something is. Not what someone thought it ought to look like on Tuesday three weeks ago.
My personal views are very much in alignment with these sentiments. I think WYSIWYG has done more bad than good over the years, and that strict separation of content, layout and styles should be maintained. However, I also know that my personal views are not universally held, and that the word processor has evolved over the years to be a flexible, multi-paradigm tool that can support both structured document editing and looser, ad-hoc editing by users who just need to grind out a memo. A document format for a modern word processor must support both uses.
I’m glad someone brought up the core question:
Considering the requirement that the standard allow for compatibility with existing documents, what would you suggest?
Silently altering documents that are converted into OpenXML?
Disallow automatic conversions whenever a compatibility flag would have otherwise been needed?
Several readers suggested the same approach to a solution. Here are two examples:
There is no need to include features from 16 year old (or any age) applications in a new standard. If you want to convert, you convert. If WP6 linespacing is 0.8 of Word2007 linespacing, you write linespacing =”0.8″ in your converted document. You DON’T write useWP6linespacing linespacing =”1″
That is just plain silly. That is making a specification unnecessary large for instances that are rarely used by the general public.
As said: if you want to convert, than use a conversion tool. Do not use a modern specification to hold all legacy features.
Let the plugins do the dirty work of native in-memory-binary representations to XML and back conversions.
Keep the XML file format clean, open, unencumbered, application independent, cross platform, universally transformative and exchangeable, portable and timeless.
I think this is the key point, and I’m gratified that so many readers picked up on it. There is no good reason to have these compatibility flags at all. Instead of having several undefined compatibility flags for legacy line spacing options, we should have a flexible line-spacing model in OOXML and, when loading legacy binary documents, convert them as necessary into that model. If the text model in OOXML is sufficiently expressive, this can be done with no loss of fidelity. (And no, a flag that says merely “do it like Word 95” is not an example of expressiveness.)
This is what I mean by “generalize and simplify”. A simplified specification is not necessarily less expressive or less capable. A specification is simplified when it supports internal reuse and accomplishes its task with minimal means.
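The convert-on-import idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0.8 ratio is the made-up figure from the reader comment quoted earlier, and the names LEGACY_LINE_HEIGHT_RATIO, convert_line_spacing and paragraph_attributes are hypothetical, not taken from any real converter or from the OOXML schema.

```python
# Hypothetical ratios between a legacy application's line height and the
# target format's line height. The 0.8 value is the commenter's invented
# example, not a measured figure.
LEGACY_LINE_HEIGHT_RATIO = {
    "WP6": 0.8,
}


def convert_line_spacing(source_app: str, legacy_spacing: float) -> float:
    """Return an explicit spacing multiplier in the target format's own model.

    The legacy quirk is resolved once, at import time, so the saved document
    carries a plain number rather than a "behave like application X" flag.
    Applications without a known quirk are passed through unchanged.
    """
    ratio = LEGACY_LINE_HEIGHT_RATIO.get(source_app, 1.0)
    return round(legacy_spacing * ratio, 4)


def paragraph_attributes(source_app: str, legacy_spacing: float) -> dict:
    """Build the attributes a converter would write for one paragraph."""
    return {"linespacing": convert_line_spacing(source_app, legacy_spacing)}
```

The point of the design is that knowledge of the legacy application lives in the converter, where its vendor can maintain it, and never leaks into the file format that everyone else has to implement.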
However, if the text model of OOXML is not flexible enough to support even legacy versions of Word, then what hope will the rest of the industry have in adopting it as a format? How will Novell manage to get OpenOffice to use it, or Corel to get WordPerfect to use it? What about Lotus WordPro? Will Ecma add special compatibility flags to the OOXML specification to account for the quirks of every word processor with legacy documents? What would OOXML look like if we all loaded it up with such legacy flags? Is this the precedent we want to set?
Why should OOXML have special flags for WordPerfect 6.0 (1996) but not have special flags for WordPerfect 12.0 (2004) or the new X3 version (2006)? Is this purely because Microsoft considered WordPerfect to be a competitor back in 1996 but now no longer cares? Is this the way to go about designing an ISO standard?
I believe that having compatibility flags in the specification for all word processors in use today is not a practical solution, and that having such flags only for Word and ancient versions of competing products is an approach that benefits only Microsoft.
One last point, since this post is already too long. Microsoft’s Brian Jones is claiming that ODF has a similar issue, in that OpenOffice writes out a number of application-specific settings when it saves a document. This is a good illustration of an important distinction. The items that OpenOffice writes out (you can see an example in Brian’s post) are vendor-defined, document-level application settings. There are now and will continue to be multiple implementations of ODF and it is legitimate that they have application-defined features. These are stored as name/value pairs in a separate XML file in the ODF archive.
I can think of no argument against that. Obviously no interoperability is expected for these vendor-specific features, which cover application settings such as window sizes, zoom factors and print settings. In any case, ODF merely provides a place for applications to store these settings. To blame ODF for any vendor misuse of this feature is like blaming the W3C and HTML for non-standard extensions in Internet Explorer.
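To make the mechanism concrete, here is a rough Python sketch that reads such name/value pairs out of the settings.xml part of an ODF package. The namespaces and the config-item element are the ones ODF defines for this purpose; the sample package, the setting names and their values are invented for illustration, and the sample is not a complete, valid ODF document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace ODF uses for configuration items in settings.xml.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

# Invented sample content; a real application writes many more items.
SAMPLE_SETTINGS = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="{CONFIG_NS}">
  <office:settings>
    <config:config-item-set config:name="ooo:view-settings">
      <config:config-item config:name="ZoomFactor" config:type="short">100</config:config-item>
      <config:config-item config:name="PrinterName" config:type="string">Office Laser</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>
"""


def build_sample_package() -> bytes:
    """Create an in-memory zip holding only the illustrative settings.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("settings.xml", SAMPLE_SETTINGS)
    return buf.getvalue()


def read_settings(package: bytes) -> dict:
    """Collect every config-item in settings.xml as a name -> text mapping."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("settings.xml"))
    settings = {}
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        name = item.get(f"{{{CONFIG_NS}}}name")
        settings[name] = (item.text or "").strip()
    return settings
```

An application that does not recognize a given name can simply skip it, since nothing about the document's content or layout depends on these values.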
OOXML, on the other hand, does not seem to have given much thought to what would be needed in a format that has multiple supporting applications. Only a single application (MS Office) has been explicitly considered, and support for that one application and its predecessor versions has been hard-coded into the OOXML schema.