Ralph Waldo Emerson’s memorable words from his 1841 essay, “Self-Reliance”:
A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall. Speak what you think now in hard words, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said to-day. ‘Ah, so you shall be sure to be misunderstood.’ Is it so bad, then, to be misunderstood? Pythagoras was misunderstood, and Socrates, and Jesus, and Luther, and Copernicus, and Galileo, and Newton, and every pure and wise spirit that ever took flesh. To be great is to be misunderstood.
These are fine words for a philosopher, but based on those statements I have my doubts as to whether Emerson would have made a good engineer or businessman.
Where systems are designed for multiple parties to collaborate we must have consistency driven by shared standards.
The first shared standards date back thousands of years and supported mankind’s earliest commercial ventures:
- Uniform weights and measures, so you knew you were getting what you paid for
- Official coinage of specified weight and purity, so you knew what was being paid
- A working language for recording treaties and trade agreements
This was such an obviously good thing that standards existed already at even the earliest limits of our written historical record. In the Flood stories, in both the Old Testament and the Gilgamesh, the authors thought it appropriate to give the exact dimensions of the vessels. When God spoke to Noah, and Enki spoke to Atrahasis, He spoke in the language of the standards of the day.
As civilization progressed, standards took on an increasingly large role. As the railroad, the steamship and the telegraph shrank the size of nations and oceans, the speed of communications and commerce increased, leading to such diverse standards as railroad gauges, time zones and international postage. Moving into the information age, the increased speed and variety of communication led to standardized network protocols, media formats and character encodings.
Generally, standards are necessary whenever two or more parties communicate or exchange goods or services.
A look at the US/Chinese Standards Portal, a joint effort between ANSI and SAC, shows the breadth of standards that specify the properties of materials and products we all use every day. Their tag line is, “The international language of commerce is standards”. I concur.
But progress has been uneven. Although I can send an email message anywhere in the world, make a phone call anywhere in the world, send a letter anywhere in the world, and expect that it will be received and read exactly as I intended, formatted documents, spreadsheets and presentations have lacked this level of interoperability. One person uses Word, another person uses WordPerfect, another person uses AbiWord or OpenOffice or WordPro. Older documents might still be in WordStar, XYWrite or Manuscript format. We tried conversions, importing and exporting to various formats for interchange, like RTF and CSV. It worked sometimes, but not always and certainly not well.
How did we get to such chaos in the area of document formats?
It is notable that these applications were designed and their formats defined before widespread commercial use of the Internet. The business user of a word processor circa 1994 shared documents via hard copies, or electronically only with users on their LAN. The facilities for electronic document sharing between business partners, between a company and their customers, or a government and its citizens were not widespread. So company A might use WordPerfect, and company B might use WordStar, but since they didn’t exchange documents, or only did so via hard copy, there was no file format problem.
With the popularization of the World Wide Web and increased connectivity of businesses to the Internet, another jump forward in the rate of communications took place, comparable to the railroad or the telegraph. The world was now a very small place indeed. This led to a parallel acceleration of the rate of commerce, as new opportunities arose for supply chain integration, advertising, education, online exchanges, outsourcing, and the new business models that are invented every day.
Today, the document you create can instantly be transported around the world. You may not know who reads your document, what operating system they are running or what applications they are using. They may be running Ubuntu on a laptop on the beach, or a Symbian-enabled mobile phone in rush hour traffic, or even using a screen reader or other assistive technology to render the document according to their needs. We no longer exclusively buy, sell or support the person in the bricks and mortar office down the street. Commerce is global, it is instant, and it is based on standards.
This is where OpenDocument Format (ODF) comes in. After 15 years of chaos in office document formats, it was time for a standard. The rate of communications and commerce demands it. More importantly, customers demand it.
The complaints I hear about the prior state of affairs revolve around these issues:
- I want to own my data.
- I do not want access to my data controlled by a single commercial entity.
- I do not want to require that people go out and purchase a particular application in order to read my documents.
- I want my documents to be in a format that has long-term stability and understandability.
- I want my documents to be in a format that lends itself to processing by a range of tools, both commercial and free.
- I want my documents to be in a format that everyone can understand.
- I want to break out of the cycle of having to constantly upgrade my software every time my vendor decides to change formats on me.
ODF had its roots in the OpenOffice.org project, was refined in an OASIS Technical Committee and then reviewed and approved by ISO last May. It took almost three years to edit, review and approve the specification, but the results are worth it. Today every major word processor either now implements ODF or has announced plans to do so.
But not everyone is happy with progress. This has always been true. The last Pony Express rider likely cursed at the mere mention of the telegraph. The last DECnet engineer likely mumbled, “Why would anyone want a TCP/IP?” as he packed his belongings and cleaned out his office. And in the realm of document formats, Microsoft is kicking and screaming to try to delay the inevitable widespread adoption of ODF as a document format for everyone.
Why is Microsoft so upset?
The answer is, they enjoy a monopoly in office applications and they know that if users could easily move away from Microsoft Office while preserving access to their documents, then users would leave by the millions. The Fear, Uncertainty and Doubt (FUD) around file formats and fidelity and compatibility is the way Microsoft ensures their lock-in.
Let’s review some history of Microsoft and their office file formats, to get a better sense of how this game is played.
Let’s go back to the early days, the mid 1990’s, when Microsoft did not have such market dominance, back when they had competition in the word processor and spreadsheet market. At that time Microsoft actually documented their file formats. Sure, the specification was incomplete, but it was an honest attempt. You could buy the Excel format in book form from Microsoft Press, or get an electronic version of the Excel and Word formats on an MSDN CD. At one point it was a free download from the MSDN web site.
But around 1999 something happened. The license on the file format specification changed. Where before you could do anything you wanted with the formats, now the specification carried the explicit restriction (my emphasis):
[Y]ou may use documentation identified in the MSDN Library portion of the SOFTWARE PRODUCT as the file format specification for Microsoft Word, Microsoft Excel, Microsoft Access, and/or Microsoft PowerPoint (“File Format Documentation”) solely in connection with your development of software product(s) that operate in conjunction with Windows or Windows NT that are not general purpose word processing, spreadsheet, or database management software products or an integrated work or product suite whose components include one or more general purpose word processing, spreadsheet, or database management software products.
So, file format documentation that was once freely available was restricted to applications that ran on Windows and which did not compete with Microsoft Office.
Soon after, this file format information was removed from MSDN altogether. It was only available under a licensing program that had even further restrictions:
This program entitles qualified software developers to license the Microsoft .doc, .xls, or .ppt file format documentation for use in the development of commercial software products and solutions that support the .doc, .xls, or .ppt file formats from Microsoft and to complement Microsoft Office
(How should we parse this? What does it mean to “complement” Microsoft Office? I think in ordinary use, an application that competes against Office would not be considered complementary.)
So what happened between 1995 and 2004 to cause Microsoft to wipe out every bit of publicly-available documentation on their file formats? It seems to me that the main change in that time frame was that they wiped out the competition. The earlier availability of the file format documentation seems to have been intended to encourage developers and partners. In those days, Excel was good about documenting their file format, and importing and exporting competing formats like 1-2-3.
Joel Spolsky, talking about what was required for Excel to reach its “tipping point” in adoption, explains it this way:
The mature approach to strategy is not to try to force things on potential customers. If somebody isn’t even your customer yet, trying to lock them in just isn’t a good idea. When you have 100% market share, come talk to me about lock-in. Until then, if you try to lock them in now, it’s too early, and if any customer catches you in the act, you’ll just wind up locking them out. Nobody wants to switch to a product that is going to eliminate their freedom in the future.
But we see that, as their monopoly was achieved, Microsoft throttled the availability of the Office file format specifications until they were no longer available to potential competitors. The lock-in has been achieved; the door slams shut.
This shows the strategic value of file formats to Microsoft and the steps they have been willing to take in order to keep users locked onto the Windows/Office platform.
So now, today, Microsoft is pushing their Office Open XML standard, “old wine in new wine skins”, not so much a new format as a new ploy. What should enrage every thoughtful person is that they are using compatibility with the legacy binary formats as the main selling point of the OOXML format. Think about it. Compatibility with the binary format that they withdrew from the public seven years ago when they cemented their monopoly is now being touted as their unique advantage. Said differently, Microsoft is selling OOXML as the solution to an interoperability problem that they themselves created and carefully orchestrated.
I’m obviously not a fan, as regular readers of this page already know.
So what prevents Microsoft from doing the same thing again? How do we know that the next version of Office will use a format that is an open standard? Office 2007 has already extended OOXML in undocumented ways to support things like macros and DRM. Although they cannot withdraw the OOXML specification from Ecma, they can surely just ignore it, not update it, and continue to extend their format in undocumented ways. Since the success of ODF is the only reason they are pushing OOXML, it would be in true character for them to deemphasize standard OOXML as soon as ODF is wiped out, and turn it back into an in-house proprietary format, only disclosed to those who agree not to compete with them.
The time is right for a single document standard and that standard is clearly ODF. The opportunity is here for ISO/IEC JTC1 to send a resounding message in favor of interoperability and consistency and to reject OOXML as contradicting the existing ISO ODF standard. I don’t have a lot to say here about the various technical/legal contradiction arguments behind this. (This post is too long already.) If you want the details, they are covered in depth on GrokLaw and ConsortiumInfo. In particular I’d draw your attention to the two Wiki pages (here and here) mentioned in the GrokLaw piece where, if you are so inclined, you can help research and explain the technical reasons why OOXML should be rejected. I certainly plan on contributing to that effort.
I believe we can win this one. The forces of vendor lock-in and secret proprietary interfaces and formats are vulnerable. They have overplayed their cards and are pushing a specification which will only cause them embarrassment once its contents are better known. They are losing market share as well as mind share. They are the past. One last big shove and we should be able to topple the tower. All together now….Push!

