
An Antic Disposition


Archives for June 2007

An ODF/OOXML File Format Timeline

2007/06/24 By Rob 31 Comments

I suppose the downside of a blog post containing only a picture is that there is nothing for anyone to quote. So here are a few themes that struck me while putting this chart together:

  1. Microsoft once made file format information on the binary formats readily available; in fact, they encouraged programmers to use the binary formats. But then around 1999 they reversed course and eliminated such documentation. At the time, working at Lotus, I had no idea what motivated this change. It was only years later, when Microsoft internal memos were released in cases like Comes v. Microsoft, that the full picture emerged. The file format was viewed by Microsoft as a strategic tool, used to support the overall Microsoft platform, not the user. The format was designed to preserve their vendor lock-in. The availability of the file format documentation to competitors was limited, as a matter of corporate policy. So this reminds us that just because something is documented and available today, nothing prevents Microsoft from changing their mind at a later point and removing the documentation, failing to update it with new releases, or making it available only under a more restrictive license. Since Ecma owns the OOXML specification, as well as its future maintenance, any belief in the long-term openness of this format depends on your trust in Microsoft's future behavior in this area.
  2. Like any durable goods monopoly (and few things are as durable as software), Microsoft's largest competitor is their own install base. Microsoft has made many attempts at moving beyond the binary formats in the past, with Office 2000, Office XP and Office 2003. But in each case it failed. These were all false starts and abandoned attempts. So we should look for signs that OOXML is actually Microsoft's real direction and not another false start or dead end. My guess is that OOXML is merely a transitional format, much like Windows ME was in the OS space: a temporary hybrid used to ease the transition to the platform that would eventually come. Microsoft doesn't want to support all of the quirks of their legacy formats forever. That just leads to bloated, fragile code and more expensive development and support costs. They would rather have clean, structured markup, like ODF. But the question is, how do you get there? The answer is straightforward: First, eliminate the competition. Second, move users in small steps, promising the comfort of continuity and safety. Third, once you have eliminated competition and have the users on the OOXML format that no one but Microsoft fully understands, then you may have your will of them. For example, introduce a new format that drops support for legacy formats and force everyone to upgrade. They are pretty much doing this already on the Mac by dropping support for VBA in the next version of Mac Office. Even a cursory look at OOXML shows that it was not designed for long-term use, even by Microsoft. So the question I have is: what is the real format that they are going toward?
  3. Microsoft, after pretty much ignoring document standards for over a decade, suddenly got religion in late 2005 and rushed whatever they had on hand into Ecma. Remember, just months earlier they had recommended the Office 2003 Reference Schemas to Massachusetts for official use. I'm certainly glad Massachusetts did not fall for that by putting their resources on another dead format in the Microsoft format graveyard. OOXML was not designed to be a standard. It is just a proprietary specification that Microsoft has dumped, at the last minute, into ISO's lap, in an attempt to translate their market domination into a standards imprimatur in order to further cement their market domination. It is a win-win situation for them. Either they have an effective monopoly in office applications and an ISO standard, or they have an effective monopoly in office applications. Nice situation for them either way.

Filed Under: ODF, OOXML, Popular Posts

The Value of Choice

2007/06/21 By Rob 8 Comments

Here in Westford, Massachusetts, some of our public schools have boilers that can be powered by natural gas or heating oil. This way the schools have their choice of fuel, which they can alter year to year, or even month to month, according to the comparative prices of these two commodities. Such a choice has a value, a very tangible value at any given point. For example, suppose that today the price of natural gas was $1.15/therm (100,000 BTUs) and the price of heating oil was $1.72/therm. The value of choice is ($1.72 - $1.15) * the number of therms purchased. Those clever with finance could probably estimate the long-term value by pricing the analogous commodities futures.
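To make that concrete, here is a back-of-the-envelope sketch in Python. The prices are the figures above; the quantity of therms is a hypothetical season's purchase of my own invention:

gas_price_per_therm = 1.15   # $/therm (1 therm = 100,000 BTUs)
oil_price_per_therm = 1.72   # $/therm
therms_purchased = 50_000    # hypothetical season's purchase

# Value of being able to burn the cheaper fuel this season
value_of_choice = (oil_price_per_therm - gas_price_per_therm) * therms_purchased
print(f"Value of choosing gas: ${value_of_choice:,.2f}")   # $28,500.00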

Of course, choice here has a cost as well, namely the increased cost to purchase and maintain the more complex boiler that offers the choice of gas or oil. If the value of having the choice is worth more than the cost of maintaining a world that offers that choice, then you have a net gain by preserving the choice. Otherwise, you are losing by having choice. It is odd to hear that, isn’t it? You can lose by having choice, if the cost of maintaining that choice is greater than the benefit from having a choice.

For example, take shoe sizes. In the U.S. we buy shoes in 1/2 size increments. In theory a store could offer shoes in 1/10 size increments. This would give you, the consumer, an increase in choice, and this choice would have a distinct value to you. Whereas the current shoe sizes can be up to 1/4 of a size away from a perfect fit, the new shoes would be at most 1/20th of a size away. So there is a tangible benefit to you, the consumer. But this comes at a cost, since the larger inventory and slower turnover for the retailer would increase their costs. Since we are unlikely to buy more shoes than we do today, this cost increase would be passed on to the consumer. So in this case, the benefit of better-fitting shoes is not seen to be worth the increased costs to maintain those choices, and the industry remains with 1/2 size shoe increments.

As an aside, I’ll give you another example, as a brainteaser. You are walking down a street evenly lined with many stores, all of which sell some commodity, let’s say orange juice. The prices at the various stores are random. You want to buy orange juice at the best price, but you can only make one purchase, and you can only make one pass down the street. So you can look at many prices, but at some point you need to make a decision and purchase the orange juice, and you can’t turn back or make a second choice once you’ve made a purchase. The street is 1 kilometer long. Where do you buy your orange juice? Even with an abundance of choice, it isn’t always clear how you make an optimal decision. Note that many life decisions are like this, since time acts as a one-way street, where often we must make an important choice, based on the info we have so far, but with uncertain knowledge of the future, and often we can only choose once.
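For the curious, this is the classic "secretary problem" in disguise, and one well-known strategy is to walk past roughly the first 1/e (about 37%) of the stores without buying, then buy at the first store that beats every price seen so far. A quick simulation sketch, with all names and numbers mine, purely for illustration:

import math
import random

def one_pass_purchase(prices):
    cutoff = int(len(prices) / math.e)           # observe ~37% of the street
    best_seen = min(prices[:cutoff], default=float("inf"))
    for price in prices[cutoff:]:
        if price < best_seen:
            return price                         # first price beating the sample
    return prices[-1]                            # forced to buy at the last store

random.seed(1)
trials = [one_pass_purchase([random.uniform(1.0, 5.0) for _ in range(100)])
          for _ in range(10_000)]
print(sum(trials) / len(trials))                 # well below the 3.00 blind average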

So what does this mean for document formats? It is popular these days to use the word “choice” as a “god term”, a phrase introduced by Richard Weaver in The Ethics of Rhetoric, referring to words like “progress”, “culture” and “for the Fatherland” that are used to appeal more by seduction than by rational argument. But we should avoid the seduction and ask ourselves what this choice really means. What is it really worth to you and your business? Sitting down today, writing a document, or creating a spreadsheet, what is the value to you, knowing that you could save a document to ODF, OOXML, UOF, SmartSuite, WordPerfect Suite format, etc.? And what is the tangible value of having that choice, that option?

What I want in a document format is:

  1. It is supported by my word processor.
  2. When I save the document and later retrieve it, the document looks and behaves the same.
  3. When I give it to someone else, who may be using the same or a different word processor, on the same or a different operating system, it looks and behaves the same.
  4. It is easily processable by other software tools (see the sketch after this list). I care about this directly because I am a programmer. But even if I were not, I would want this characteristic, since this is what ensures that an ecosystem of other tools will emerge to support the format, offering me more choice.
  5. I want the format to be open for the same reason, so it encourages the creation of other tools that I may later choose to use.
  6. I want the format to be controlled by a group of vendors and other interests, not dominated by a single player. Further, I'd want them to be working openly and transparently, so the public can see what they are doing. We should all remember the line by Adam Smith: "People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public." The remedy is given by Justice Louis Brandeis in his line, "Sunlight is the best disinfectant."
  7. I want the format to be well-designed according to industry best practices, since I know that will make it easier to work with for tools vendors and will help ensure its longevity as a format.
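As a small illustration of point 4: an ODF document is just a zip archive of XML streams, so everyday tools can pick it apart. A minimal sketch, assuming a hypothetical file named example.odt:

import zipfile
import xml.etree.ElementTree as ET

# An ODF text document is a zip container; content.xml holds the body markup.
with zipfile.ZipFile("example.odt") as odt:
    content = ET.fromstring(odt.read("content.xml"))

print(content.tag)   # the root element of the document content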

Given a single format that can accomplish these goals, I see zero value in having a second standard. In fact, having multiple formats brings increased complexity and expense to the software vendor who maintains and supports all the translator code, and this expense gets passed on to the consumer. And then there is the opportunity cost of the features that might have been coded if my vendor hadn't been distracted by writing translator code. Also, there is the cost in performance and fidelity lost when translating between formats, and the resulting business losses that may be caused by errors introduced in this processing. This is all very real. But where is the benefit?

To solve this puzzle, we need to look at it from Microsoft’s perspective. A standard in this space is a very scary proposition for them. A comparison can be made to the early years of the automobile industry:

Between 1904 and 1908, more than 240 companies entered the fledgling automotive business. In 1910 there was a mini-recession, and many of these entrants went out of business. Parts suppliers realized that it would be much less risky to produce parts that they could sell to more than one manufacturer. Simultaneously, the smaller automobile manufacturers realized that they could enjoy some of the cost savings from economies of scale and competition if they also used standardized parts that were provided by a number of suppliers.

Guess which two players were not interested in parts standardization? The two largest companies in the industry: Ford Motor Company and General Motors. Why? Because they were well able to achieve strong economies of scale in their own operations, and had no interest in “interconnecting” with anyone else: standardization would (partially) level the playing field regarding economies of scale at the component level. As usual, then and now, standardization benefits entrants, complementors, and consumers, but may hold little interest for dominant incumbents. — Carl Shapiro and Hal R. Varian, Intro for Managing in a Modular Age

We’re in a very similar situation now. Microsoft, the sole dominant player in this market, is perfectly happy with having total control over their proprietary formats. It has worked very well for them for many years. But just as Ford and GM eventually gave in to the obvious necessity of true interoperability, Microsoft will as well. The companies that win in this world are the ones that adapt, not the ones that sell adapters.

We need to start talking about what we can do to ensure that we have a single open document format that can be used by everyone. Making a second ISO standard for document formats is a bad idea. What we need to do is continue to evolve ODF, continue the work to harmonize UOF and ODF, and also take on the task of harmonizing OOXML and ODF. The value of having a single standard in this space is clear. We just need to remain vigilant in the face of those commercial interests that would stand to lose the most if customers had true document portability and could choose platforms and applications based on features and price and support, and not solely on fears, uncertainty and doubt about whether they could still access their legacy documents.


Filed Under: Economics, OOXML, Standards

No Representation Without Specification

2007/06/19 By Rob 7 Comments

Maybe I just have an ear for this, but whenever I hear a number of people saying the same odd thing, using the same strained phrase, it catches my attention and makes me take a closer look. Individuals naturally have a great diversity of expression and phrasing, so where this is lacking, and the Borg starts speaking as one, it is good to pay it some heed.

The word for today is “represents”. A few exemplary quotations to demonstrate a particular pattern of use that attracted my attention:

From Microsoft’s Open XML Community:

Open XML was designed to provide users the benefits of: faithfully representing in an open format existing office documents, interoperability, support across platforms and applications, integration with business data, internationalization, support for accessibility and assistive technologies, and long-term document preservation.

Microsoft’s Jean Paoli as quoted by Tim Anderson:

As a design goal, we said that those formats have to represent all the information that enables high-fidelity migration from the binary formats.

And Paoli again in a Microsoft press release:

So the Office Open XML file formats represent all the characteristics of the Office binary file formats, while making it easier for people to connect to the different islands of data in the enterprise.

Microsoft’s Brian Jones in a comment response on his blog:

We had to leave some legacy behaviors in place because the goal of our work was to create an XML format that could represent our existing base of Office documents.

From the OOXML Overview whitepaper [pdf] presented to JTC1:

OpenXML was designed from the start to be capable of faithfully representing the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation.

From Ecma’s response to the JTC1 NB contradiction objections:

OpenXML has been designed to be capable of faithfully representing the majority of existing office documents in form and functionality.

Microsoft’s Stephen McGibbon:

I represent Microsoft at all kinds of meetings and my firm understanding is that one of the things that differentiates OpenXML and ODF is OpenXML’s ability to faithfully represent all of the previously created Microsoft Office binary format documents.

So what are we to make of this? They are being very specific about their choice of words, aren’t they? I wonder why…

A file format represents data. It stores data. It encodes data. These are all synonymous. But the ability to represent data is a trivial thing. For example, here is a markup language that can also represent all legacy Microsoft Office documents:

<office-document>
<one/>
<one/>
<zero/>
<one/>
<zero/>
<zero/>
</office-document>

Since the above markup directly maps to binary, it can faithfully represent 100% of existing Office documents with 100% backwards compatibility. It can also represent perfectly the documents of every other vendor, past, present and future.
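If it helps to see how mechanical this "representation" is, here is a minimal sketch that turns any byte stream into this markup. The two bytes in the example are the start of an OLE2 compound file, the container used by the legacy binary .doc format:

def to_bit_markup(data: bytes) -> str:
    # Map every bit of the input to a <one/> or <zero/> element
    bits = "".join(f"{byte:08b}" for byte in data)
    elements = "\n".join(f"<{'one' if bit == '1' else 'zero'}/>" for bit in bits)
    return f"<office-document>\n{elements}\n</office-document>"

print(to_bit_markup(b"\xd0\xcf"))   # first two bytes of a legacy .doc container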

But before Ecma gets all excited that they may soon have another standard to Fast Track, I must admit the obvious. This markup is not all that useful as an interoperable document format. Why? Because although it can represent 100% of legacy documents, it does not specify how to do anything with them. Except at the level of a bit, the format does not express any structure or semantics. Although you can express anything you want with 1's and 0's, no common, interoperable use above the level of 1's and 0's is provided for. My binary document means something only to me, and unless I go outside of the standard and share additional information with you, you will not be able to understand my binary document.

Interoperability comes not from representation, but from specification.

(An aside — There is however speculation that it is possible to transmit information via a binary code in a way that presupposes no other prior agreement or knowledge other than universals like mathematical and physical laws. It would require a bootstrapping approach where very basic elements of notation and mathematical logic are transmitted, followed by increasingly more complex concepts. By this theory it would be possible to communicate with alien intelligences without any prior conventions. See, for example, Carl Sagan’s novel, Contact. But this is probably overkill for an office document format, unless your workplace is a lot stranger than mine.)

So what is the difference between representing and specifying? When you represent, it means that you can map from the features of the legacy format to the new format. When you specify, it means that you provide the map, and enough detail so that others can read and write that same representation. That is a big difference.

Of course, OOXML is more than 1's and 0's. But when you see attributes with names like "useWord97LineBreakRules," with no additional specification, then you know that the fix is in. My guess is that MS Word has code someplace that looks like this:


if (useWord97LineBreakRules) {
    doCrappyOldWayOfLineBreaking();   // reuse legacy code from Word 97
} else {
    doNewWayOfLineBreaking();         // use new rules
}

If this is true, then MS Word can implement this feature trivially. But no one else can make sense of it, because we lack a specification of its behavior. They might as well have called the attribute "Fred." It would be just as useful.

Another example is how OOXML deals with PowerPoint slide transitions, the things that people use in an attempt to make a boring presentation seem more interesting. Microsoft has ensured that they can represent all of the transitions. They are all there listed in Section 4.4.1.46: blinds, checker, circle, comb, cover, cut, etc. But when you drill down into the definitions, this is what you find:

wheel (Wheel Slide Transition)

This element describes a wheel slide transition effect.

[Example: Consider we have a slide with a wheel slide transition. The <wheel> element should be used as follows:

<p:transition>
<p:wheel/>
</p:transition>
End example]

That's it. Ditto for all of the other slide transitions. Not exactly fully specified, is it? Although the text claims that it "describes a wheel slide transition effect," in truth it merely labels it. There is no specification, only representation. And that curious little example: is it some sort of joke? Did someone really think that attributes with no definition are improved by trivial examples? It reminds me of the old spelling bee joke:

Judge: The word is “synecdoche.”
Student: Could you use that in a sentence?
Judge: Certainly. “Synecdoche” is a very hard word to spell.

100% correct, but also 100% useless. As I read through the OOXML specification I am finding hundreds of places like this where things are labeled, but no definition is given.

So I think we need to ask more questions when we hear the claims that OOXML was designed to faithfully represent 100% of the legacy documents. We need to respond that representation is not enough for an open format. Even an XML format of just <one>’s and <zero>’s can do that. To be of use to anyone other than Microsoft we need more than just representation. We need specification, and we need the map to the legacy formats. To accept anything else is to embark on a voyage with a foreign dictionary missing the definitions. It can represent everything that you want to say, but you’ll be unable to say any of it.

ISO defines a standard as a:

…document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context

A key clause there is the requirement of providing for "common and repeated use." Providing explicit representation of a single vendor's legacy formats, while not providing for common use of that ability, is not the purpose of an ISO standard, and to my eyes it appears to be an abuse of the standardization process.


Filed Under: OOXML

Hemidemisemiquavers

2007/06/11 By Rob 9 Comments

Some “short notes” to share with you:

From a GrokLaw news pick we hear that ZDNet’s David Berlind recently interviewed Tim Berners-Lee in Boston, where Sir Tim received the Massachusetts Innovation and Technology Exchange’s Lifetime Achievement Award. Watch the whole interview if you have 12 minutes, though I will transcribe one passage which highlights the importance of agreeing on a single open standard for a problem domain and fostering competition among the applications built upon that standard:

It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it's open and the fact that it is royalty-free.

So what we saw on top of the web was a huge diversity of different businesses which are built on top of the web, given that it is an open platform.

If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn't have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.

So I think it very important that as we move on to new spaces … we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty-free. So that means, yes, we need standards, because the money, the excitement is not in competing over the technology at that level. The excitement is in the businesses and the applications that you build on top of the web platform.

Well said. I tried to make a similar point, but with pictures, back in February.

I recently ordered some podcasting equipment. It should arrive tomorrow. I will be looking for people to interview soon. So hide while you can, don’t answer the phone, and if it looks like I’m carrying a microphone, then run for the exit.

An interesting article in the American Surveyor, by Joel Leininger, on the importance of file format standards. Although it is a different application domain, the concerns are very similar (via OpenMalaysia).

Anyone know Romanian? Something gives me the impression that this guy from Microsoft Romania is not complimenting me. I wonder what subtle hint gives me that impression…

The OOXML ballot marches on in national standards committees around the world. September 2nd is the deadline, though many committees have earlier deadlines for developing their recommendations. In the US the committee looking at OOXML is called INCITS V1, and we have until July 13th. V1 has had a few meetings so far and we’re just starting to get into the technical comments. Since we have a consensus process, all it takes is a small minority of members to bring everything to a halt, which is pretty much what is happening. For example, we spent 2 1/2 hours today and discussed only two comments. So we risk having a perfunctory technical review of OOXML. When I compare this to the BSI’s excellent work developing detailed comments on a publicly-readable wiki, I think we in the US should be ashamed at the stonewalling going on in V1.

I’ll be hosting a V1 face-to-face meeting in a couple weeks in Washington, DC. Hopefully we’ll make some more substantial progress there. If you really want to follow our work closely, you can read through our mailing list archives which Sun’s Jon Bosak was kind enough to set up for us.

Although no formal call for public comments has gone out, we’ve received a number of unsolicited pro-OOXML letters which you can read here. As you can see, they are pretty much identical form letters, all ending with the artless phrase, “Furthermore, Open XML in no way contradicts any other international document standard.” Remind anyone of the Manchurian Candidate’s, “Raymond Shaw is the kindest, bravest, warmest, most wonderful human being I’ve ever known in my life”?

In any case, if you want to provide input into this process, feel free to send in your thoughts as well. Having read many of these letters myself, I’d offer the following advice:

  1. Don’t send in a form letter. It hurts your cause more than helps it, since it makes it look like you couldn’t get real support if you tried.
  2. Use your real name and email address and postal address, so we know you are a real person and not a robot.
  3. Be polite. Remember you are trying to persuade.
  4. Give a succinct, reasoned opinion. Keep it to a page if you can.
  5. Ask for a specific action. Don’t expect the reader to draw a conclusion. Draw it yourself.

Of course, since V1 is developing the US position on OOXML, comments from US companies and citizens are especially welcome. Also, if you have specific technical comments about OOXML, you can submit them through me and, if I agree with your points, I will raise them directly with the committee. (I do this as a personal favor to you, my readers, not as an official INCITS V1 solicitation.) Assume the committee is already familiar with the GrokLaw items. But OOXML is a big standard, and there are certainly dark corners where I have not ventured. So if you’ve found something new, certainly let me know.

Canada continues to solicit comments on OOXML. And the UK is soliciting comments as well, through June 30th. Again, be succinct, and give your name and address. Otherwise you risk having a committee member reject your comment outright since it cannot be ascertained whether you are actually a resident of that country.

A blog I’d like to recommend to my readers is Lodahl’s blog. Leif Lodahl has been giving some great coverage of ODF happenings in Denmark, including analysis of the parliamentary debate on the question of whether Denmark should have one or two standards. Also a good catch of Microsoft dancing all over the place, trying to avoid giving a straight answer on why Word does not provide integrated ODF capabilities. If you can spare 45 minutes this is a great clip to listen to.


Filed Under: ODF, OOXML, Standards

Documents for the Long Term

2007/06/05 By Rob 6 Comments

We all will die. Institutions come and go. Empires and nations crumble. But what is written down may have transcendent longevity. Whether it is a personal letter from a departed friend, the minutiae of administration, or the recorded contemporary reports of great historical events, the durable written word has almost mythic status in our culture.

The permanence of the written word has fascinated mankind for millennia. The powerful knew the truth of this. To be sure that his deeds would outlive his contemporaries, the Emperor Augustus had his CV engraved in bronze in his "Res Gestae Divi Augusti" (The Deeds of the Divine Augustus). The bronze did not survive, but the words have. Horace wrote in his Ode, "Exegi monumentum aere perennius" (I have erected a monument more lasting than bronze). And his words have survived. Shakespeare in Sonnet 55 echoed this sentiment: "Not marble, nor the gilded monuments / Of princes shall outlive this powerful rhyme". Shelley in his "Ozymandias" shows the irony of the boastful inscription, "Look on my Works ye Mighty, and despair!", surviving beside the "colossal wreck" of an ancient monument.

The saying is "ars longa, vita brevis": art is long, but life is short. But this is not entirely accurate. The performing arts such as dance or music have a very sketchy and imperfect history until the rather recent invention of written notations. So dance before around 1450 is a matter of speculation. No doubt the ancient Bacchae accompanied their ecstatic revels with an equally furious dance. But we know none of it. Thucydides has the Lacedaemonians march into battle to the accompaniment of flutes. What martial notes they played we do not know. We can only speculate, with Thomas Browne, "What song the Syrens sang". Some, like Benjamin Bagby, may give a glimpse at earlier performance practice. And scholars like Milman Parry find echoes of ancient practices in traditional storytelling. But we cannot know for certain.

The structural arts of architecture, city design, aqueducts, monuments and engravings have all fared better over time. Even scattered texts from antiquity have survived. Text can have longevity, but not unassisted. Left to the ravages of water, fire, insects and fungi, papyrus, vellum and paper will only survive a few hundred years. For a text to survive longer, someone must copy it. So the works of Cicero we have in rather good shape today, in part because Augustine of Hippo praised them. (Then as now, getting a good review from a recognized figure is the best marketing.)

Which ancient texts were copied, and thus became part of the canon of western literature, was somewhat a matter of chance. Nine of the surviving plays of Euripides, existing in a single partial manuscript, are curiously in alphabetical order, containing only plays beginning with the Greek letters eta through kappa, leading scholars to believe that this is merely volume 2 of a larger collection of plays that are lost. Euripides is believed to have written almost 100 plays. We have almost 20 of them today.

With digital documents, the issues are a little different. The transmission of digital data can be done without error. But digital media, the tapes, floppies and optical disks, these are susceptible to the ravages of time, light, heat, fungi and the gradual deterioration of the substrate. So, digital documents must be copied from one storage format to another every few years. And so modern digital data relies on the same haphazard selection mechanism as we see with ancient texts — survival depends on someone deciding that a document is worthy of copying and preserving.

That said, the survival of a document does not depend entirely on the whims of monks or archivists. There are certain engineering principles which are key to creating a document that lends itself to long term retention. Some of these are tasks for the individual authors:

  1. Keep the document intact. It is better to preserve a document inclusive of its annexes and appendices.
  2. Separate content, structure, layout and presentation.
  3. Ensure findability: a good title, an abstract, keywords and other metadata will help ensure that your document can be found and retrieved via current and future search technologies (a sketch of such metadata follows this list).
  4. Use a fully-specified, open document format.
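Here is a rough sketch of what point 3 can look like in practice, generating ODF-style metadata (the meta.xml stream). The namespace URIs follow ODF conventions, but the document values are hypothetical and this is not a complete ODF package:

import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("office", OFFICE), ("meta", META), ("dc", DC)):
    ET.register_namespace(prefix, uri)

root = ET.Element(f"{{{OFFICE}}}document-meta")
meta = ET.SubElement(root, f"{{{OFFICE}}}meta")
ET.SubElement(meta, f"{{{DC}}}title").text = "FY2008 Budget Proposal"        # hypothetical
ET.SubElement(meta, f"{{{DC}}}description").text = "Draft budget, with appendices"
ET.SubElement(meta, f"{{{DC}}}subject").text = "budget"
ET.SubElement(meta, f"{{{META}}}keyword").text = "finance"

print(ET.tostring(root, encoding="unicode"))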

From another angle we can look at archiving from a systems view and follow a basic architectural principle. The key to durability, whether in documents, monuments, institutions, or whatever, all boils down to this: Do not depend on something less stable than yourself.

(I didn’t invent that principle, but don’t recall where I first heard it. Any idea who it was?)

If you depend on something less stable, which is to say more susceptible to change, than yourself, then when it changes, it forces you to change. Stability is when you change only when you want to change.

For example, a house is built on a foundation. A frame, plumbing and electrical, walls, wallpaper and furniture are layered on top. If replacing the wallpaper triggered a need for a new foundation, then we would say that the house was inherently unstable. But it is reasonable to expect that installing new plumbing will require opening a hole in a wall and later applying wallpaper. The expected rates of change of these various layers have led to a method of construction that enforces this dependency chain. If for some reason we needed to make very frequent changes to the plumbing, then we would place it outside the interior walls, or behind removable wall panels for easy access.

We carefully manage dependency chains when programming as well. For example, imagine a module A (a database client) that depends on a module B (a database server) where you believe that module B is less stable (has a greater rate of change) than A. This is a problem, since changes to B trigger changes to A. So we define a new interface layer C (maybe SQL) that is more stable than A or B. By having A depend on C rather than B directly, we transform the unstable dependency A->B, into the stable relationship (A,B)->C, where C is a standard.
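A minimal sketch of that (A,B)->C arrangement, with hypothetical names of my own choosing (StableQueryInterface, DatabaseServer, DatabaseClient):

from abc import ABC, abstractmethod

class StableQueryInterface(ABC):              # C: the stable, standardized layer
    @abstractmethod
    def execute(self, statement: str) -> list: ...

class DatabaseServer(StableQueryInterface):   # B: volatile, may change often
    def execute(self, statement: str) -> list:
        return []                             # imagine a real database engine here

class DatabaseClient:                         # A: depends only on C, never on B
    def __init__(self, backend: StableQueryInterface):
        self.backend = backend

    def fetch_sales(self) -> list:
        return self.backend.execute("SELECT * FROM sales")

print(DatabaseClient(DatabaseServer()).fetch_sales())

Changes inside DatabaseServer no longer ripple into the client, because both sides now depend only on the stable interface.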

This same principle applies to document formats as well. Never depend on something less stable than yourself. For the first few decades of document formats, the era of binary formats in the 1980's and early 1990's, we did this all wrong, as the following diagram shows:

[Diagram: the binary file format sitting atop a stack of proprietary dependencies]

In those days the file format stood atop a large set of dependencies, and changes at all layers would lead to changes in the file formats. This created a very inflexible stack of dependencies, where changes in the less stable lower layers could trigger incompatible changes to the document format. When we see that an Excel file on the Mac has a different internal date format than an Excel file created on Windows, we're seeing remnants of this kind of dependency chain.
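That particular quirk is easy to demonstrate: the two platforms historically used different epochs for Excel's serial date numbers, so the same stored number can decode to two different days. A sketch, with a made-up serial number:

from datetime import date, timedelta

serial = 39245                          # a made-up stored serial date number
windows_epoch = date(1899, 12, 30)      # 1900 date system (the usual conversion
                                        # base, absorbing Excel's fictitious
                                        # 1900 leap day)
mac_epoch = date(1904, 1, 1)            # 1904 date system

print(windows_epoch + timedelta(days=serial))   # 2007-06-12
print(mac_epoch + timedelta(days=serial))       # 2011-06-13, about 4 years later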

Note also that these interfaces between the layers were not standards, but proprietary interfaces. For example, a Word 95 document might be seen as this:

[Diagram: a Word 95 document depending on a stack of proprietary application and platform interfaces]

The move to XML-based file formats changes this diagram but little. The format at the top is now XML, but the dependency chains are the same. The relationship of the format to the technology stack has not changed:

[Diagram: the same dependency stack, now with an XML format on top]


If using a new document format requires you to buy a new application suite, update your hardware and buy a new operating system, then that should be a clear sign that something is wrong. “The tail wags the dog,” as they say.

And note that a dependency is not the same as a layer. You can pretty things up all you want with the use of standards like XML, but still have adverse dependency chains. Taking a Microsoft Word binary format, translating it into XML, and putting it in a Technical Committee whose charter requires that it remain 100% compatible with Microsoft Word leaves you with a file format that depends on Microsoft Word, no matter how much XML Schema and Dublin Core you throw at it. The XML is just syntactic sugar. The essence of the dependency chain remains: OOXML depends on Word and Windows, a single vendor's application stack. Instead of an application supporting a format, a format is supporting an application.

I should further note that a vendor, at great expense and effort, can forestall the bad effects of an unstable dependency chain, sometimes for many years. Instability, with effort, can be managed, as jugglers, unicyclists and stilt walkers remind us. Even though the Word binary format has many dependencies on the Windows platform, and on specific internals of Word and features and behaviors from earlier versions of Word, Microsoft has managed to preserve some level of compatibility with these older formats, even in current versions of Word. The support is far from perfect, and it certainly makes their file format and their applications more complicated and more expensive to work with. But that is the burden they face from bad engineering decisions back in the early 1990’s. They and their customers live with that, and though they may not realize it, they all pay a price for it.

The alternate approach, the one that leads to better prospects for long-term document access, is to have a stack, not of proprietary applications and interfaces, but of standards. ODF's long-term stability and readability comes from the fact that it is built upon, and depends upon, other standards that are widely used, widely adopted and widely deployed. ODF is designed so the format depends on things more stable than itself, with a solid foundation as seen here:

[Diagram: ODF resting on a foundation of widely-adopted open standards]

The suitability of a format for long term archiving depends as much on the formal structure of the technological dependencies as it does on specific details of the technologies involved. The greatest technologies in the world, if assembled in an unstable dependency arrangement, will lead to an unstable system. Look at the details, certainly, but also step back and look at the big picture. What technology changes can render your documents obsolete? And who controls those technologies? And what economic incentives do they have to trigger a cascade of changes every 5 years, to force upgrades? As consumers and procurers we all need to make a decision as to whether we want to ride on that roller-coaster again.

The question we face today is whether we want to carry forward the mistakes of the past, and the extensive and expensive logic required to maintain this inherently unstable duct-tape-and-baling-wire Office format, or whether we move forward to an engineered format that takes into account best practices in XML design, reuses existing international standards, and is built upon a framework of dependencies that ensures the format is not hostage to a chain of technologies that can be manipulated by a single vendor for their sole commercial advantage.


Filed Under: ODF, OOXML, Standards
