Here in Westford, Massachusetts, some of our public schools have boilers that can be powered by natural gas or heating oil. This way the schools have their choice of fuel, which they can alter year to year, or even month to month, according to the comparative prices of these two commodities. Such a choice has a value, a very tangible value at any given point. For example, suppose that today the price of natural gas was $1.15/therm (100,000 BTUs) and the price of heating oil was $1.72/therm. The value of choice is ($1.72 - $1.15) × the number of therms purchased. Those clever with finance could probably estimate the long-term value by pricing the analogous commodities futures.
Of course, choice here has a cost as well, namely the increased cost to purchase and maintain the more complex boiler that offers the choice of gas or oil. If the value of having the choice is worth more than the cost of maintaining a world that offers that choice, then you have a net gain by preserving the choice. Otherwise, you are losing by having choice. It is odd to hear that, isn’t it? You can lose by having choice, if the cost of maintaining that choice is greater than the benefit from having a choice.
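The arithmetic is simple enough to sketch in a few lines of Python. The prices are the ones from the example above; the number of therms and the extra yearly cost of the dual-fuel boiler are made-up figures, used only to show how the comparison works.

```python
def value_of_choice(gas_price, oil_price, therms):
    """Savings from buying the cheaper fuel rather than the dearer one."""
    return abs(oil_price - gas_price) * therms


def net_value_of_choice(gas_price, oil_price, therms, extra_boiler_cost):
    """Value of the choice minus what it costs to keep the choice available."""
    return value_of_choice(gas_price, oil_price, therms) - extra_boiler_cost


# Prices per therm from the example; the other two numbers are hypothetical.
gas, oil = 1.15, 1.72
therms_per_year = 10_000        # assumed consumption
extra_cost_per_year = 4_000     # assumed extra purchase and maintenance cost

net = net_value_of_choice(gas, oil, therms_per_year, extra_cost_per_year)
print(f"value of choice: ${value_of_choice(gas, oil, therms_per_year):,.2f}")
print(f"net of boiler cost: ${net:,.2f}")
print("keep the choice" if net > 0 else "the choice costs more than it is worth")
```

With these assumed figures the choice is worth about $5,700 a year and still comes out ahead after the boiler cost. Double the assumed boiler cost and the same calculation says you are losing by having the choice.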
For example, take shoe sizes. In the U.S. we buy shoes in 1/2 size increments. In theory a store could offer shoes in 1/10 size increments. This would give you, the consumer, an increase in choice, and this choice would have a distinct value to you. Whereas the previous shoe sizes would be an average of 1/4 of a size away from perfect fit, the new shoes would be on average only 1/20th of a size away. So there is a tangible benefit to you, the consumer. But this comes at a cost, since the larger inventory and slower turnover for the retailer would increase their costs. Since we are unlikely to buy more shoes than we do today, this cost increase would be passed on to the consumer. So in this case, the benefit of better-fitting shoes is not seen to be worth the increased cost of maintaining those choices, so the industry remains with 1/2 size shoe increments.
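The 1/4 and 1/20 figures depend on how you pick a shoe when none fits exactly. Here is a small sketch of the calculation, assuming ideal foot sizes are spread uniformly between two adjacent stocked sizes. The function name and the two rule names are my own labels: "round_up" means you always take the next size up, and "nearest" means you take whichever stocked size is closest.

```python
from fractions import Fraction


def average_misfit(increment, rule, steps=1000):
    """Average gap between ideal and purchased size, for ideal sizes spread
    uniformly between two adjacent stocked sizes (an assumption)."""
    increment = Fraction(increment)
    total = Fraction(0)
    for k in range(steps):
        # Midpoint of the k-th slice: how far the ideal size sits above
        # the lower of the two stocked sizes.
        offset = increment * Fraction(2 * k + 1, 2 * steps)
        if rule == "round_up":
            misfit = increment - offset
        elif rule == "nearest":
            misfit = min(offset, increment - offset)
        else:
            raise ValueError(f"unknown rule: {rule}")
        total += misfit
    return total / steps


for increment in (Fraction(1, 2), Fraction(1, 10)):
    for rule in ("round_up", "nearest"):
        print(increment, rule, average_misfit(increment, rule))
```

Under the round-up rule the average misfit is half the increment, which gives 1/4 of a size for half sizes and 1/20 for tenth sizes. Under the nearest-size rule it is a quarter of the increment, so 1/8 and 1/40. Either way the finer increments cut the misfit by the same factor of five.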
As an aside, I’ll give you another example, as a brainteaser. You are walking down a street evenly lined with many stores, all of which sell some commodity, let’s say orange juice. The prices at the various stores are random. You want to buy orange juice at the best price, but you can only make one purchase, and you can only make one pass down the street. So you can look at many prices, but at some point you need to make a decision and purchase the orange juice, and you can’t turn back or make a second choice once you’ve made a purchase. The street is 1 kilometer long. Where do you buy your orange juice? Even with an abundance of choice, it isn’t always clear how you make an optimal decision. Note that many life decisions are like this, since time acts as a one-way street, where often we must make an important choice, based on the info we have so far, but with uncertain knowledge of the future, and often we can only choose once.
So what does this mean for document formats? It is popular these days to use the word “choice” as a “god term”, a phrase introduced by Richard Weaver in The Ethics of Rhetoric, referring to words like “progress”, “culture” and “for the Fatherland” that are used to appeal more by seduction than by rational argument. But we should avoid the seduction and ask ourselves what this choice really means. What is it really worth to you and your business? Sitting down today, writing a document, or creating a spreadsheet, what is the value to you, knowing that you could save a document to ODF, OOXML, UOF, SmartSuite, WordPerfect Suite format, etc.? And what is the tangible value of having that choice, that option?
What I want in a document format is:
- It is supported by my word processor.
- When I save the document and later retrieve it, the document looks and behaves the same.
- When I give it to someone else, who may be using the same or a different word processor, on the same or a different operating system, it looks and behaves the same.
- It is easily processable by other software tools. I care about this directly because I am a programmer. But even if I were not, I would want this characteristic, since this is what ensures that an ecosystem of other tools will emerge to support the format, offering me more choice.
- I want the format to be open for the same reason, so it encourages the creation of other tools that I may later choose to use.
- I want the format to be controlled by a group of vendors and other interests, not dominated by a single player. Further, I’d want them to be working openly and transparently, so the public can all see what they are doing. We should all remember the line by Adam Smith, “People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public.” The remedy is given by Justice Louis Brandeis in his line, “Sunlight is the best disinfectant.”
- I want the format to be well-designed according to industry best practices, since I know that will make it easier to work with for tools vendors and will help ensure its longevity as a format.
Given a single format that can accomplish these goals, I see zero value in having a second standard. In fact, having multiple formats brings increased complexity and expense to the software vendor who maintains and supports all the translator code, and this expense gets passed on to the consumer. Then there is the opportunity cost of the features that might have been coded if my vendor hadn’t been distracted by writing translator code. There is also the cost in performance and fidelity loss when translating between formats, and the resulting business losses that may be caused by errors introduced in this processing. This is all very real. But where is the benefit?
To solve this puzzle, we need to look at it from Microsoft’s perspective. A standard in this space is a very scary proposition for them. A comparison can be made to the early years of the automobile industry:
Between 1904 and 1908, more than 240 companies entered the fledgling automotive business. In 1910 there was a mini-recession, and many of these entrants went out of business. Parts suppliers realized that it would be much less risky to produce parts that they could sell to more than one manufacturer. Simultaneously, the smaller automobile manufacturers realized that they could enjoy some of the cost savings from economies of scale and competition if they also used standardized parts that were provided by a number of suppliers.
Guess which two players were not interested in parts standardization? The two largest companies in the industry: Ford Motor Company and General Motors. Why? Because they were well able to achieve strong economies of scale in their own operations, and had no interest in “interconnecting” with anyone else: standardization would (partially) level the playing field regarding economies of scale at the component level. As usual, then and now, standardization benefits entrants, complementors, and consumers, but may hold little interest for dominant incumbents. — Carl Shapiro and Hal R. Varian, Intro for Managing in a Modular Age
We’re in a very similar situation now. Microsoft, the sole dominant player in this market, is perfectly happy with having total control over their proprietary formats. It has worked very well for them for many years. But just as Ford and GM eventually gave in to the obvious necessity of true interoperability, Microsoft will as well. The companies that win in this world are the ones that adapt, not the ones that sell adapters.
We need to start talking about what we can do to ensure that we have a single open document format that can be used by everyone. Making a second ISO standard for document formats is a bad idea. What we need to do is continue to evolve ODF, continue the work to harmonize UOF and ODF, and also take on the task of harmonizing OOXML and ODF. The value of having a single standard in this space is clear. We just need to remain vigilant in the face of those commercial interests that would stand to lose the most if customers had true document portability and could choose platforms and applications based on features, price and support, and not solely on fear, uncertainty and doubt about whether they could still access their legacy documents.
Caveat: I’ve never looked at the OOXML or ODF standards.
Someone needs to point out to Microsoft that “legacy document support” doesn’t belong in a standard. Legacy documents should be moved into a new file using the new standard format which then allows the document to be preserved.
If you want the document to look the same as it did before conversion, use appropriate tags and attributes in the new standard. It makes zero sense to have a “xxxAsWordYY” tag. Any such tag should be converted into an existing tag with appropriate attributes.
Maybe it is just too obvious for me.
The way OOXML is able to represent the legacy formats is to include them as special cases in the specification. Even though Microsoft tells everyone that interoperability with other formats can be achieved by translators, they don’t even completely translate their own legacy files when they convert to OOXML. So old VML is left as VML and not converted to DrawingML.
This is done purely as a cost-saving measure by Microsoft. It is entirely the wrong thing to do technically, but it saves them the expense of writing more code to translate the document into the new format. Of course, at the same time it makes OOXML more expensive for anyone else to work with, since you now need to support both VML and DrawingML.
I was wondering the other day about the “legacy” rhetoric. It all sounds great until you realise that you need Office 2007, as it’s the only convertor to OOXML. And then I wondered how much access to old documents is needed, as opposed to new documents created every day?
Someone had raised the issue at a meeting on OOXML vs ODF that they needed OOXML to get old documents on computers for legal reasons, i.e. as expert witnesses. So I wondered what is wrong with OpenOffice.org or any word processor that has reverse engineered the .doc formats. Or what in fact is wrong with keeping copies of all word processors running on the correct OS in a virtual machine. Or better yet, automating that and printing them from their original word processor into PDF, which is an ISO standard.
Why I think this legacy thing is bogus is that we’ve created documents for centuries. We can still read many of those. Then in the early 90s vendors locked their formats. So we have about 15 years of proprietary formats. The OOXML spec talks about accessing millions of legacy documents. So I’ve been wondering what exactly the volume is that actually needs to be accessed, and what the growth rate of document creation is. That would certainly help us understand how much more important future interoperability is than legacy. And I’m pretty sure the numbers would show that the 15 wasted years will soon be nothing.
Today I filled in forms in a bank on paper and they’ll get filed. So we still have tonnes of that way of document storage. Things will only grow exponentially.
For a non-specialist view on the Danish situation see here:
Copenhagen Post
It’s interesting to see the misunderstandings in the article, which show the public confusion about this debate.
Actually, there are convertors, in the form of a “Compatibility Pack” from Microsoft for some older versions of Office. But your point is well taken that there is more than one way to treat legacy documents.
1) You could proactively convert them to PDF for archiving.
2) You could convert them to PDF on demand, when requested.
3) You could preserve your legacy applications, perhaps in a virtualized image, so you can run them concurrently with your existing environment. This ensures that you will have the truest fidelity, since you can actually run Word 95 on Windows 95, for example.
If fidelity of legacy documents is a big concern, then it makes sense to think of this as a preservation task, like that of an archaeologist or museum curator. The first duty is not to harm the artifact. You are trying to preserve it, not introduce errors or data loss by converting it to the new OOXML. For forensic purposes, even the smallest things like date stamps on the files would be changed by conversion. How can you vouch for something as evidence if it has been converted?
The Copenhagen Post article shows the same confusion we’ve seen in other places: mixing up open standards and open source and confusing formats for applications.
In relation to the Copenhagen Post, it’s perhaps worthy of comment that MS Office 2k7 doesn’t run on an operating system that is increasingly being taken up for office productivity use across the world? E.g., Linux? Let alone potential server back ends such as Solaris and AIX?
That is something that should perhaps be mentioned to the Danish bean-counters.
Of course, as far as Microsoft goes, it is historically true to say that Microsoft has agitated for consumer choice for as long as it was the under-dog, and not a moment longer.
It is a sad, sad, day, heh heh. I find myself in the unusual position of not agreeing with your analogies.
First off, assuming a somewhat even distribution of foot sizes, and 1/2 size increments, the average misfit would be 1/8 size, not 1/4 size. In the US, I believe it’s 1/3 inch/size. 1/24 inch is likely (I’m guessing based on my own feet) *well* within the typical size difference between the two feet.
The brainteaser analogy is easily answered if you give enough information to do so at all, and choose criteria for when to make the buy. e.g. an even distribution within a certain range and a probability of getting a lower price.
“So what does this mean for document formats?”
The shoe example means that it’s close enough the vast majority of the time. You “see zero value in having a second standard.” I see value in things like TeX. But just as you could pay somebody to make custom shoes, you’re only going to do that if you really need an exact fit.
The brainteaser’s lesson you seem to ignore. You say sometimes we can only choose once, but then conclude that eventually the dominant players have to give in to standardization.
The lesson I draw from the brainteaser is that if your design (i.e. initial question) is vague and ill-defined, you end with a mess of a format. Good design is important for making future decisions and future maintenance.
If you want a good analogy to illustrate your final point… I’m sorry, I got nuthin’
My assumption on shoe sizes is that in the case of a mismatch, you would always pick a size larger than your ideal size, since a shoe a little too small is obviously far worse than one a little too large. So that gives you the 1/4-size misfit.
The brainteaser is simply a math problem. Well maybe not simple. No intent for it to continue the analogy. I think you have all the info you need. But feel free to assume a uniform distribution of prices and shop location, or a normal distribution, or any other symmetrical distribution. I think the answer is the same either way.
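For anyone who wants to experiment rather than work it out on paper, here is a minimal simulation sketch. It tries one well-known strategy for puzzles of this shape, the cutoff rule from the classic “secretary problem”: walk past the first n/e stores without buying, then buy at the first store that beats every price seen so far. The assumptions are mine and not part of the puzzle as stated: 100 stores, uniformly random prices, and “success” defined as ending up with the single lowest price on the street. Other goals, such as minimizing the expected price paid, can favour a different rule.

```python
import math
import random


def buy_with_cutoff(prices, cutoff):
    """Skip the first `cutoff` stores, then buy at the first store whose price
    beats everything seen while skipping. If none does, buy at the last store.
    Returns the index of the store where the purchase is made."""
    best_seen = min(prices[:cutoff]) if cutoff > 0 else math.inf
    for i in range(cutoff, len(prices)):
        if prices[i] < best_seen:
            return i
    return len(prices) - 1


def success_rate(n_stores, cutoff, trials, seed=0):
    """Fraction of simulated streets on which the rule lands the lowest price."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prices = [rng.random() for _ in range(n_stores)]
        chosen = buy_with_cutoff(prices, cutoff)
        if prices[chosen] == min(prices):
            wins += 1
    return wins / trials


n_stores = 100                      # assumed number of stores
cutoff = round(n_stores / math.e)   # look-only stretch of the street
rate = success_rate(n_stores, cutoff, trials=10_000)
print(f"skip the first {cutoff} stores; success rate {rate:.3f}")
```

Because the rule only compares prices against each other, it does not care what distribution the prices come from. Try other values of `cutoff` to see how the success rate falls off on either side.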
Actually, I once used this brainteaser to convince an economics friend of mine to stop dating and get married to the woman he was currently seeing. So I guess it can be used in some contexts.