ODF

The value of restricting choice

2010/07/27 By Rob

The language game

Microsoft’s talking points go something like this (summarized in my words):

If you adopt ODF instead of OOXML then you “restrict choice”. Why would you want to do that? You’re in favor of openness and competition, right? So naturally, you should favor choice.

You can see hundreds of variations on this theme in Microsoft press releases and whitepapers, in press articles, and in blog posts by astroturfers, by searching Google for “ODF restrict choice”.

This argument is quite effective, since it is plausible at first glance and takes more than 15 seconds to refute. But in the end the argument fails because it takes a very superficial view of “choice”, relying merely on the positive allure of the word, essentially using it as a talisman. But “choice” is more than just a pretty word. It means something. And if we dig a little deeper into what the value of choice really is, the Microsoft argument falls apart.

So let’s try to show how one can be in favor of choice, but also in favor of eliminating choice. Let’s resolve the paradox. Personally, I think this argument is too long, but maybe it will prompt someone to formulate it in a briefer form.

Choice — the option to act

Choice is the option to act on one or more possibilities. Choice is the freedom to take one path or another. Choice is the ability to open one door or another. And what is the value of choice? It depends on the value of the underlying possibilities.

In some cases, the value of choice can be measured quite precisely.

For example, imagine I have three boxes, one containing nothing, one containing $5 and another containing $10. If you have no choice, and are given one box at random, then you will get $5 on average. And if given the choice of which box to pick, also without knowing the contents, you will also get $5 on average.

Similarly, if each box contained exactly $5 and you could see inside, the value of choice would still be zero.

But if the three boxes contained nothing, $5 and $10 and you could see inside, then the value of having a choice is clear. You would naturally pick the $10 box. So having a choice is worth an additional $5.
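The box arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `value_of_choice` is purely illustrative, and it assumes that a box handed over at random is worth the average of the boxes, while an informed chooser takes the largest.

```python
def value_of_choice(boxes, can_see_inside):
    """Extra value a chooser gets over being handed a box at random.

    Illustrative helper: `boxes` lists the dollar amount in each box.
    """
    # Being handed a box at random is worth the average of the boxes.
    random_pick = sum(boxes) / len(boxes)
    if can_see_inside:
        # With full information, the chooser takes the best box.
        chosen = max(boxes)
    else:
        # Choosing blind is no better than a random assignment.
        chosen = random_pick
    return chosen - random_pick


print(value_of_choice([0, 5, 10], can_see_inside=False))  # 0.0
print(value_of_choice([5, 5, 5], can_see_inside=True))    # 0.0
print(value_of_choice([0, 5, 10], can_see_inside=True))   # 5.0
```

The three calls correspond to the three cases above: a blind choice, identical boxes, and an informed choice among different boxes.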

So we see that for choice to have value, you must have two things:

A way to estimate the value of one outcome relative to another.
A preference for one outcome over another.

In some cases this can be done with precision. In other cases it can only be estimated or modeled. For example, trading stock options is essentially the buying and selling of the right to exercise a choice (an option) to buy or sell a security at a given price within a given time period. The value of this choice can be estimated with mathematical models such as the Black-Scholes option pricing formula.

Eliminating choice

So let’s go back to the boxes again. Now imagine there are just two boxes: one has $10 in it, and the other has a note in it that requires you to pay me $10. You can see the contents of each box. Which one do you choose? It should be obvious: you pick the one with $10 in it.

But what if I say you are not limited to picking only one box. You can pick either box, or both boxes if you wish. You have absolute freedom to choose A, B or A+B. What do you do? Of course, you still pick the box with $10 in it.

But doesn’t that eliminate choice? Yes, of course it does. But the value of choice derives only from the value of the underlying outcomes. By choosing, you have realized the full value of having a choice. If one choice is clearly more favorable than the others (it “dominates” them), then the alternatives should be discarded.
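To make the dominance argument concrete, here is a small Python sketch that encodes the two boxes as the options A, B and A+B with their visible payoffs. The dictionary and helper names are hypothetical; the sketch only shows that picking the best option necessarily leaves every other option unchosen.

```python
# Box A holds $10; box B holds a note obliging you to pay $10.
# Each mutually exclusive option maps to its payoff in dollars.
options = {
    "A": 10,
    "B": -10,
    "A+B": 10 + (-10),  # money and notes simply add: no conjoint effect
}


def best_option(options):
    """Return the name of the option with the highest payoff."""
    return max(options, key=options.get)


def discarded(options):
    """Options left unchosen because another option pays strictly more."""
    best = options[best_option(options)]
    return sorted(name for name, value in options.items() if value < best)


print(best_option(options))  # A
print(discarded(options))    # ['A+B', 'B']
```

Note that "A+B" shows up in the discarded list: taking both boxes is just another option, and choosing A rules it out exactly as it rules out B.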

Resolving the paradox of choice

Given the choice of A, B or A+B, each is a distinct, mutually exclusive option. They are the three boxes with three outcomes. Each one has a value that could be estimated. When someone portrays option A+B as preserving choice, they are forgetting that it is a choice that also restricts choice, since it eliminates A and B in their exclusive, pure forms from consideration. Any choice, even the choice of A+B, restricts choice. If you choose A+B, then you have not chosen A alone or B alone. You get the value of the outcome A+B, but not the possibly greater benefits of choosing A alone or B alone.

Clear? I think this should be obvious, but I’ve seen these concepts cause much confusion.

It is also important to realize that the combination A+B may have conjoint effects, which may be neutral, synergistic or antagonistic. In other words the value of A+B is not necessarily the same as the value of A plus the value of B.

In some cases, certainly, the value of the A+B choice is the same as the sum of the individual values. For example, the boxes with money and notes are simply additive, with no conjoint effects.

But in other cases, A+B has synergistic effects. For example, the choice of diet+exercise is more salubrious than either one chosen in isolation.

And in some cases the value of A+B is less than the value of either one in isolation, as anyone who has bought both a cat and a dog knows. These choices are antagonistic.
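One way to state the distinction is to compare the value of the combination with the sum of its parts. The Python sketch below does exactly that; the function name is illustrative, and apart from the money boxes the sample numbers are hypothetical scores, not measurements.

```python
def conjoint_effect(value_a, value_b, value_ab):
    """Classify how options A and B interact when chosen together.

    The interaction term is whatever A+B is worth beyond (or short of)
    the plain sum of A alone and B alone.
    """
    interaction = value_ab - (value_a + value_b)
    if interaction > 0:
        return "synergistic"
    if interaction < 0:
        return "antagonistic"
    return "neutral"


# The money boxes: $10 and a $10 debt add up to exactly $0.
print(conjoint_effect(10, -10, 0))  # neutral
# Hypothetical scores: the combination is worth more than the parts.
print(conjoint_effect(3, 4, 9))     # synergistic
# Hypothetical scores: the combination is worth less than the parts.
print(conjoint_effect(3, 4, 2))     # antagonistic
```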

So back to the file format debate. The choice here is between adopting ODF, OOXML, or ODF+OOXML. These three choices are mutually exclusive. They are the three boxes, with three different outcomes. Each outcome has a value that could be estimated. But we should not fall into the trap of thinking that an ODF+OOXML decision is preserving choice. Far from it. By making that choice, one eliminates the possibility of having only ODF, or of having only OOXML, with the resulting values that those choices would bring. Choosing both formats eliminates outcomes and restricts choice just as much as choosing only ODF does.

You cannot avoid eliminating the outcomes you do not choose. There are benefits that would come from having only a single standard, and there are costs and complications from maintaining multiple standards. These must all be considered.

ODF 1.2 Begins Final 60-day Public Review

2010/07/13 By Rob

A major milestone was reached for the OASIS Open Document Format (ODF) TC last week. The latest Committee Draft of ODF 1.2 (CD 05) was sent out for a 60-day public review.

As you may recall, ODF 1.2 is a single standard in three parts:

Part 1 specifies the core schema, and was sent out for public review in January.
Part 2 is OpenFormula (spreadsheet formulas)
Part 3 defines the packaging model of ODF, and went out for public review back in November

The current public review is the first complete review, presenting all three parts of ODF 1.2, including the new Part 2, OpenFormula, which is our spreadsheet formula language.

We will accept public comments (including comments from technical experts in ISO/IEC JTC1/SC34) through September 6th. Comments should be submitted via the TC’s public comment list, which you can join via these instructions. You can also monitor incoming comments by subscribing to the comment list, by searching the archives, or unofficially via the ODFJIRA Twitter feed.

The OASIS ODF TC will track and review all received comments and produce a report indicating how we have resolved each comment. If we decide to make substantive changes to the specification based on comments received, then we would approve such changes in a Committee Draft (CD 06) and send that out for a 15-day public review of the changes made. I expect this will occur. Then the TC may vote to approve the public review draft as a Committee Specification. After that we can hold a ballot of the OASIS membership to approve it as an OASIS Standard. And finally (after some additional administrative paperwork) we can submit ODF 1.2 to ISO/IEC JTC1 according to their PAS process.

I think we can finish up the above remaining formal steps in the 4th quarter.

As I mentioned, the biggest difference between CD 05 and previous Open Document Format public review drafts is the inclusion of the OpenFormula specification. If you are interested in contributing comments during the public review, I’d especially encourage you to review this part. The other parts have already gone through one or more cycles of public review; this part has not.

An outline of the contents of OpenFormula is:

1 Introduction
2 Expressions and Evaluators
3 Formula Processing Model
4 Types
5 Expression Syntax
6 Standard Operators and Functions
6.4 Standard Operators
6.5 Matrix Functions
6.6 Bit operation functions
6.7 Byte-position text functions
6.8 Complex Number Functions
6.9 Database Functions
6.10 Date and Time Functions
6.11 External Access Functions
6.12 Financial Functions
6.13 Information Functions
6.14 Lookup Functions
6.15 Logical Functions
6.16 Mathematical Functions
6.17 Rounding Functions
6.18 Statistical Functions
6.19 Number Representation Conversion Functions
6.20 Text Functions
7 Other Capabilities
8 Non-portable Features

The ideal reviewer for OpenFormula would have expertise either in formal descriptions of computer languages (e.g., EBNF, type systems, numeric computing models) or in one or more of the subject areas covered by the spreadsheet functions. Honestly, I think we already have enough “language lawyers” on the TC, so I’m not so worried about that part. And we did have direct participation by experts in some functional domains. For example, the statistical and mathematical functions have already been given a good scrub by “Dr. G.”

However, I think the financial functions could use a thorough review by a subject matter expert, ideally an expert in financial accounting standards, actuarial science, or a similar field. If anyone knows such an expert who is willing to contribute comments on approximately 30 pages of function definitions related to loan amortization, bond coupon and yield, rates of return, day count conventions, etc., please let me know via email.

Note finally that although OpenFormula is part of the ODF 1.2 specification, it was designed to be a portable, embeddable expression language syntax. It is a natural fit for a spreadsheet application, but it could be used in other contexts, wherever you need to encode a calculable expression with a rich library of domain-specific functions.

I think it would be a fun project to implement OpenFormula as a standalone library, in Java or Python, where you feed it an expression along with an “address resolver” object that resolves names (e.g., cell references) to values, and then have it calculate the output value. This could be the first step toward some interesting things. For example, I give you an ODF spreadsheet and you generate a web app that executes the same model as my spreadsheet. (Many years ago, in the 1980s, there was a “spreadsheet compiler” that did something similar for 1-2-3 files.) Or I give you a spreadsheet, indicate some variable input cells, and you execute thousands of variations on it via Monte Carlo analysis. Or I give you value ranges for the input cells, and you calculate the sheet via interval arithmetic. This may be useful for sensitivity analysis, risk analysis, analysis of error propagation, etc.
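As a rough illustration of the standalone-library idea, here is a minimal Python sketch of an evaluator that takes an expression plus an address resolver. It covers only a tiny subset of OpenFormula syntax (numbers, bracketed references such as `[.A1]`, `+ - * /`, parentheses, and a `SUM` function with `;` as the argument separator). The `Evaluator` class and the choice to model the resolver as a plain callable are assumptions of this sketch, and OpenFormula error values such as #DIV/0! are not modeled.

```python
import re
from typing import Callable, List

# A number, a bracketed reference like [.A1], a function name, or an operator.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|\[[^\]]+\]|[A-Za-z]+|[-+*/();])")


def tokenize(expr: str) -> List[str]:
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Evaluator:
    """Recursive-descent evaluator for a tiny OpenFormula-like subset."""

    def __init__(self, resolver: Callable[[str], float]):
        self.resolver = resolver  # hypothetical "address resolver"
        self.functions = {"SUM": sum}

    def evaluate(self, expr: str) -> float:
        self.tokens, self.i = tokenize(expr), 0
        value = self._additive()
        if self.i != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.tokens[self.i]!r}")
        return value

    def _peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def _next(self, expected=None) -> str:
        tok = self._peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'more input'!r}, got {tok!r}")
        self.i += 1
        return tok

    def _additive(self) -> float:
        value = self._multiplicative()
        while self._peek() in ("+", "-"):
            op = self._next()
            rhs = self._multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    def _multiplicative(self) -> float:
        value = self._unary()
        while self._peek() in ("*", "/"):
            op = self._next()
            rhs = self._unary()
            # Python raises ZeroDivisionError where a spreadsheet shows #DIV/0!.
            value = value * rhs if op == "*" else value / rhs
        return value

    def _unary(self) -> float:
        if self._peek() == "-":
            self._next()
            return -self._unary()
        return self._primary()

    def _primary(self) -> float:
        tok = self._next()
        if tok == "(":
            value = self._additive()
            self._next(")")
            return value
        if tok.startswith("["):
            # Strip the brackets and ask the resolver for the value.
            return float(self.resolver(tok[1:-1]))
        if tok[0].isdigit():
            return float(tok)
        if tok.isalpha() and tok.upper() in self.functions:
            self._next("(")
            args = []
            if self._peek() != ")":
                args.append(self._additive())
                while self._peek() == ";":  # OpenFormula separates arguments with ';'
                    self._next()
                    args.append(self._additive())
            self._next(")")
            return float(self.functions[tok.upper()](args))
        raise SyntaxError(f"unexpected token {tok!r}")


cells = {".A1": 2.0, ".A2": 3.0}
ev = Evaluator(cells.__getitem__)
print(ev.evaluate("[.A1] + [.A2] * 4"))     # 14.0
print(ev.evaluate("SUM([.A1]; [.A2]; 5)"))  # 10.0
```

Because the resolver is just a callable, a Monte Carlo run could be approximated by calling `evaluate` repeatedly with a resolver that draws input-cell values at random.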

Think: “Pluggable spreadsheet evaluation engines, all understanding a common formula expression language.”

Once you have a standardized model for a spreadsheet, and that model is independent of the calculation engine, you can plug in different calculation engines that conform to the standard, and these calculation engines can follow different strategies. This is a very powerful capability, made possible via standardization.
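The pluggable-engine idea can be sketched by keeping the formula model fixed and letting the engine decide what kind of value flows through it. In this Python sketch (all class names, and the cell names `price` and `qty`, are hypothetical), one engine computes ordinary numbers and another computes intervals, the kind of variation useful for the sensitivity analysis mentioned above. It is an illustration of the principle, not how any particular spreadsheet application is structured.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Union


# An engine-neutral formula model: numbers, named references, binary ops.
@dataclass
class Num:
    value: float


@dataclass
class Ref:
    name: str


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Ref, BinOp]


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with basic interval arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


class Engine(Protocol):
    """What an engine supplies: constants and cell values in its own domain."""
    def constant(self, value: float): ...
    def lookup(self, name: str): ...


class PointEngine:
    """Ordinary evaluation with plain floating-point numbers."""
    def __init__(self, cells: Dict[str, float]):
        self.cells = cells

    def constant(self, value):
        return value

    def lookup(self, name):
        return self.cells[name]


class IntervalEngine:
    """Evaluation over ranges of input values."""
    def __init__(self, cells: Dict[str, Interval]):
        self.cells = cells

    def constant(self, value):
        return Interval(value, value)

    def lookup(self, name):
        return self.cells[name]


def evaluate(expr: Expr, engine: Engine):
    """Walk the same model with whichever engine is plugged in."""
    if isinstance(expr, Num):
        return engine.constant(expr.value)
    if isinstance(expr, Ref):
        return engine.lookup(expr.name)
    left = evaluate(expr.left, engine)
    right = evaluate(expr.right, engine)
    if expr.op == "+":
        return left + right
    if expr.op == "-":
        return left - right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {expr.op!r}")


# price * (qty + 1), evaluated by two different engines.
model = BinOp("*", Ref("price"), BinOp("+", Ref("qty"), Num(1)))
print(evaluate(model, PointEngine({"price": 2.0, "qty": 3.0})))  # 8.0
print(evaluate(model, IntervalEngine({"price": Interval(1.5, 2.5),
                                      "qty": Interval(3, 4)})))
# Interval(lo=6.0, hi=12.5)
```

The model object never changes; only the engine passed to `evaluate` does, which is the separation the paragraph above describes.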

ODF at 5 Years

2010/05/01 By Rob

Five years ago today, on May 1st, 2005, OASIS approved Open Document Format 1.0 as an OASIS Standard. I’d like to take a few brief minutes to reflect on this milestone, but only a few. We’re busy at work in OASIS making final edits to ODF 1.2. We’re in our final weeks of that revision and it is “all hands on deck” to help address the remaining issues so we can send it out for final public review. But I hope I can be excused for a short diversion to mark this anniversary.

I won’t talk much about the 5 years since ODF 1.0 was approved. The ODF Alliance’s “ODF Turns Five” [pdf] does a good job there. But I would like to talk a little about ODF and why it was so important that it came about when it did, why it was so timely.

To fully appreciate the significance of ODF you need to understand the market climate in which it was created, and to understand that you need to understand a little of the history of word processors. The following time line illustrates the introduction dates of word processor applications over the past 30 years or so. You will notice some familiar and not-so-familiar names:

We can divide this time line into four time periods, each one driven by a pivotal development.

The first period was the “Pioneering Age”, when the first steps toward the modern word processor were taken. This was research-driven, primarily by Xerox PARC, which developed the first WYSIWYG word processor, Bravo, as well as the first GUI word processor, Gypsy. Except for the text editor vi, which still has some adherents among the troglodyte cave dwellers, none of these first-generation applications survived, though their influence did. For example, Charles Simonyi, after working on Bravo at Xerox, went to Microsoft to develop Word. (Ah, the days before software patents…)

The next wave of word processor applications, the “Personal Computer Age” came in the 1980s with the new platforms of the IBM PC (1981) and the Apple Macintosh (1984). New platforms require apps, either new or ported, and you will see several familiar names introduced in that fruitful period.

Then we have a gap. From around 1990 to 1999 we do not see many new word processor introductions. This was the “Lost Decade”. New word processor introductions died off. Unchallenged by competition, even Microsoft Word advanced relatively little in this decade, compared to innovations before or since.

A few forces were at play here. First, there was a platform shift, from MS-DOS to MS-Windows 3.1 (1991) and Windows 95 (1995). Few companies were able to successfully port their applications to Windows. Also, the market changed significantly with the introduction of Microsoft Office as a suite of applications. Suddenly it was not enough to have a good word processor, say WordPerfect, or a good spreadsheet, say 1-2-3, or a good presentation package, say Harvard Graphics. To be competitive you needed to have all three suite components. And few companies did. Finally, there was the preferential access to operating system technical information Microsoft gave to their own applications teams, allowing Microsoft apps to run better on Microsoft operating systems than their competitors’ apps could. The decade closed with word processor competition wiped out. Analysts stopped tracking and reporting market share data when Office’s share exceeded 95%. And file formats? There were the binary DOC, XLS and PPT. And the file format documentation was only available under license from Microsoft, and only if you agreed not to make a competing word processor.

That was the shape of the market around 2000. Or more properly the state of the Microsoft monopoly.

So what happened that made ODF possible? In one word, the Internet. Well, not so much the technology of the internet itself, but widespread access to the internet via the web. This enabled the open source movement as we know it today to scale. Although open source existed before the web, unless you were at a major university or research center, sharing source code and working collaboratively on software was very difficult. But with widespread access to email, FTP, the web and, eventually, version control, we had the tools needed to scale open source from small teams to large teams. And to write a competitor to Microsoft Word you need a substantial team.

Why was open source so important? Because no rational profit-seeking entity would compete against a monopoly, especially one maintained by restricting access to technical information needed to interoperate. Lacking effective government regulation, the market was revived by open source. You see the same thing happen with Linux and with web browsers.

The other thing the internet and the web brought was a new platform based on open standards such as HTML, CSS, XML and JavaScript, allowing an interactive style of web application called “AJAX”. And since this new platform was based on open standards, Microsoft was less effective in preventing competition in this area. Certainly they tried. From ActiveX to Silverlight, from poor standards support in Internet Explorer, to the infamous memo by Bill Gates in 1998: “One thing we have got to change in our strategy – allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company. We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities”, they tried, but ultimately failed, to “take back the web” and turn it into a proprietary Microsoft platform.

With the new web application platform came new web-based word processors, some of which are charted above.

The net effect is that since 2000 or so we have a new diversity of word processors: open source, web-based, even the revival of commercial competition. It was against this backdrop, the history of competition and diversity all but wiped out but then restored in the new millennium, that ODF was born. Today every word processor of note supports ODF, including Microsoft Word. As Microsoft’s National Technology Director, and former CIO of Washington State, Stuart McKee said, “ODF has clearly won”. We’ve scaled the steep walls of monopoly and planted a new flag. Our former opponents are now our colleagues, working with us on ODF 1.2. We’ve shown we can win. But now we need to show that we can rule. This is the challenge. We need to continue to evolve ODF to meet user needs (and these are diverse needs) as well as accommodate a wide range of application models, from traditional heavyweight desktop applications, to mobile apps, to web-based apps, while realizing that these platforms themselves are shifting and possibly converging. Standards advance at glacial speed, while technological and competitive forces move much faster. Allowing flexibility and extensibility while at the same time preserving interoperability among ODF implementations is a hard task, and one that is not entirely technological. The key value of ODF is to support interoperability in a market of diverse applications. This is the choice that users want.

But enough reflection. Time to get back to my work on ODF 1.2. I need to figure out degressive depreciation according to the French accounting system so we can specify the AMORDEGRC spreadsheet function properly.

The Naming of Standards

2010/04/28 By Rob

I am occasionally asked, what is the correct name of the ODF standard? Is it “OpenDocument Format”? Or is it “Open Document Format”, with a space between “Open” and “Document”?

I’d like (hopefully) to clear this up.

The naming decision happened back in 2004. At that point Sun had contributed their specification for the OpenOffice XML format to OASIS, and a new TC was using that specification as the basis for developing a new standard. But what should the new standard be called?

Some wanted it to be called “OfficeDocument”, emphasizing its primary scope of use. Others wanted to call it “OpenDocument”, making its openness (a new thing in the office-document world at that time) more central, and acknowledging that its applicability was for more than just office editors.

So, as only a committee can do, a compromise was forged that incorporated both ideas. The resulting official name of the standard became “OASIS Open Document Format for Office Applications (OpenDocument)”.

If you are citing the standard for official reasons, that is the name to use. (Or the ISO name which is even longer). But clearly, that name is too long for casual use, or even use in technical writing, so we need a shorter, more convenient name. I’ll note the terms I’ve seen used, as well as my personal thoughts on whether they are a good idea:

ODF — This is what you’ll hear it called in OASIS, where the term is unambiguous. However, in other circles ODF can mean other things, from “Organ Donation Foundation” to “Oregon Department of Forestry”. So, in writing, even on this blog, I will typically use a longer form first, and only then use the acronym. This is also more search-engine friendly.
Open Document Format — This is certainly always correct and is my preferred longer form.
OpenDocument — This is also correct, the short name explicitly given in the standard. We use it, for example, in the registered MIME content types for ODF. I tend to see it used more often to refer to the technology than to the format itself, as in “OpenDocument applications” or “OpenDocument toolkits”. But if I had omnipotent powers, I’d eliminate this short name and make “ODF” the official short name.
OpenDocument Format — This is less correct, using the official short name and then appending a proper case “Format” after it. It is hard to justify, but it does occur in many places.
OpenDoc — This is absolutely wrong. OpenDoc is the name of an unrelated technology Apple developed in the 1990s.

When the IBM Terminology group contacted me on this (and yes we apparently have such a group) my advice to them — and I commend the same to you — is:

When citing the standard, use the official name “OASIS Open Document Format for Office Applications (OpenDocument) v1.1”.
When referring to the format in general, call it “Open Document Format” at its first use in a document, and then feel free to abbreviate it as “ODF”.

There are those who say that standards also have a THIRD NAME, a secret name that they use only for themselves. What deep and inscrutable name ODF calls itself is a matter of some speculation.

Why I like Oracle’s $90 ODF Plugin

2010/04/21 By Rob

There has been a flurry of news articles about Oracle’s price change on their (formerly Sun’s) ODF Plugin for MS Office. What was previously free (as in beer), at least for individual use, is now sold for $90 per copy, with a minimum purchase of 100 copies. The broad coverage (ZDNet, BusinessWeek, CNET, NBC, IDG, etc.) of this minor story suggests someone was shopping this story around. I wonder who?

At the risk of pouring oil on the fire, let me say that I think this is an exciting development for ODF. We have three solutions for providing ODF support in MS Office:

Oracle’s Plugin
CleverAge Add-in
Microsoft’s native ODF support

These three solutions have always varied in terms of quality of conversion, versions of MS Office supported, versions of ODF supported, level of integration into MS Office, etc. And now they vary based on price. This is a good thing. It is called “competition”. I like it.

Although I personally think that Oracle has set the price too high, I realize that we have a market to sort these things out. If they act rationally (and I assume they will) they’ll charge an amount that maximizes their return. If they are not already at a profit-maximizing price point, they will adjust. That is how prices are set in a free market. But if Oracle can really get $90 per copy, with a minimum quantity of 100, then more power to them. I just hope that some of that money gets plowed back into their development of this and other ODF-related tools. That is how we grow stronger and more powerful ODF tools. Someone needs the impetus to make that investment. If the profit motive drives investment in ODF, then Praise be to Mammon! And remember, if Oracle’s Plugin gets more people to use ODF, then that is a larger audience for your open source ODF tool. This is a good thing. The important thing is we’re growing the number of people using ODF.

We should want companies to invest in ODF tools. We should want the demand for ODF to be such that ODF-based goods and services have value, can be sold based on that value, and that there is competition again in the market, something we have not seen in this area in many years.

2010-04-23: Some further thoughts

It is probably worth reflecting on why the Sun Plugin was necessary in the first place. If Microsoft Office supported ODF fully, in a well-integrated and interoperable fashion, then surely no ODF Plugin would be necessary. You would gain your ODF support simply by purchasing your MS Office license. In effect, you are already paying for ODF support (along with all other Office features) when you purchase MS Office. If you are buying Oracle’s $90 Plugin, remember that you are essentially paying for ODF support twice: once to Microsoft and once to Oracle.

If I were paying twice for the same feature, I’d be upset as well. But is the solution really for Oracle to continue subsidizing MS Office users by giving away their Plugin for free? Or maybe Microsoft customers should ask their vendor why their Office ODF support is not adequate? Ideally there would be no need for a Plugin because the out-of-the-box ODF support would meet customer requirements. I’m sure Microsoft, like any other vendor, would value such feedback from their customers. But to me it seems perverse to blame Oracle for no longer subsidizing their competitor’s product.