Archives for May 2009

The Battle for ODF Interoperability

2009/05/17 By Rob

Last year, when I was socializing the idea of creating the OASIS ODF Interoperability and Conformance TC, I gave a presentation I called “ODF Interoperability: The Price of Success”. The observation was that standards that fail never need to deal with interoperability. The creation of test suites, convening of multi-vendor interoperability workshops and plugfests is a sign of a successful standard, one which is implemented by many vendors, one which is adopted by many users, one which has vendor-neutral venues for testing implementations and iteratively refining the standard itself.

Failed standards don’t need to work on interoperability because failed standards are not implemented. Look around you. Where are the OOXML test suites? Where are the OOXML plugfests? Indeed, where are the OOXML implementations and adoptions? Microsoft Office has not implemented ISO/IEC 29500 “Office Open XML”, and neither has anyone else. In one of the great ironies, Microsoft’s escapades in ISO have left them clutching a handful of dust, while they now scramble to implement ODF correctly. This is reminiscent of their expensive and failed gamble on HD DVD for the Xbox, followed eventually by a quick adoption of Blu-ray once it was clear which direction the market was going. That’s the way standards wars typically end in markets with strong network effects. They tend to end very quickly, with a single standard winning. Of course, the user wins in that situation as well. This isn’t Highlander. This is economic reality. This is how the world works.

Although this may appear messy to an outside observer, our current conversation on ODF interoperability is a good thing, and further proof, to use the words of Microsoft’s National Technology Director, Stuart McKee, that “ODF has clearly won”.

Fixing interoperability defects is the price of success, and we’re paying that price now. The rewards will be well worth the cost.

We’ve come very far in only a few years. First we had to fight for even the idea and acceptance of open standards, in a world dominated by a RAND view of exclusionary standards created in smoke-filled rooms, where vendors bargained about how many patents they could load up a standard with. We won that battle. Then we had to fight for ODF, a particular open standard, against a monopolist clinging to its vendor lock-in and control over the world’s documents. We won that battle. But our work doesn’t end here. We need to continue the fight, to ensure that users of document editors, you and me, get the full interoperability benefits of ODF. Other standards, like HTML, CSS, ECMAScript, etc., all went through this phase. Now it is our turn.

With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.

Of course, the standard itself, the ink on the pages, does not accomplish this by itself. A standard is not a holy relic. I cannot take the ODF standard, touch it to your forehead, say “Be thou now interoperable!” and have it happen. If a vendor wants to achieve interoperability, they need to read (and interpret) the standard with an eye to interoperability. They need to engage in testing with other implementations. And they need to talk to their users about their interoperability expectations. This is not just engineering. Interoperability is a way of doing business. If you are trying to achieve interoperability by locking yourself in a room with a standard, then you’ll have as much luck as trying to procreate while locked in a room with a book on human reproduction. Interoperability, like sex, is a social activity. If you’re doing it alone then you’re doing it wrong.

Standards are written documents (text), and as such they require interpretation. There are many schools of textual interpretation: legal, literary, historic, linguistic, etc. The most relevant one, from the perspective of a standard, is what is called “purposive” or “commercial” interpretation, commonly applied by judges to contracts. When interpreting a document using a purposive view, you look at the purpose, or intent, of a document in its full context, and interpret the text harmoniously with that intent. Since the purpose of a standard is to foster interoperability, any interpretation of the text of a standard which is used to argue in favor of, or in defense of, a non-interoperable implementation has missed the mark. Not all interpretations are equal. Interpretations which are incongruous with the intent of standardization can easily be rejected.

Standards cannot force a vendor to be interoperable. If a vendor deliberately wishes to withhold interoperability from the market, then they will always be able to do so and, in most cases, devise an excuse using the text of the standard as a scapegoat.

Let’s work through a quick example, to show how this can happen.

OpenFormula is the part of ODF 1.2 that defines spreadsheet formulas. The current draft defines the addition operator as:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds numbers together.

I think most vendors would manage to make an interoperable implementation of this. But if you wanted to be incompatible, there are certainly ways to do so. For example, given the expression “1+1” I could return “42” and still claim to be interoperable. Why? Because the text says “adds numbers together”, but doesn’t explicitly say which numbers to add together. If you decided to add 1 and 41 together, you could claim to be conformant. OK, so let’s correct the text so it now reads:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds Left to Right.

So, this is bullet-proof now, right? Not really. If I want to, I can say that 1+1 = 10 and claim that my implementation works in base 2. We can fix that in the standard, giving us:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right, both in base 10 representations
Returns: Number, in base 10
Constraints: None
Semantics: Adds Left to Right.

Better, perhaps. But if I want, I can still break compatibility. For example, I could say 1+1 = 0 and claim that my implementation rounds to the nearest multiple of 5. Or I could say that 1+1 = 1, claiming that the ‘+’ sign was taken to represent logical disjunction rather than arithmetic addition. Or I could do addition modulo 7, and say that the text did not explicitly forbid that. Or I could return the correct answer sometimes, but not other times, claiming that the standard did not say “always”. Or I could just insert a sleep(5000) statement in my code and pause 5 seconds every time an addition operation is performed, making a useless but conformant implementation. And so on, and so on.
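To make these loopholes concrete, here is a minimal Python sketch of several hypothetical adders, each a defensible reading of “Adds Left to Right” under the text above, yet disagreeing about 1+1. None of them is taken from any real product; the function names are invented for illustration.

```python
import time


def add_correct(left: float, right: float) -> float:
    """What every user expects: ordinary decimal addition."""
    return left + right


def add_base2_display(left: int, right: int) -> str:
    """Adds correctly, but reports the sum as a base-2 numeral (1+1 -> '10')."""
    return format(left + right, "b")


def add_round_to_5(left: float, right: float) -> int:
    """Adds, then rounds to the nearest multiple of 5 (1+1 -> 0)."""
    return 5 * round((left + right) / 5)


def add_as_logical_or(left: float, right: float) -> int:
    """Treats '+' as logical disjunction (1+1 -> 1)."""
    return int(bool(left) or bool(right))


def add_mod_7(left: int, right: int) -> int:
    """Addition modulo 7 (1+1 -> 2, but 4+5 -> 2)."""
    return (left + right) % 7


def add_slowly(left: float, right: float, delay_s: float = 5.0) -> float:
    """Correct but useless: pauses delay_s seconds before every addition."""
    time.sleep(delay_s)
    return left + right
```

A single test such as `assert impl(1, 1) == 2` rejects the base-2, rounding and logical-OR readings at once. The modulo-7 adder only shows up with larger operands such as 4+5, and the slow adder passes every correctness test; only a test with a time budget would catch it. That is the case for test suites with real coverage, not just more careful prose.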

The old adage holds: “It is impossible to make anything foolproof because fools are so ingenious.” A standard cannot compel interoperability from those who want to resist it. A standard is merely one tool which, when combined with others like test suites and plugfests, helps groups of cooperating parties achieve interoperability.

Now is the time to achieve interoperability among ODF implementations. We’re beyond kind words and empty promises. When Microsoft first announced, last May, that it would add ODF support to Office 2007 SP2, they did so with many fine words:

“Microsoft Corp. is offering customers greater choice and more flexibility among document formats”
Microsoft is “committed to work with others toward robust, consistent and interoperable implementations”
Chris Capossela, senior vice president for the Microsoft Business Division: “We are committed to providing Office users with greater choice among document formats and enhanced interoperability between those formats and the applications that implement them”
“Microsoft recognizes that customers care most about real-world interoperability in the marketplace, so the company is committed to continuing to engage the IT community to achieve that goal when it comes to document format standards”
Microsoft will “work with the Interoperability Executive Customer Council and other customers to identify the areas where document format interoperability matters most, and then collaborate with other vendors to achieve interoperability between their implementations of the formats that customers are using today. This work will continue to be carried out in the Interop Vendor Alliance, the Document Interoperability Initiative, and a range of other interoperability labs and collaborative venues.”
“This work on document formats is only one aspect of how Microsoft is delivering choice, interoperability and innovative solutions to the marketplace.”

So the words are there, certainly. But what was delivered fell far, far short of what they promised. Excel 2007 SP2 strips out spreadsheet formulas when it reads ODF spreadsheets from every other vendor, and even from spreadsheets created by Microsoft’s own ODF Add-in for Excel. No other vendor does this. Spreadsheet formulas are the very essence of a spreadsheet. To fail to achieve this level of interoperability calls into question the value and relevance of what was touted as an impressive array of interoperability initiatives. What value is an Interoperability Executive Customer Council, an Interop Vendor Alliance, a Document Interoperability Initiative, etc., if they were not able to motivate the simplest act: taking spreadsheet formula translation code that Microsoft already has (from the ODF Add-in for Office) and using it in SP2?

The pretty words have been shown to be hollow words. Microsoft has not enabled choice. Their implementation is not robust. They have, in effect, taken your ODF document, written by you by your choice in an interoperable format, with demonstrated interoperability among several implementations, and corrupted it, without your knowledge or consent.

There is no shortage of excuses from Redmond. If customers wanted excuses more than interoperability they would be quite pleased by Microsoft’s prolix effusions on this topic. The volume of text used to excuse their interoperability failure exceeds, by an order of magnitude, the amount of code that would be required to fix the problem. The latest excuse is the paternalistic concern expressed by Doug Mahugh, saying that they are corrupting spreadsheets in order to protect the user. Using a contrived example of a customer who tries to add cells containing text to those containing numbers, Doug observes that OpenOffice and Excel give different answers to the formula =1+"2". Because not all implementations give the same answer, Microsoft strips out formulas. Better to be the broken clock that reads the correct time twice a day, than to be unpredictable, or as Doug puts it:

If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing. But the behavior is predictable: nothing recalculates, and no erroneous results are created.

But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?

That will most definitely not be a predictable experience. And in actual fact, the unpredictable consequences of that sort of variation in spreadsheet behavior can be very consequential for some users. Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.

This bears a close resemblance to what is sometimes called “Ben Tre Logic”, after the Vietnamese town whose demise was excused by a U.S. General with the argument, “It became necessary to destroy the village in order to save it.”

Doug’s argument may sound plausible at first glance. There is that scary “unpredictable consequences”. We can’t have any of that, can we? Civilization would fall, right? But what if I told you that the same error with the same spreadsheet formula occurs when you exchange spreadsheets in OOXML format between Excel and OpenOffice? Ditto for exchanging them in the binary XLS format. In reality, this difference in behavior has nothing to do with the format, ODF or OOXML or XLS. It is a property of the application. So, why is Microsoft not stripping out formulas when reading OOXML spreadsheet files? After all, they have exactly the same bug that Doug uses as the centerpiece of his argument for why formulas are stripped from ODF documents. Why is Microsoft not concerned with “unpredictable consequences” when using OOXML? Why do users seem not to require “accurate, predictable results” when using OOXML? Or to be blunt, why is Microsoft discriminating against their own paying customers who have chosen to use ODF rather than OOXML? How is this reconciled with Microsoft’s claim that they are delivering “choice, interoperability and innovative solutions to the marketplace”?
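The argument can be sketched with two hypothetical evaluation policies for =1+"2": one that converts numeric text to a number before adding, and one that treats any text operand as an error. The post does not say which application follows which policy, so the function names and the "#VALUE!" error string below are illustrative assumptions. The point is only that the file format never enters the calculation.

```python
from typing import Callable, Dict, Union

Operand = Union[float, str]
Result = Union[float, str]

FILE_FORMATS = ("ODF", "OOXML", "XLS")


def add_with_text_coercion(left: Operand, right: Operand) -> Result:
    """Hypothetical policy A: numeric text such as "2" is converted before adding."""
    try:
        return float(left) + float(right)
    except ValueError:
        return "#VALUE!"


def add_without_text_coercion(left: Operand, right: Operand) -> Result:
    """Hypothetical policy B: any text operand yields an error value."""
    if isinstance(left, str) or isinstance(right, str):
        return "#VALUE!"
    return float(left) + float(right)


def results_by_format(add_policy: Callable[[Operand, Operand], Result]) -> Dict[str, Result]:
    """Result of =1+"2" after loading the same spreadsheet from each format.

    The file only carries the formula; the application's add_policy decides
    the answer, so every format gives the same result for a given application.
    """
    return {fmt: add_policy(1, "2") for fmt in FILE_FORMATS}
```

Whatever an application answers for =1+"2", it answers the same way for ODF, OOXML and XLS, which is why stripping formulas from only one of those formats does nothing to make the result more predictable.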

A follow-up on Excel 2007 SP2’s ODF support

2009/05/07 By Rob

Wow. My previous post seems to have attracted some attention. When I woke up on Monday morning, made my coffee and logged in to my email, I found out that my geeky little analysis of Office 2007 SP2’s ODF support had sparked some interest. I did not intend it to be more than an update for the handful of “usual suspects” who regularly follow ODF issues via various blogs, many of which you see listed to your right. If I had any foreknowledge or expectation that this post would end up on Slashdot, Groklaw, ZDNet, IDG, Reuters, CNet, etc., I would have done a better job spell checking, and maybe toned down the rhetoric a little (just a little).

But this widespread interest in the topic tells me one thing: ODF is important. People care about it. People want it to succeed, and when this success is threatened, whether for deliberate or accidental reasons, they are upset. Although Office 2007 SP2 also added PDF and XPS support, you don’t see many stories on that at all.

I’ve been trying to respond to the many comments by anonymous FUDsters and Fanboys on various web sites where my post is being discussed. However, it is getting rather laborious swatting all the gnats. They obviously breed in stagnant waters, and there is an awful lot of that on the web. Since all links lead back here anyways, it will be much simpler to do a recap here and address some of the more widespread errors.

The talking points from Redmond seem to be consistent, along the lines of:

We did a 100% perfect and conforming implementation of ODF 1.1 to the letter of the standard. If it is not interoperable, then it is the fault of the standard or the other applications or some guy we saw sneaking around back on the night of the fire. In any case, it is not our fault. We just design, write, test and sell software to users, businesses, governments and educational institutions. We have no influence over whether our products are interoperable or not. What effect SP2 has on users or the market — that’s not our concern. Come back in 50 years when you have a 100% perfect standard and maybe we’ll talk.

In other words, all of those Interoperability Directors and Interoperability Architects at Microsoft seem to have (hopefully temporarily) switched into Minimal Conformance Directors and Minimal Conformance Architects, and are gazing at their navels. I hope they did not suffer a reduction in salary commensurate with the reduction in their claimed responsibilities.

In any case, their argument might be challenged on several grounds. First up is the question of whether the ODF documents written by Excel 2007 SP2 indeed conform to the ODF 1.1 standard. This is not a hard question to answer, but please excuse this short technical diversion.

Let’s see what the ODF 1.1 standard says in section 8.1.3 (Table Cell):

Addresses of cells that contain numbers. The addresses can be relative or absolute, see section 8.3.1. Addresses in formulas start with a “[“ and end with a “]”. See sections 8.3.1 and 8.3.1 for information about how to address a cell or cell range.

And the referenced section 8.3.1 further says:

To reference table cells so called cell addresses are used. The structure of a cell address is as follows:

The name of the table.

A dot (.)

An alphabetic value representing the column. The letter A represents column 1, B represents column 2, and so on. AA represents column 27, AB represents column 28, and so on.

A numeric value representing the row. The number 1 represents the first row, the number 2 represents the second row, and so on.

This means that A1 represents the cell in column 1 and row 1. B1 represents the cell in column 2 and row 1. A2 represents the cell in column 1 and row 2.

For example, in a table with the name SampleTable the cell in column 34 and row 16 is referenced by the cell address SampleTable.AH16. In some cases it is not necessary to provide the name of the table. However, the dot must be present. When the table name is not required, the address in the previous example is .AH16
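The column-letter rule quoted above is a bijective base-26 numbering (there is no zero digit, so Z = 26 is followed by AA = 27). A minimal Python sketch of parsing such an address follows, assuming only the simple relative form shown in the quote: no “$” absolute markers and no quoted table names. The function names are hypothetical and do not come from any ODF library.

```python
import re
from typing import Tuple

# <table name> "." <column letters> <row number>; the table name may be empty (".AH16")
CELL_ADDRESS = re.compile(r"^(?P<table>[^.]*)\.(?P<col>[A-Z]+)(?P<row>[0-9]+)$")


def column_number(letters: str) -> int:
    """A=1 ... Z=26, AA=27, AB=28, ... (bijective base 26)."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def parse_cell_address(address: str) -> Tuple[str, int, int]:
    """Return (table name, column number, row number) for a simple ODF 1.1 cell address."""
    match = CELL_ADDRESS.match(address)
    if match is None:
        raise ValueError(f"not a simple ODF 1.1 cell address: {address!r}")
    return match["table"], column_number(match["col"]), int(match["row"])
```

With this sketch, parse_cell_address("SampleTable.AH16") gives ("SampleTable", 34, 16) and ".AH16" gives ("", 34, 16), matching the example in the quoted text, while "AH16" without the required dot is rejected.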

So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:

Symphony 1.3: =[.E12]+[.C13]-[.D13]
Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
Google Spreadsheets: =[.E12]+[.C13]-[.D13]
OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
Excel 2007 SP2: =E12+C13-D13

I’ll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.
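For readers who want to check their answer mechanically, here is a rough Python lint for the bracketed reference syntax quoted above. The regular expressions are my own approximation of that rule, not taken from any ODF validator, and they ignore string literals, lowercase references and absolute (“$”) references.

```python
import re

# An ODF 1.1 style reference: brackets around something containing a dot, e.g. [.E12]
BRACKETED_REF = re.compile(r"\[[^\[\]]*\.[^\[\]]*\]")
# A bare A1-style reference such as E12; the lookahead skips names like LOG10(
BARE_REF = re.compile(r"\b[A-Z]+[0-9]+\b(?!\()")


def has_bare_cell_refs(formula: str) -> bool:
    """True if a cell reference appears outside the [.ref] bracket syntax."""
    without_brackets = BRACKETED_REF.sub("", formula)
    return BARE_REF.search(without_brackets) is not None
```

Run over the seven formulas listed above, this flags exactly one of them.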

Next is the question of the relationship between interoperability and conformance. So that we are not building skyscrapers in the air, let’s start with a working definition of interoperability, say the one given by ISO/IEC 2382-01, “Information Technology Vocabulary, Fundamental Terms”:

The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units

I think we probably have a better sense of what conformance is. Something conforms when it meets the requirements defined by a standard.

So let’s explore the relationship between conformance to a standard and interoperability.

First, does interoperability require a standard? No. There have been interoperable systems without formal standards. For example, there is a degree of interoperability among spreadsheet vendors on the basis of the legacy Excel binary file format (XLS), even though the binary format was never standardized and never defines spreadsheet formulas. Another example is the SAX XML parsing API. Widely implemented, but never standardized. We may call them informal or de facto standards.

Additionally, many standards start out as informal technical agreements and specifications that achieve interoperability among a small group of users, who then move it forward to standardization so that a broader audience can benefit. But the interoperability came first and the formal standard came second. See the history of the Atom syndication format for a good example.

Second, is interoperability possible in the presence of non-conformance? Yes. For example, it is well known that the vast majority of web pages (93% by one estimate) on the web today do not conform to the HTML standard. But there is a not unsubstantial degree of interoperability on the web today in spite of this lack of conformance. Generally, interoperability does not require perfection. It requires good faith and hard work. If perfection were required, nothing would work in this world, would it?

Third, if a standard does not define something (like spreadsheet formulas) then I am allowed to do whatever I want, right? This is true. But further, even if ODF 1.1 did define spreadsheet formulas you would still be allowed to do whatever you want. Remember, these are voluntary standards. We can’t force you to do anything, whether we define it or not.

So what then is the precise relationship between conformance and interoperability? I’d state it as:

In general, conformance is neither necessary nor sufficient to achieve interoperability.
But interoperability is most efficiently achieved by conformance to an open standard where the standard clearly states those requirements which must be met to achieve interoperability.

In other words, the relationship is due to the efficiency of this configuration to those who wish to interoperate. Conformance is neither necessary nor sufficient to achieve interoperability in general, but interoperability is most efficiently achieved when conformance guarantees interoperability. When I talk about “standards-based interoperability” I’m talking about the situation when you are in the neighborhood of that optimal point.

The inefficiency of other orientations is seen with HTML and Web browsers. Because of the historically low level of HTML conformance by authoring tools and users who hand-edit HTML, browsers today are much more complex than they would otherwise need to be. They need to handle all sorts of malformed HTML documents. This complexity extends to any tool that needs to process HTML. Sure, we have a pretty good grip on this now, with tools like HTML Tidy and other robust parsers, but this has come at a cost. Complexity eats up resources: not only the time of coders and testers, but also runtime resources such as memory and processing cycles. More complex code is harder to maintain and secure, and tends to have more bugs. Greater conformance would have led to a more efficient relationship between conformance and interoperability.

Similarly, the many years of non-conformance in browsers, most notably Internet Explorer, to the CSS2 standard have resulted in an inefficiency there. From the perspective of web designers, tool authors and competing browser vendors, the lack of conformance to the standards has increased the cost needed to achieve interoperability, a cost transferred from a dominant vendor who chose not to conform to the standards, to other vendors who did conform.

The particular efficiency of conformance to open standards comes from the clarity and freedom they provide around access to the standard and to the contingent IP rights needed to implement it.

So back to ODF 1.1. What is the relationship between conformance and interoperability there? Clearly, it is not yet at that optimal point (which few standards ever achieve) where interoperability is most efficiently achieved. We’re working on it. ODF 1.2 will be better in that regard than ODF 1.1, and the next version will improve on that, and so on.

Does this mean that you cannot create interoperable solutions with ODF? No, it just means that, like most standards in IT today, you need to do some interoperability testing with other vendors’ products to make sure your product interoperates, and make conformant adjustments to your product in order to achieve real-world interoperability. Most vendors who don’t have a monopoly would do this naturally, and in fact have done this, as my chart indicated. Complaining about this is like complaining about gravity or friction or entropy. Sure, it sucks. Deal with it. Although it may not pay as much as being a professional mourner, work as a programmer is more regular. And giving value to customers will always bring more satisfaction than standing there weeping about how code is hard.

In any case, this comes down to why you implement a standard. What are your goals? If your goal is to be interoperable, then you perform interoperability testing and make those adjustments to your product necessary to make it both conformant and interoperable. But if your goal is simply to fulfill a checkbox requirement without actually providing any tangible customer benefit, then you will do as little as needed. However, if your goal is to destroy a standard, then you will create a non-conformant, non-interoperable implementation, automatically download it to millions of users and sow confusion in the marketplace by flooding it with millions of incompatible documents. It all depends on your goals. Voluntary standards do not force, or prevent, one approach or another.

To wrap this up, I stand by the table of interoperability results in the previous post. SP2 has reduced the level of interoperability among ODF spreadsheets, by failing to produce conforming ODF documents, and failing to take note of the spreadsheet formula conventions that had been adopted by all of the other vendors and which are working their way through OASIS as a standard.

If we note the arguments used by Microsoft in the recent past, they have argued that OOXML must be exactly what it is, flaws and all, in order to be compatible with legacy binary Office documents. Then they argued that OOXML cannot be changed in ISO, because that would create incompatibility with the “new legacy” documents in Office 2007 XML format. But when it comes to ODF, they have disregarded all legacy ODF documents created by all other ODF vendors and take an aloof stance that looks with disdain on interoperability with other vendors’ documents, or even documents produced by their own ODF Add-in. The sacrosanctness of legacy compatibility appears to be reserved, for strategic reasons, for some formats but not others. We’ll redefine the Gregorian calendar in ISO to be interoperable with one format if we need to, but we won’t deign, won’t stoop, won’t dirty ourselves to use the code we already have from the ODF Add-in for Microsoft Office, to make SP2 formulas interoperable with the other vendors’ products, to benefit our own users who are asking for ODF support in Office. As I said before, this ain’t right.

OpenDocument Format: The Standard for Office Documents

2009/05/05 By Rob

A belated note that an article of mine on ODF was recently published in IEEE Internet Computing, called “OpenDocument Format: The Standard for Office Documents“. I think it is a good introduction to ODF, what it is, where it came from and why it is important. They allow authors to post a copy on their websites. So feel free to link to it, but any redistribution will need to be negotiated with the publisher.

At the same time I’ve taken the opportunity to put together a new web page of some of my other publications, workshop and conference presentations. I have a few others that I want to add, once I find them. But this is a start.

Update on ODF Spreadsheet Interoperability

2009/05/03 By Rob

[2009/05/07 — I’ve posted a follow up article on this topic which you may want to read]

A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then current ODF implementations, including:

OpenOffice.org 2.4
Google Spreadsheets
KOffice KSpread 1.6.3
IBM Lotus Symphony 1.1
Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
Microsoft Office 2003 with Sun’s ODF Plugin

I created a test document in each of those editors and then loaded each test document in each of the other editors. I showed what worked, what didn’t, and made some suggestions on how interoperability could be improved. I found only two notable failures, when the Microsoft/CleverAge Add-in for Excel loaded KSpread and Symphony documents. The other scenarios I tested were OK:

Read In (rows) / Created In (columns)	CleverAge	Google	KSpread	Symphony	OpenOffice	Sun Plugin
CleverAge	OK	OK	Fail	Fail	OK	OK
Google	OK	OK	OK	OK	OK	OK
KSpread	OK	OK	OK	OK	OK	OK
Symphony	OK	OK	OK	OK	OK	OK
OpenOffice	OK	OK	OK	OK	OK	OK
Sun Plugin	OK	OK	OK	OK	OK	OK

A lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:

CleverAge has released version 3.0 of their Add-in.
OpenOffice 3.01 is now out and in wide use.
Symphony 1.3 is now in beta.
The Sun ODF Plugin is now at version 3.0.
Microsoft Office 2007 SP2 has been released, with integrated ODF support.
KOffice 2.0 RC 1 is now available.

I haven’t been able to get the release candidate of KOffice installed, so I’m still including KSpread 1.6.3 in my tests, but for the rest I have created new test files in each editing environment, saved them to ODF format and then loaded the resulting documents into each of the other editors. From these test documents I was able to perform 42 different test combinations.
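The count of 42 is consistent with seven applications in this round, each reading the files created by the six others (7 × 6). A trivial Python sketch of that arithmetic; the short application labels are the ones used in the results table:

```python
from itertools import permutations

# The seven ODF spreadsheet applications in this round of tests
APPS = ["Google", "KSpread", "Symphony", "OpenOffice",
        "Sun Plugin", "CleverAge", "MS Office 2007 SP2"]

# Each ordered pair (created_in, read_in) with created_in != read_in is one cross test
cross_tests = list(permutations(APPS, 2))
```

Counting each application reloading its own file as well gives the full 7 × 7 = 49 cells of the results table.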

I’ll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.

The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:

Enter the title “May’s Wedding Planner” in A1 and increase the font size to 14 point.
Enter the formula =TODAY() in B3 and set a US-style MM/DD/YY date format.
Enter the date of the wedding as a constant in cell B4, also setting a date format.
Add simple calculations in cells B6-B8 to calculate the days, weeks and months until the wedding.
A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits – debits) then you can simply copy down the formula to the new row for each new entry.
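For readers who think in code rather than cells, here is a Python sketch of two of the calculations described above. The cell roles (B3, B4, column E) come from the steps; the year 2009 and the ledger amounts in the usage note are my own hypothetical values, and the post does not say exactly how weeks and months were computed, so those are left out.

```python
from datetime import date
from typing import List, Tuple


def days_until(wedding: date, today: date) -> int:
    """Days left: the wedding date (B4) minus TODAY() (B3)."""
    return (wedding - today).days


def running_balances(opening: float, entries: List[Tuple[float, float]]) -> List[float]:
    """Column E copied down the ledger: each row's balance is the previous
    balance plus that row's credit minus that row's debit."""
    balances = []
    balance = opening
    for credit, debit in entries:
        balance = balance + credit - debit
        balances.append(balance)
    return balances
```

For example, days_until(date(2009, 8, 15), date(2009, 5, 3)) is 104, and a ledger starting at 0 with entries (1000, 0), (0, 250), (0, 100) yields balances of 1000, 750 and 650.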

The resulting spreadsheet looks something like this:

Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.

Here is what I found when I tested the various scenarios:

Read In (rows) / Created In (columns)	Google	KSpread	Symphony	OpenOffice	Sun Plugin	CleverAge	MS Office 2007 SP2
Google	OK	OK	OK	OK	Fail	OK	Fail
KSpread	OK	OK	OK	Fail	Fail	OK	Fail
Symphony	OK	OK	OK	OK	OK	Fail	Fail
OpenOffice	OK	OK	OK	OK	OK	OK	Fail
Sun Plugin	OK	OK	OK	OK	OK	OK	Fail
CleverAge Plugin	OK	OK	OK	OK	Fail	OK	OK
MS Office 2007 SP2	Fail	Fail	Fail	Fail	Fail	Fail	OK

So what is happening here?

CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows an awful lot of failures. Let’s see if we can figure this out.

First, some combinations that worked previously, when I tested two months ago, are now not working:

Symphony 1.3 beta hangs when attempting to load the spreadsheet created with the CleverAge 3.0 ODF Add-in. Symphony 1.1 also hangs when trying to load that same spreadsheet. However, both versions of Symphony work fine when loading the CleverAge 2.5 spreadsheet from two months ago. The CleverAge document appears to be valid, so my guess is that this is a bug in Symphony. I’ll pass this document on to the Symphony development team to see what they say.
KSpread 1.6.3 does not read formulas from OpenOffice 3.01 documents. KSpread had no problems with OO 2.4 documents. The problem appears to be that OpenOffice 3.01, by default, writes out documents according to the ODF 1.2 draft which puts formulas in the OpenFormula namespace. But KSpread is expecting them in the legacy namespace. The result is that spreadsheet formulas are dropped when the document is loaded in KSpread.
In a similar way, Sun’s new ODF Plugin writes out documents according to the ODF 1.2 draft. KOffice is unable to handle these files. This also causes problems for Google Spreadsheets and the Microsoft/CleverAge Plugin for Excel, which report errors “We were unable to upload this document” and “The converter failed to open this file”.
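The namespace problem is easier to see in the XML itself. In ODF, a spreadsheet formula is stored in a table:formula attribute whose value begins with a namespace prefix. Examples are oooc: for the legacy OpenOffice.org syntax and of: for the ODF 1.2 draft’s OpenFormula. Below is a minimal Python sketch, built on the standard library’s ElementTree, that reports which namespace each formula uses. The sample fragment is hypothetical and far smaller than a real content.xml. A reader that only recognizes the legacy URI would, like KSpread in these tests, find nothing it understands in the second cell.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs discussed in the post: the legacy OpenOffice.org formula
# namespace (prefix "oooc") and the ODF 1.2 draft OpenFormula namespace.
LEGACY_OOOC = "http://openoffice.org/2004/calc"
OPENFORMULA = "urn:oasis:names:tc:opendocument:xmlns:of:1.2"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
FORMULA_ATTR = f"{{{TABLE_NS}}}formula"

# Hypothetical content.xml fragment: the same formula written two ways.
SAMPLE = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:oooc="http://openoffice.org/2004/calc"
    xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2">
  <table:table-cell table:formula="oooc:=[.A1]+[.A2]"/>
  <table:table-cell table:formula="of:=[.A1]+[.A2]"/>
</office:document-content>"""


def formula_namespaces(content_xml: bytes) -> list[tuple[str, str]]:
    """Return (namespace URI, formula body) for every table:formula attribute.

    The prefix sits inside the attribute *value*, so the XML parser does not
    resolve it; the xmlns declarations are collected here instead.
    Simplification: declarations are treated as document-wide."""
    prefixes: dict[str, str] = {}
    found: list[tuple[str, str]] = []
    events = ("start-ns", "start")
    for event, item in ET.iterparse(io.BytesIO(content_xml), events=events):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif FORMULA_ATTR in item.attrib:
            value = item.attrib[FORMULA_ATTR]
            prefix, sep, body = value.partition(":")
            if not sep or prefix.startswith("="):
                found.append(("(no namespace)", value))  # bare "=..." formula
            else:
                found.append((prefixes.get(prefix, "(undeclared prefix)"), body))
    return found


if __name__ == "__main__":
    for uri, body in formula_namespaces(SAMPLE):
        print(uri, body)
```

Both cells hold the same formula body. Only the namespace differs, and that difference is enough to make a namespace-aware reader drop or reject the formula.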

The new entry in the mix is Microsoft Office 2007 SP2, which adds integrated ODF support. Unfortunately, this support did not fare well in my tests. The problem appears to be how it treats spreadsheet formulas in ODF documents. When reading an ODF document, Excel 2007 SP2 silently strips out the formulas. All that remains is the value each cell had when the document was last saved.

This can cause both subtle and not-so-subtle errors and data loss. For example, in the test document I presented above, the current date is computed with the TODAY() spreadsheet function. If the formulas are stripped, this cell no longer updates and will show a stale date. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells in column E down a row, she will get incorrect results: there is no longer a formula to copy, so she just copies the prior balance. In general, SP2 converts an ODF spreadsheet into a mere “table of numbers”, and all calculation logic is lost.
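A toy model makes the cost of stripping formulas concrete. In this Python sketch, each cell keeps an optional formula plus the cached value saved with the file. The hypothetical strip_formulas mimics the import behavior described above, and copy_balance_down mimics Maya extending her ledger. The column layout (credits in C, debits in D) and the numbers are assumptions. The evaluator only understands cell references joined by + and -.

```python
import re
from dataclasses import dataclass
from typing import Optional

REF = re.compile(r"([A-Z]+)(\d+)")


@dataclass
class Cell:
    formula: Optional[str]  # e.g. "=E11+C12-D12"; None for a plain value
    value: float            # cached value stored in the file when last saved


def strip_formulas(sheet: dict[str, Cell]) -> dict[str, Cell]:
    """Keep only cached values: every formula is discarded on import."""
    return {ref: Cell(None, cell.value) for ref, cell in sheet.items()}


def evaluate(formula: str, sheet: dict[str, Cell]) -> float:
    """Evaluate a formula made only of cell references joined by + and -."""
    total, sign = 0.0, 1
    for token in re.findall(r"[+-]|[A-Z]+\d+", formula.lstrip("=")):
        if token in ("+", "-"):
            sign = 1 if token == "+" else -1
        else:
            total += sign * sheet[token].value
    return total


def copy_balance_down(sheet: dict[str, Cell], src_row: int, dst_row: int) -> float:
    """Copy column E from src_row to dst_row and return the new balance."""
    src = sheet[f"E{src_row}"]
    if src.formula is None:
        # No formula survived, so the copy is just the old number.
        sheet[f"E{dst_row}"] = Cell(None, src.value)
    else:
        shift = dst_row - src_row
        formula = REF.sub(lambda m: f"{m.group(1)}{int(m.group(2)) + shift}", src.formula)
        sheet[f"E{dst_row}"] = Cell(formula, evaluate(formula, sheet))
    return sheet[f"E{dst_row}"].value


# Hypothetical ledger: opening balance in E11, a credit in row 12, a debit in row 13.
LEDGER = {
    "E11": Cell(None, 100.0),
    "C12": Cell(None, 50.0), "D12": Cell(None, 0.0), "E12": Cell("=E11+C12-D12", 150.0),
    "C13": Cell(None, 0.0), "D13": Cell(None, 30.0),
}

if __name__ == "__main__":
    print(copy_balance_down(dict(LEDGER), 12, 13))            # 120.0
    print(copy_balance_down(strip_formulas(LEDGER), 12, 13))  # 150.0 (stale)
```

With formulas intact, the new row subtracts the 30.0 debit. With formulas stripped, the “copied” balance silently repeats the previous value.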

In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas, but places them in an Excel namespace. This is not the namespace that OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn’t even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. The ones that do not check the namespace try, and fail, to parse the formulas, since they are syntactically different from what they expect. These applications essentially display a corrupted document that shows neither the formula nor the value correctly. For example, an SP2 document loaded in MS Office using the Sun ODF Plugin looks like this:

Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn’t import the document at all.
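The two failure modes described above come from two different reader policies. Below is a minimal Python sketch of both, written as hypothetical reader functions rather than any product’s actual code. The post does not identify the SP2 namespace, so the example uses a placeholder URI. The Excel-style body "=A1+A2" is likewise an assumption about what such a formula might look like.

```python
import re
from typing import Optional

LEGACY_OOOC = "http://openoffice.org/2004/calc"
OPENFORMULA = "urn:oasis:names:tc:opendocument:xmlns:of:1.2"
KNOWN_FORMULA_NAMESPACES = {LEGACY_OOOC, OPENFORMULA}

# A deliberately tiny grammar for the ODF-style syntax in these tests:
# bracketed references such as [.A1] joined by + or -.
ODF_SIMPLE = re.compile(r"=\[\.[A-Z]+\d+\](?:[+-]\[\.[A-Z]+\d+\])*")


def strict_reader(ns_uri: str, body: str) -> str:
    """Reject the formula outright if its namespace is not recognized."""
    if ns_uri not in KNOWN_FORMULA_NAMESPACES:
        raise ValueError(f"unsupported formula namespace: {ns_uri}")
    return body


def lenient_reader(ns_uri: str, body: str) -> Optional[str]:
    """Ignore the namespace and try to parse the body as ODF-style syntax.
    Returns None when parsing fails, i.e. the formula is lost."""
    return body if ODF_SIMPLE.fullmatch(body) else None


if __name__ == "__main__":
    # Placeholder for the unidentified namespace; Excel-style references assumed.
    unknown = "urn:example:unidentified-formula-namespace"
    print(lenient_reader(unknown, "=A1+A2"))            # None
    print(lenient_reader(LEGACY_OOOC, "=[.A1]+[.A2]"))  # =[.A1]+[.A2]
    try:
        strict_reader(unknown, "=A1+A2")
    except ValueError as err:
        print(err)
```

Neither policy can recover the formula. The strict reader refuses the whole document, while the lenient one loses the formula during parsing. That matches the two behaviors seen in the tests.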

I must admit that I’m disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don’t know or care. But I cannot fail to notice that the same application — Microsoft Excel 2007 — will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the “improved” integrated code in Office 2007 SP2. This ain’t right.

I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:

Sun should write out formulas in ODF 1.1 format, using the legacy “oooc” namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO’s ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I’m not sure if the Sun Plugin has a similar configuration option, but I hope it does.
In addition to writing out compatible formulas, as recommended above for the Sun Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor’s spreadsheet documents. Give the user a warning if you need to, but let them have the choice.
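The suggestion that formulas be written under the legacy oooc prefix comes down to a small serialization detail. Below is a minimal Python sketch using the standard library’s ElementTree. The function name is hypothetical, and the sketch assumes each formula body is already in the legacy syntax, so no translation from OpenFormula is attempted. It writes each formula with the oooc: prefix and also declares that prefix on the root element. ElementTree would not add that declaration on its own, because the prefix only appears inside attribute values.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
LEGACY_OOOC = "http://openoffice.org/2004/calc"

# Use the conventional ODF prefixes when serializing element/attribute names.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("table", TABLE_NS)


def legacy_formula_document(formula_bodies: list[str]) -> str:
    """Serialize cells whose formulas carry the legacy "oooc:" prefix.

    ElementTree only declares namespaces used in element or attribute *names*.
    The oooc prefix appears only inside attribute values, so its xmlns
    declaration is added by hand; otherwise readers cannot resolve it."""
    root = ET.Element(f"{{{OFFICE_NS}}}document-content")
    root.set("xmlns:oooc", LEGACY_OOOC)
    for body in formula_bodies:
        cell = ET.SubElement(root, f"{{{TABLE_NS}}}table-cell")
        cell.set(f"{{{TABLE_NS}}}formula", f"oooc:{body}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(legacy_formula_document(["=[.A1]+[.A2]"]))
```

A real content.xml needs the full office:body and table:table structure around these cells. The sketch keeps only the part that matters for the namespace question.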

Finally, let me try to anticipate and debunk some of the counter-arguments which might be raised to argue against interoperability.

First, we might hear that ODF 1.1 does not define spreadsheet formulas, and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This is certainly true if your sole goal is to claim conformance. If your business model requires only conformance, and not actual interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive. An application can conform to a standard and also be interoperable, for example by using the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for failing to be interoperable, or at least not a valid one. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and their workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. These tests illustrate the difference between minimal conformance and interoperability well.

Remember, it is not particularly difficult or clever to take an adverse reading of a standard and make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser that renders all default text in 4-point Zapf Dingbats, white on a white background. It would conform to the standard, but it would be unusable by anyone. If you try hard enough, you can create 100% conformant but non-interoperable implementations of almost any standard. Standards are voluntary, written to help multiple parties coordinate in their desire for interoperability. They are not written to compel interoperability from parties who do not wish to be interoperable.

(A side point is that SP2’s implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)

We might also hear concerns that supporting other vendors’ ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on Excel’s, which itself was not fully documented when OpenOffice implemented it. So an argument by Microsoft not to support that language because it is undocumented is rather hypocritical. Excel supports Lotus 1-2-3 files and formulas, as well as legacy Excel versions back to Excel 4.0, neither of which has a standardized formula language. Why are these supported? Also, the fact that the Microsoft/CleverAge add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. What is inexplicable is why that code never made it into Excel 2007 SP2.

We’ll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users, and that it is dangerous to save Excel formulas as interoperable ODF formulas because there is no guarantee that OpenOffice or any other ODF application will interpret them the same way Excel does. So one might try to claim that Microsoft is protecting its customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully licensed Microsoft Office users have already been creating legacy documents in ODF format using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here: the way in which users of the ODF Add-in are being sacrificed.

If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors’ formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This’ll teach ’em good.

Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2’s poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.

As I mentioned before, this is a step backwards. But it is just one step on the journey. Let’s look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact, most of the code already exists. The only thing we need now is to actually go do it, without getting too far ahead of, or lagging too far behind, the other implementations. This is more a question of timing and coordination than of hard technical problems.

[2009/05/07 — For more on this topic, see my “A follow-up on Excel 2007 SP2’s ODF Support“]