[2009/05/07 — I’ve posted a follow up article on this topic which you may want to read]
A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then current ODF implementations, including:
- OpenOffice.org 2.4
- Google Spreadsheets
- KOffice KSpread 1.6.3
- IBM Lotus Symphony 1.1
- Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
- Microsoft Office 2003 with Sun’s ODF Plugin
I created a test document in each of those editors and then loaded each test document in each of the other editors. I showed what worked, what didn’t, and made some suggestions on how interoperability could be improved. I found only two notable failures, when the Microsoft/CleverAge Add-in for Excel loaded KSpread and Symphony documents. The other scenarios I tested were OK:
A lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:
- CleverAge has released version 3.0 of their Add-in.
- OpenOffice 3.01 is now out and in wide use.
- Symphony 1.3 is now in beta.
- The Sun ODF Plugin is now at version 3.0.
- Microsoft Office 2007 SP2 has been released, with integrated ODF support.
- KOffice 2.0 RC 1 is now available.
I haven’t been able to get the release candidate of KOffice installed, so I’m still including KSpread 1.6.3 in my tests, but for the rest I have created new test files in each editing environment, saved them to ODF format and then loaded the resulting documents into each of the other editors. From these test documents I was able to perform 42 different test combinations.
I’ll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:
- Enter the title “Maya’s Wedding Planner” in A1 and increase the font size to 14 point.
- Enter the formula =TODAY() in B3 and set a US-style MM/DD/YY date format.
- Enter the date of the wedding as a constant in cell B4, also setting the date format.
- Add simple calculations in cells B6-B8 to calculate days, weeks and months until the wedding.
- A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits – debits) then you can simply copy down the formula to the new row for each new entry.
The resulting spreadsheet looks something like this:
Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.
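For readers who prefer code to cell references, the logic of the test spreadsheet can be sketched in a few lines of Python. This is only an illustration of the calculations described in the steps above, not what any of the editors do internally. The wedding year, the "today" date and the ledger amounts are made-up values, and the months calculation is left out because its exact formula is not spelled out here.

```python
from datetime import date


def days_until(wedding: date, today: date) -> int:
    """Number of days from `today` to the wedding (the B6-style calculation)."""
    return (wedding - today).days


def running_balances(opening, entries):
    """Apply Balance = previous balance + credits - debits to each ledger row.

    `entries` is a sequence of (credit, debit) pairs, one per ledger row.
    """
    balances = []
    balance = opening
    for credit, debit in entries:
        balance = balance + credit - debit
        balances.append(balance)
    return balances


# Hypothetical values: the year and today's date are assumptions.
wedding_day = date(2009, 8, 15)
today = date(2009, 5, 4)

remaining_days = days_until(wedding_day, today)
remaining_weeks = remaining_days / 7

# Hypothetical ledger rows: one deposit followed by two expenses.
ledger = running_balances(0, [(1000, 0), (0, 250), (0, 100)])
```

The point of the column E formula is the same as the loop in `running_balances`: each new row depends only on the previous balance and the new entry, which is why copying the formula down works.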
Here is what I found when I tested the various scenarios:
|KSpread||Symphony||OpenOffice||Sun Plugin||CleverAge||MS Office 2007 SP2|
|MS Office 2007 SP2||Fail||Fail||Fail||Fail||Fail||Fail||OK|
So what is happening here?
CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows an awful lot of red. Let’s see if we can figure this out.
First, some combinations that worked previously, when I tested two months ago, are now not working:
- Symphony 1.3 beta hangs when attempting to load the spreadsheet created with the CleverAge 3.0 ODF Add-in. Symphony 1.1 also hangs when trying to load that same spreadsheet. However both versions of Symphony work fine when loading the CleverAge 2.5 spreadsheet from two months ago. The CleverAge document appears to be valid, so my guess is that this is a bug in the Symphony 1.3 beta. I’ll pass this document on to the Symphony development team to see what they say.
- KSpread 1.6.3 does not read formulas from OpenOffice 3.01 documents. KSpread had no problems with OO 2.4 documents. The problem appears to be that OpenOffice 3.01, by default, writes out documents according to the ODF 1.2 draft which puts formulas in the OpenFormula namespace. But KSpread is expecting them in the legacy namespace. The result is that spreadsheet formulas are dropped when the document is loaded in KSpread.
- In a similar way, Sun’s new ODF Plugin writes out documents according to the ODF 1.2 draft. KOffice is unable to handle these files. This also causes problems for Google Spreadsheets and the Microsoft/CleverAge Plugin for Excel, which report errors “We were unable to upload this document” and “The converter failed to open this file”.
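To make the namespace issue concrete, here is a minimal Python sketch of how a reader might look at the prefix on a formula attribute value. It assumes the formula is stored as a string such as `oooc:=TODAY()` (legacy) or `of:=TODAY()` (ODF 1.2 draft). The `of` prefix, the function names and the descriptions are my assumptions for illustration. A real implementation would resolve the prefix to its declared namespace URI rather than trusting the prefix text.

```python
from typing import Optional, Tuple

# Assumed prefix conventions; real documents bind prefixes to namespace URIs.
KNOWN_PREFIXES = {
    "oooc": "legacy OpenOffice.org Calc formula syntax",
    "of": "OpenFormula syntax (ODF 1.2 draft)",
}


def split_formula(attr_value: str) -> Tuple[Optional[str], str]:
    """Split a formula attribute value into (prefix, expression).

    Returns (None, attr_value) when no 'prefix:=' marker is present.
    """
    head, sep, tail = attr_value.partition(":=")
    if sep and head.isalnum():
        return head, "=" + tail
    return None, attr_value


def describe_formula(attr_value: str) -> str:
    """Return a human-readable note on which formula dialect is in use."""
    prefix, _expression = split_formula(attr_value)
    if prefix is None:
        return "no namespace prefix"
    return KNOWN_PREFIXES.get(prefix, "unrecognized prefix: " + prefix)
```

A reader that only knows the legacy prefix will see the `of` case as unrecognized, which matches the failure described above: the formula text is nearly identical, but the label on it has changed.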
The new entry to the mix is Microsoft Office 2007 SP2, which has added integrated ODF support. Unfortunately this support did not fare well in my tests. The problem appears to be how it treats spreadsheet formulas in ODF documents. When reading an ODF document, Excel SP2 silently strips out formulas. What is left is the last value that cell had, when previously saved.
This can cause subtle and not so subtle errors and data loss. For example, in the test document I presented above, the current date is encoded using the TODAY() spreadsheet function. If the formulas are stripped, then this cell no longer updates, and will return the wrong value. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells from column E down a row, this will cause incorrect calculations, since there is no longer a formula to copy, so she would just be copying the prior balance. In general, SP2 converts an ODF spreadsheet into a mere “table of numbers” and any calculation logic is lost.
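The copy-down problem can be shown with a toy model. This is a hedged sketch, not a description of how Excel stores cells: the `Cell` class, the `copy_down` helper and the sample formula text are all hypothetical, and the amounts are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: float                    # last calculated (cached) value
    formula: Optional[str] = None   # None means the cell is a plain constant


def strip_formula(cell: Cell) -> Cell:
    """Model a load that keeps the cached value but discards the formula."""
    return Cell(value=cell.value, formula=None)


def copy_down(balance_cell: Cell, prev_balance: float,
              credit: float, debit: float) -> float:
    """Simulate copying the balance cell into the next ledger row.

    A formula cell recalculates for the new row; a constant is pasted as-is.
    """
    if balance_cell.formula is not None:
        return prev_balance + credit - debit
    return balance_cell.value


# Hypothetical last ledger row: balance of 750, then a new 100 expense.
original = Cell(value=750, formula="=E12+C13-D13")
stripped = strip_formula(original)

correct_balance = copy_down(original, 750, credit=0, debit=100)
stale_balance = copy_down(stripped, 750, credit=0, debit=100)
```

With the formula intact the new balance reflects the expense. With the formula stripped, the old balance is simply repeated, and nothing warns the user that it is wrong.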
In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas but places them into an Excel namespace. This namespace is not what OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn’t even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. And the ones that do not check the namespace try and fail to load a formula, since it is syntactically different from what they expected. The applications essentially display a corrupted document that shows neither the formula nor the value correctly. For example, an SP2 document, loaded in MS Office using the Sun ODF Plugin, looks like this:
Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn’t import the document at all.
I must admit that I’m disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don’t know or care. But I cannot fail to notice that the same application — Microsoft Excel 2007 — will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the “improved” integrated code in Office 2007 SP2. This ain’t right.
I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:
- Sun should write out formulas in ODF 1.1 format, using the legacy “oooc” namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO’s ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I’m not sure if the Sun Plugin has a similar configuration option, but I hope it does.
- In addition to writing out compatible formulas as per the above comments on the Sun Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor’s spreadsheet documents. Give the user a warning if you need to, but let them have the choice.
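The "warn, but let the user choose" behavior suggested above might look something like the following Python sketch. It is a sketch under stated assumptions, not any vendor's actual code: the `load_formula` function, its `accept_unknown` switch and the warning text are hypothetical, and only the "oooc" prefix comes from the discussion above.

```python
LEGACY_PREFIX = "oooc"  # the legacy prefix discussed above


def load_formula(attr_value, cached_value, accept_unknown=True):
    """Load one formula cell, warning instead of silently dropping formulas.

    Returns a dict with the formula (or None), the cached value, and an
    optional warning string for the user.
    """
    prefix, sep, expression = attr_value.partition(":=")
    if not sep:
        # No prefix marker at all: keep the text exactly as written.
        return {"formula": attr_value, "value": cached_value, "warning": None}

    formula = "=" + expression
    if prefix == LEGACY_PREFIX:
        return {"formula": formula, "value": cached_value, "warning": None}

    if accept_unknown:
        # Keep the formula, but tell the user it may not behave identically.
        warning = "Formula uses unfamiliar prefix '%s'; results may differ." % prefix
        return {"formula": formula, "value": cached_value, "warning": warning}

    # Strict mode: drop the formula, but say so rather than doing it silently.
    warning = "Formula with prefix '%s' was dropped; only the cached value remains." % prefix
    return {"formula": None, "value": cached_value, "warning": warning}
```

Either branch is better than silence: in the tolerant mode the user keeps a working formula, and even in the strict mode the user at least learns that the calculation logic is gone.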
Finally, let me try to anticipate and debunk some of the counter-arguments which might be raised to argue against interoperability.
First, we might hear that ODF 1.1 does not define spreadsheet formulas and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This is certainly true if your sole goal is to claim conformance. If your business model requires only conformance and not actually achieving interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive options. An application can be conformant to a standard and also be interoperable, if you use the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for not also being interoperable, or at least not a valid excuse. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and their workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. The difference between minimal conformance and interoperability is well illustrated in these tests.
Remember, it is not particularly difficult or clever to take an adverse reading of a standard to make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser implementation that makes all default text be 4-point Zapf Dingbats, white text on a white background. It would conform with the standard, but it would be perfectly unusable by anyone. If you try hard enough you can create 100% conformant, but non-interoperable, implementations of almost all standards. Standards are voluntary, written to help coordinate multiple parties in their desires for interoperability. Standards are not written to compel interoperability by parties who do not wish to be interoperable.
(A side point is that SP2’s implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)
We might also hear concerns that supporting other vendors’ ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on that used by Excel, which itself was not fully documented when OpenOffice implemented it. So an argument, by Microsoft, not to support that language because it is not documented is rather hypocritical. Excel supports 1-2-3 files and formulas and legacy Excel versions (back to Excel 4.0) neither of which have standardized formula languages. Why are these supported? Also, the fact that the Microsoft/CleverAge add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. The inexplicable thing is why that code never made it into Excel 2007 SP2.
We’ll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users and that it is dangerous to try to save Excel formulas into interoperable ODF formulas because there are no guarantees that OpenOffice or any other ODF application will interpret them the same as Excel does. So one might try to claim that Microsoft is protecting their customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully-licensed Microsoft Office users have already been creating legacy documents in ODF format, using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, get corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here, the way in which users of the ODF Add-in are being sacrificed.
If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors’ formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This’ll teach ’em good.
Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2’s poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.
As I mentioned before, this is a step backwards. But it is just one step on the journey. Let’s look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact, most of the code already exists for this. The only thing we need now is to actually go do it, and not get too far ahead of, or lag too far behind, the other implementations. This is more a question of timing and coordination than of hard technical problems.
[2009/05/07 — For more on this topic, see my “A follow-up on Excel 2007 SP2’s ODF Support“]