This is a follow up to some work we did at the ODF Interoperability Workshop in Beijing last November. We had good participation there: IBM, Sun, Google, Novell and Redflag from the big vendor side, as well as a good number of users. It was a full-day workshop and we covered a number of topics. One of them was spreadsheet formulas. I gave a short presentation on spreadsheet interoperability, specifically on the work we’ve done on OpenFormula for ODF 1.2. We also did a short exercise to look for spreadsheet formula bugs.
As many of you know, neither ODF 1.0 nor ODF 1.1 defines a spreadsheet formula language. They leave it implementation-defined. The specification makes only a few broad statements, such as a recommendation that formula attributes be qualified by namespace, that formulas begin with ‘=’ , that cell addresses be surrounded by ‘[‘ and ‘]’ and that formula parameters be delimited by ‘;’. So in theory, this is a mess. But in practice it has worked out quite well, since implementations have played “follow the leader” and have nearly converged on interoperable spreadsheet formulas. With ODF 1.2, we’ll standardize the consensus on spreadsheet formulas, giving even greater certainties.
Let’s see how this works in practice. I created a simple spreadsheet document in several ODF-supporting applications, including Microsoft Office using the various plugins. Here is what I tested:
- Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
- Google Spreadsheets
- KOffice’s KSpread 1.6.3
- Lotus Symphony 1.1
- OpenOffice 2.4
- Microsoft Office 2003 with Sun’s ODF Plugin
I used what I had installed on my two machines, Windows and Ubuntu. There may be updates to some of these applications that do even better.
I created the same basic spreadsheet from scratch in each editor and saved it as ODF format. I then looked at each document to see how formulas were being stored in the XML:
- CleverAge stores it in the OpenOffice namespace (xmns:oooc=”http://openoffice.org/2004/calc”)
- Google also uses the OpenOffice namespace.
- KSpread doesn’t use namespace-qualified formula attributes.
- Symphony also doesn’t use namespace-qualified formula attributes.
- OpenOffice uses the OpenOffice namespace.
- Sun’s Plugin also uses the OpenOffice namespace.
OK. So there is some variation in how the formulas are stored, with two approaches in use. How does this then impact interoperability? In theory it is horrible. In practice it works out pretty well.
I took each of the 6 spreadsheet documents and opened each one in each of the other 5 applications — 30 interoperability tests — to see whether the formulas were loaded and calculated correctly. Here is what I saw:
So the formulas came through OK, in almost all instances. The only exception was the CleverAge add-in, which failed to process formulas from KSpread and Symphony. For example, loading the Symphony spreadsheet into Office 2003 results in cells with contents containing errors such as “=#REF!+#REF!-#REF!” which is tantamount to data loss.
I think we can do better than this with a few simple changes.
The Law of Robustness as stated in RFC 1122 is “Be liberal in what you accept, and conservative in what you send.” Adapting that principle to ODF spreadsheets, I recommend the following practice for ensuring interoperability using ODF 1.0 and ODF 1.1:
- When writing ODF 1.0 or ODF 1.1 spreadsheet documents, write formula attribute values using the OpenOffice namespace prefix: “http://openoffice.org/2004/calc”. All ODF spreadsheet applications I have tested accept and correctly process formulas in that namespace. Note that the CleverAge add-in is not doing the namespace checks in a XML-correct fashion. They are comparing only the text of the prefix, not resolving it to a namespace URI and comparing the URI’s. So you should be sure to also use “oooc” as the namespace prefix.
- When reading ODF 1.0 or ODF 1.1 spreadsheet documents, be prepared to handle formulas with no namespace qualification as well as those with the OpenOffice namespace.
Specifically, Symphony and KSpread should consider making changes to accommodate #1 and CleverAge should consider changes needed to do #2. In the CleverAge case, a trivial, one-line change to OdfConditionalPostProcessor.cs will quickly restore compatibility with Symphony and KSpread documents.
Now, if you are entirely satisfied with what I have said above, and have no lingering doubts, then you are not thinking enough. It is not enough to merely bring the spreadsheet formulas over intact. Interoperability also requires that we interpret the formulas in the same way.
So let’s look at that side of the equation (no pun intended). Fortunately, we are all quite close to what is being defined in ODF 1.2’s OpenFormula specification. This is not so surprising, since OpenFormula was based on actual spreadsheet practice, looking at a variety of spreadsheet applications. I did a quick test of the 6 ODF spreadsheet applications to see how well they fared against a test suite of 509 core tests that OpenFormula defines for spreadsheet functions. The results were:
- CleverAge 455/509 = 89%
- Google 457/509 = 90%
- KSpread 472/509 = 93%
- Symphony 487/509 = 96%
- OpenOffice 493/509 = 97%
- Sun Plugin 500/509 – 98%
So, we’re not yet perfect, but we’re getting pretty close. Interestingly, the lowest scores (CleverAge) and highest scores (Sun Plugin) are both for the same calculation engine (Excel).
Looking forward, we’ll continue to edit and refine OpenFormula and its test cases. You might look for it when it comes out for public review, hopefully in a couple of months. Unlike other parts of ODF 1.2, OpenFormula is essentially XML-free. It is a mini-expression language, defined by a BNF grammar and accompanied by hundreds of spreadsheet functions from mathematics, finance, engineering, statistics, etc. So review by subject matter experts in these disciplines is especially needed, even if they have zero XML experience. If you want to see the current OpenFormula Working Draft, currently in its 71st revision, take a look. Comments may be submitted to the ODF TC’s comment list.
I’m also looking forward to testing Office 2007 SP2’s ODF support when it comes out, to see how their ODF support is improving. Anything less than the 500/509 results that Excel 2003 gives with the Sun Plugin will be a disappointment. KOffice has a 2.0 version in beta I should look at. OpenOffice has their 3.0 update. Sun also has an updated ODF Plugin. I’ll lean on the Symphony team as well, and see if we can beat 500/509. Game on!