This is a follow up to some work we did at the ODF Interoperability Workshop in Beijing last November. We had good participation there: IBM, Sun, Google, Novell and Redflag from the big vendor side, as well as a good number of users. It was a full-day workshop and we covered a number of topics. One of them was spreadsheet formulas. I gave a short presentation on spreadsheet interoperability, specifically on the work we’ve done on OpenFormula for ODF 1.2. We also did a short exercise to look for spreadsheet formula bugs.
As many of you know, neither ODF 1.0 nor ODF 1.1 defines a spreadsheet formula language. They leave it implementation-defined. The specification makes only a few broad statements, such as a recommendation that formula attributes be qualified by namespace, that formulas begin with ‘=’ , that cell addresses be surrounded by ‘[‘ and ‘]’ and that formula parameters be delimited by ‘;’. So in theory, this is a mess. But in practice it has worked out quite well, since implementations have played “follow the leader” and have nearly converged on interoperable spreadsheet formulas. With ODF 1.2, we’ll standardize the consensus on spreadsheet formulas, giving even greater certainties.
Let’s see how this works in practice. I created a simple spreadsheet document in several ODF-supporting applications, including Microsoft Office using the various plugins. Here is what I tested:
- Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
- Google Spreadsheets
- KOffice’s KSpread 1.6.3
- Lotus Symphony 1.1
- OpenOffice 2.4
- Microsoft Office 2003 with Sun’s ODF Plugin
I used what I had installed on my two machines, Windows and Ubuntu. There may be updates to some of these applications that do even better.
I created the same basic spreadsheet from scratch in each editor and saved it as ODF format. I then looked at each document to see how formulas were being stored in the XML:
- CleverAge stores it in the OpenOffice namespace (xmns:oooc=”http://openoffice.org/2004/calc”)
- Google also uses the OpenOffice namespace.
- KSpread doesn’t use namespace-qualified formula attributes.
- Symphony also doesn’t use namespace-qualified formula attributes.
- OpenOffice uses the OpenOffice namespace.
- Sun’s Plugin also uses the OpenOffice namespace.
OK. So there is some variation in how the formulas are stored, with two approaches in use. How does this then impact interoperability? In theory it is horrible. In practice it works out pretty well.
I took each of the 6 spreadsheet documents and opened each one in each of the other 5 applications — 30 interoperability tests — to see whether the formulas were loaded and calculated correctly. Here is what I saw:
Created In | |||||||
CleverAge | KSpread | Symphony | OpenOffice | Sun Plugin | |||
---|---|---|---|---|---|---|---|
Read In |
CleverAge | OK | OK | Fail | Fail | OK | OK |
OK | OK | OK | OK | OK | OK | ||
KSpread | OK | OK | OK | OK | OK | OK | |
Symphony | OK | OK | OK | OK | OK | OK | |
OpenOffice | OK | OK | OK | OK | OK | OK | |
Sun Plugin | OK | OK | OK | OK | OK | OK |
So the formulas came through OK, in almost all instances. The only exception was the CleverAge add-in, which failed to process formulas from KSpread and Symphony. For example, loading the Symphony spreadsheet into Office 2003 results in cells with contents containing errors such as “=#REF!+#REF!-#REF!” which is tantamount to data loss.
I think we can do better than this with a few simple changes.
The Law of Robustness as stated in RFC 1122 is “Be liberal in what you accept, and conservative in what you send.” Adapting that principle to ODF spreadsheets, I recommend the following practice for ensuring interoperability using ODF 1.0 and ODF 1.1:
- When writing ODF 1.0 or ODF 1.1 spreadsheet documents, write formula attribute values using the OpenOffice namespace prefix: “http://openoffice.org/2004/calc”. All ODF spreadsheet applications I have tested accept and correctly process formulas in that namespace. Note that the CleverAge add-in is not doing the namespace checks in a XML-correct fashion. They are comparing only the text of the prefix, not resolving it to a namespace URI and comparing the URI’s. So you should be sure to also use “oooc” as the namespace prefix.
- When reading ODF 1.0 or ODF 1.1 spreadsheet documents, be prepared to handle formulas with no namespace qualification as well as those with the OpenOffice namespace.
Specifically, Symphony and KSpread should consider making changes to accommodate #1 and CleverAge should consider changes needed to do #2. In the CleverAge case, a trivial, one-line change to OdfConditionalPostProcessor.cs will quickly restore compatibility with Symphony and KSpread documents.
Now, if you are entirely satisfied with what I have said above, and have no lingering doubts, then you are not thinking enough. It is not enough to merely bring the spreadsheet formulas over intact. Interoperability also requires that we interpret the formulas in the same way.
So let’s look at that side of the equation (no pun intended). Fortunately, we are all quite close to what is being defined in ODF 1.2’s OpenFormula specification. This is not so surprising, since OpenFormula was based on actual spreadsheet practice, looking at a variety of spreadsheet applications. I did a quick test of the 6 ODF spreadsheet applications to see how well they fared against a test suite of 509 core tests that OpenFormula defines for spreadsheet functions. The results were:
- CleverAge 455/509 = 89%
- Google 457/509 = 90%
- KSpread 472/509 = 93%
- Symphony 487/509 = 96%
- OpenOffice 493/509 = 97%
- Sun Plugin 500/509 – 98%
So, we’re not yet perfect, but we’re getting pretty close. Interestingly, the lowest scores (CleverAge) and highest scores (Sun Plugin) are both for the same calculation engine (Excel).
Looking forward, we’ll continue to edit and refine OpenFormula and its test cases. You might look for it when it comes out for public review, hopefully in a couple of months. Unlike other parts of ODF 1.2, OpenFormula is essentially XML-free. It is a mini-expression language, defined by a BNF grammar and accompanied by hundreds of spreadsheet functions from mathematics, finance, engineering, statistics, etc. So review by subject matter experts in these disciplines is especially needed, even if they have zero XML experience. If you want to see the current OpenFormula Working Draft, currently in its 71st revision, take a look. Comments may be submitted to the ODF TC’s comment list.
I’m also looking forward to testing Office 2007 SP2’s ODF support when it comes out, to see how their ODF support is improving. Anything less than the 500/509 results that Excel 2003 gives with the Sun Plugin will be a disappointment. KOffice has a 2.0 version in beta I should look at. OpenOffice has their 3.0 update. Sun also has an updated ODF Plugin. I’ll lean on the Symphony team as well, and see if we can beat 500/509. Game on!
Could you post a link to the OpenFormula function specifications? I can review the text of the mathematical ones, and of course when the public review version comes out I can submit any issues.
Rob, could you please provide some details about how you performed the 509 core tests that OpenFormula defines for spreadsheet functions ?
Did you used ODS files making use of the OpenFormula namespace (prefix of.) ? Do the 6 applications already recognise this namespace ?
Hmm, did you perhaps test the Sun ODF plugin version 3.0 ?
That might explain the highest score, since it’s based on the OOo 3 codebase, beating 2.4 and other products based upon 2.x…
Since Koffice 2 is in the works, and the CleverAge team is also working towards a new version of their plug-in, we have a healthy competition going on :-)
@Tom, I’ll add a link to the current OpenFormula Working Draft at the bottom of the main post.
@Luc, If you look at the OpenFormula draft you will see inline test cases defined for each formula. Some smart person, i.e., not me, wrote a program to extract these test cases and make a self-testing ODF spreadsheet document. It puts the formulas in one column, the expected results in a second column and uses a third column to indicate whether the test case passed. The test cases used the OpenOffice namespace. I haven’t tried the ODF 1.2 namespace. If I did that I’d also want to be more careful about using the latest versions of the application, e.g., The KOffice 2 beta, or OpenOffice 3.01.
@Bart, I’m using the Sun ODF Plugin 1.2. If version 3 is out already, then I’m using an old version. I’ll try the new one.
Word of caution: “Be conservative in what you do, be liberal in what you accept from others” is only good if you add your hacks to the standard. Else it’ll create madness. This already happended with Web (HTML5 ties to fix it, but it’s HARD) – no need to repeat this fiasco with office suites. Joel has nice explanation:
http://www.joelonsoftware.com/items/2008/03/17.html
Basically the problem is that if few players (especially dominant players) accept broken, not standard-compliant input it makes standard useless: at this point it’s not enough to faithfully implement existing standard – you must process all quirks too!
That’s why all “broad inputs” MUST be discussed with standards group, documented and added to the standard (may be as prefix).
@Victor,
I think we’re in a different situation than with HTML. We don’t have 10 million people hand-authoring ODF documents. We have 5 or 6 different spreadsheet editors, and the number of different namespaces in use for spreadsheet formulas appears to be only 2. So I think the Robustness Principle is appropriate here. But your point (Joel’s point) is well-taken. It can get out of control if you let it.
On a related note, the ODF-Converter project posted a blog item on how they will deal with formulas in their next version of the plug-in:
http://odf-converter.sourceforge.net/newblog/index.php?2009/03/13/28-how-the-openxml-odf-translator-deals-with-formulas
Rob, would you please let us know your opinion on the following practical matter?
We are in March 2009 and MS Office 2007 SP2 ia announced for mid-April. So beta versions should exist…
As announced, Microsoft will (hopefully) support ODF in SP2
When SP2 is officially launched, what should a regular user do? Did you perform tests on this SP2? Can we RELIABLY rely on SP2 for opening/creating ODf documents or are they some proprietary “extensions” already?
The vast majority of MS Office users – should they still need third-party plugins or…
IMHO, this topic is so important for all the community that it worth a separate discussion on your blog…
Thanks A LOT,
Răzvan
Hi Răzvan,
When Office 2007 SP2 comes out I will certainly examine it and do a blog post on what I find out. I have heard that there is a beta version of it, but this appears to be a private beta and I do not have access to it.