David Wheeler, the chair of the OASIS ODF Formula Subcommittee, has a good status update on our work defining the details of the expression language and supporting functions used in spreadsheet formulas. I’d also like to point out some cool work by Daniel Carrera, who put together some code that post-processes the OpenFormula specification (in ODF format, of course), extracts the details of the embedded test cases, then automatically generates an ODF spreadsheet file which executes the spreadsheet functions and verifies correct results. The resulting spreadsheet lets implementors automatically test their compliance with the spec. This gives us a self-testing specification, a great labor saving, as well as a demonstration of the innovative things you can do with ODF. Details are here.
(I note in passing that although the OASIS ODF TC does all of its working documents in ODF format, Ecma TC45 does none of its working documents in OOXML. They continue to use the old proprietary Microsoft binary formats as their working format on the TC. A suggestion: if they are unable for some reason to use OOXML, then I encourage TC45 to use ISO ODF. They can then download Daniel’s code to help generate test cases from their spreadsheet formula documentation, and this, I promise you, will save implementors a lot of time.)
Of course, malcontents will never be pleased by our progress, and will portray this progress as proof that we are yet imperfect, and therefore not useful. The first point is obvious, but the second is dubious.
Stephen McGibbon’s blog entry of a couple weeks ago seems to be the Urquelle of this particular line of reasoning. Here’s one small quote:
> I mentioned that in my opinion, Sun were completely aware that ODF wasn’t sufficiently defined to support spreadsheet interoperability as long ago as February 2005 and that the realpolitik inside OASIS was to take advantage of the EU IDA’s request to standardise by rushing to be first despite knowing the ODF specification was deficient in at least this area.
Read the rest of his article and you’ll walk away with two misconceptions:
- The lack of a spreadsheet formula definition in file format documentation is unusual, is a defect, and prevents interoperability
- Spreadsheet formulas were left out because ODF standardization was rushed, for political reasons
Let’s take a look at each of these in turn.
First, let’s look at the state of the art in spreadsheet file format documentation over the years, with particular attention to how spreadsheet formulas have been documented. As the following table shows, Excel formulas have never been publicly specified, even though Microsoft has been producing file format documentation for various binary, HTML, XHTML and XML Excel formats for over 9 years. It was only after the ODF TC decided to document our spreadsheet formulas and formed a Subcommittee to do so that Ecma TC45 decided to follow. The FUD followed soon after.
Date | Format version | Formula status |
---|---|---|
1997 | Excel 97 Developers Kit (Microsoft Press, 1997) | not defined |
ca. 1998 | MSDN CDs in this era had Office file format documentation | not defined |
Jan 1999 | Office 2000’s XHTML formats for Excel | not defined |
May 2001 | Office XP’s XMLSS format for spreadsheets | not defined |
Nov 2003 | Office 2003’s XML Schemas | not defined |
Dec 2005 | Microsoft submits initial “base document” to Ecma | not defined |
Jan 2006 | Ecma TC45’s Working Draft 1.1 | not defined |
Feb 2006 | The OASIS ODF Formula Subcommittee is formed to add formula definition to the ODF specification | |
Apr 2006 | Ecma TC45’s Working Draft 1.2 | not defined |
May 2006 | Ecma TC45’s Working Draft 1.3 | Mirabile dictu! After 9 years of ignoring it, Microsoft finally decides to start defining their spreadsheet formula language. |
So the statement that the lack of a formula language specification is unusual or makes interoperability impossible falls down in the face of 9 years of contrary evidence. Over the years, the industry has managed to have interoperable spreadsheet formulas between different versions of Office as well as between Excel and competing spreadsheets, including 1-2-3, Quattro Pro, OpenOffice, StarOffice, etc., all without ever having a formula specification.
Even though every other spreadsheet file format specification in the past decade failed to document a spreadsheet formula language, the ODF TC knew that we could and should do better. That is why we took the lead and formed a Subcommittee to define, in great detail, with test cases, how spreadsheet formulas, expressions and functions should be interpreted. This is not fixing a problem. This is advancing the state of the art in file format specifications.
They say that imitation is the sincerest form of flattery. If so, the ODF community should be blushing with all of this flattery heaped on it. If it weren’t for the continual market pressure that our innovations bring, Microsoft would never have 1) issued a patent covenant for OOXML, 2) brought OOXML before a standards body, 3) started to document their spreadsheet formula language or 4) started to create an ODF Add-in for Office.
So, then what about the statement that ODF was rushed through the standardization process?
Let’s look at the numbers. Both ODF and OOXML are derived from pre-existing formats. This is not necessarily a bad thing: it is one source of “implementation experience”, which benefits any standard. But only once the “base document” is submitted to a multi-vendor open standards development organization (SDO) does the true work of standardization begin, including deep technical review of the specification to confirm completeness, conciseness, lack of ambiguity, and correct use of formal specification language, to ensure platform independence, and to encourage flexibility and extensibility. So I’ll start the clock when the base specification is submitted to the SDO, and stop the clock when the SDO approves the standard.
The ODF numbers are clear enough since the 1.0 version is complete. The OOXML numbers require some estimation, since that specification is not yet complete, but I’ll justify my estimates this way:
- The OOXML Working Draft 1.3 is currently 4,081 pages long. At the SC34 meeting in June we were told by the Ecma Secretary General that more material was coming and that this draft was only 2/3 complete. Dividing 4,081 by 2/3 gives about 6,100, so I estimate a final size of around 6,000 pages.
- Predicting the completion date is harder. But we do know that Ecma specifications can only be approved twice a year, at the Ecma General Assembly meetings in June and December. If I were Microsoft I’d really, really, really want OOXML approved in time for the Office 2007 launch, so I’m predicting Ecma approval will be sought at the December Ecma General Assembly.
Of course I could be wrong on either or both of those estimates, but let’s see where the logic takes us. The following table summarizes the time under standardization as well as the rate of standardization (pages/day) for each specification.
Standard | Submitted to SDO | Standard issued | Days elapsed | Standard length | Rate of work |
---|---|---|---|---|---|
ODF | 12 Dec 2002 | 1 May 2005 | 867 | 706 pages | 0.8 pages/day |
OOXML | 15 Dec 2005 | 31 Dec 2006 (est) | 381 (est) | 6,000 pages (est) | 15.7 pages/day (est) |
Now I ask you, who is rushing? ODF took 2½ years to standardize 700 pages. Microsoft is trying to standardize a 6,000-page behemoth in just 1 year. I think the argument that ODF was rushed through under political pressure just doesn’t stand up to even cursory examination. Honestly, I think this FUD is being spread around as a smoke screen to hide the fact that OOXML is the one that is really being rushed.
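For anyone who wants to check the arithmetic behind the table, here is a minimal Python sketch. It assumes the submission and approval dates and page counts exactly as printed in the table; the function names are illustrative and not part of any existing tool, and the OOXML inputs are only the estimates discussed above.

```python
from datetime import date


def estimate_final_pages(current_pages, fraction_complete):
    """Scale the current page count up to a projected final size."""
    return current_pages / fraction_complete


def days_elapsed(submitted, issued):
    """Calendar days between submission to the SDO and approval."""
    return (issued - submitted).days


def pages_per_day(pages, days):
    """Average rate of standardization in pages per day."""
    return pages / days


# (submitted, issued, pages) as printed in the table.
# The OOXML approval date and page count are estimates, not facts.
standards = {
    "ODF": (date(2002, 12, 12), date(2005, 5, 1), 706),
    "OOXML": (date(2005, 12, 15), date(2006, 12, 31), 6000),
}

# Working Draft 1.3 is 4,081 pages and reportedly 2/3 complete.
print("Projected OOXML size:", round(estimate_final_pages(4081, 2 / 3)), "pages")

for name, (submitted, issued, pages) in standards.items():
    days = days_elapsed(submitted, issued)
    rate = pages_per_day(pages, days)
    print(f"{name}: {days} days, {rate:.1f} pages/day")
```

Day counts computed this way can differ by a few days from the figures in the table, depending on exactly which start and end dates are used, but the order-of-magnitude gap between the two rates does not depend on that.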