• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

An Antic Disposition

  • Home
  • About
  • Archives
  • Writings
  • Links
You are here: Home / Archives for ODF

ODF

The Battle for ODF Interoperability

2009/05/17 By Rob 33 Comments

Last year, when I was socializing the idea of creating the OASIS ODF Interoperability and Conformance TC, I gave a presentation I called “ODF Interoperability: The Price of Success”. The observation was that standards that fail never need to deal with interoperability. The creation of test suites, convening of multi-vendor interoperability workshops and plugfests is a sign of a successful standard, one which is implemented by many vendors, one which is adopted by many users, one which has vendor-neutral venues for testing implementations and iteratively refining the standard itself.

Failed standards don’t need to work on interoperability because failed standards are not implemented. Look around you. Where are the OOXML test suites? Where are the OOXML plugfests? Indeed, where are the OOXML implementations and adoptions? Microsoft Office has not implemented ISO/IEC 29500 “Office Open XML”, and neither has anyone else. In one of the great ironies, Microsoft’s escapades in ISO have left them clutching a handful of dust, while they scramble now to implement ODF correctly. This is reminiscent of their expensive and failed gamble on HD DVD on the XBox, followed eventually by a quick adoption of Blue-ray once it was clear which direction the market was going. That’s the way standards wars typically end in markets with strong network effects. They tend to end very quickly, with a single standard winning. Of course, the user wins in that situation as well. This isn’t Highlander. This is economic reality. This is how the world works.

Although this may appear messy to an outside observer, our current conversation on ODF interoperability is a good thing, and further proof, to use the words Microsoft’s National Technology Director, Stuart McKee, that “ODF has clearly won“.

Fixing interoperability defects is the price of success, and we’re paying that price now. The rewards will be well worth the cost.

We’ve come very far in only a few years. First we had to fight for even the idea and acceptance of open standards, in a world dominated by a RAND view of exclusionary standards created in smoke filled rooms, where vendors bargained about how many patents they could load up a standard with. We won that battle. Then we had to fight for ODF, a particular open standard, against a monopolist clinging to its vendor lock-in and control over the world’s documents. We won that battle. But our work doesn’t end here. We need to continue the fight, to ensure that users of document editors, you and I, get the full interoperability benefits of ODF. Other standards, like HTML, CSS, EcmaScript, etc., all went through this phase. Now it is our turn.

With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.

Of course, the standard itself, the ink on the pages, does not accomplish this by itself. A standard is not a holy relic. I cannot take the ODF standard and touch it to your forehead say “Be thou now interoperable!” and have it happen. If a vendor wants to achieve interoperability, they need to read (and interpret) the standard with an eye to interoperability. They need to engage in testing with other implementations. And they need to talk to their users about their interoperability expectations. This is not just engineering. Interoperability is a way of doing business. If you are trying to achieve interoperability by locking yourself in a room with a standard, then you’ll have as much luck as trying to procreate while locked in a room with a book on human reproduction. Interoperability, like sex, is a social activity. If you’re doing it alone then you’re doing it wrong.

Standards are written documents — text — and as such they require interpretation. There are many schools of textual interpretation: legal, literary, historic, linguistic, etc. The most relevant one, from the perspective of a standard, is what is called “purposive” or “commercial” interpretation, commonly applied by judges to contracts. When interpreting a document using an purposive view, you look at the purpose, or intent, of a document in its full context, and interpret the text harmoniously with that intent. Since the purpose of a standard is to foster interoperability, any interpretation of the text of a standard which is used to argue in favor of, or in defense of, a non-interoperable implementation, has missed the mark. Not all interpretations are equal. Interpretations which are incongruous with the intent of standardization can easily be rejected.

Standards can not force a vendor to be interoperable. If a vendor wishes deliberately to withhold interoperability from the market, then they will always be able to do so, and, in most cases, devise an excuse using the text of the standard as a scapegoat.

Let’s work through a quick example, to show how this can happen.

OpenFormula is the part of ODF 1.2 that defines spreadsheet formulas. The current draft defines the addition operator as:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds numbers together.

I think most vendors would manage to make an interoperable implementation of this. But if you wanted to be incompatible, there are certainly ways to do so. For example, given the expression “1+1” I could return “42” and still claim to be interoperable. Why? Because the text says “adds numbers together”, but doesn’t explicitly say which numbers to add together. If you decided to add 1 and 41 together, you could claim to be conformant. OK, so let’s correct the text so it now reads:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds Left to Right.

So, this is bullet-proof now, right? Not really. If I want to, I can say that 1+1 =10, if I want to claim that my implementation works in base 2. We can fix that in the standard, giving us:

6.3.1 Infix Operator “+”

Summary: Add two numbers.
Syntax: Number Left + Number Right, both in base 10 representations
Returns: Number, in base 10
Constraints: None
Semantics: Adds Left to Right.

Better, perhaps. But if I want I can still break compatibility. For example, I could say 1+1=0, and claim that my implementation rounds off to the nearest multiple of 5. Or I could say that 1+1 = 1, claiming that the ‘+’ sign was taken as representing the logical disjunction operator rather than arithmetic addition. Or I could do addition modulo 7, and say that the text did not explicitly forbid that. Or I could return the correct answer some times, but not other times, claiming that the standard did not say “always”. Or I could just insert a sleep(5000) statement in my code, and pause 5 seconds every time the an addition operation is performed, making a useless, but conformant implementation And so on, and so on.

The old adage holds, “It is impossible to make anything fool- proof because fools are so ingenious.” A standard cannot compel interoperability from those who want resist it. A standard is merely one tool, which when combined with others, like test suites and plugfests, facilitates groups of cooperating parties to achieve interoperability.

Now is the time to achieve interoperability among ODF implementations. We’re beyond kind words and empty promises. When Microsoft first announced, last May, that it would add ODF support to Office 2007 SP2, they did so with many fine words:

  • “Microsoft Corp. is offering customers greater choice and more flexibility among document formats”
  • Microsoft is “committed to work with others toward robust, consistent and interoperable implementations”
  • Chris Capossela, senior vice president for the Microsoft Business Division: “We are committed to providing Office users with greater choice among document formats and enhanced interoperability between those formats and the applications that implement them”
  • “Microsoft recognizes that customers care most about real-world interoperability in the marketplace, so the company is committed to continuing to engage the IT community to achieve that goal when it comes to document format standards”
  • Microsoft will “work with the Interoperability Executive Customer Council and other customers to identify the areas where document format interoperability matters most, and then collaborate with other vendors to achieve interoperability between their implementations of the formats that customers are using today. This work will continue to be carried out in the Interop Vendor Alliance, the Document Interoperability Initiative, and a range of other interoperability labs and collaborative venues.”
  • “This work on document formats is only one aspect of how Microsoft is delivering choice, interoperability and innovative solutions to the marketplace.”

So the words are there, certainly. But what was delivered fell far, far short of what they promised. Excel 2007 SP2 strips out spreadsheet formulas when it reads ODF spreadsheets from every other vendor’s spreadsheets, and even from spreadsheets created by Microsoft’s own ODF Add-in for Excel. No other vendor does this. Spreadsheet formulas are the very essence of a spreadsheet. To fail to achieve this level of interoperability calls into question the value and relevance of what was touted as an impressive array of interoperability initiatives. What value is an Interoperability Executive Council, an Interop Vendor Alliance, a Document Interoperability Initiative, etc., if they were not able to motivate the most simple act: taking spreadsheet formula translation code that Microsoft already has (from the ODF Add-in for Office) and using it in SP2?

The pretty words have been shown to be hollow words. Microsoft has not enabled choice. Their implementation is not robust. They have, in effect, taken your ODF document, written by you by your choice in an interoperable format, with demonstrated interoperability among several implementations, and corrupted it, without your knowledge or consent.

There are no shortage of excuses from Redmond. If customers wanted excuses more than interoperability they would be quite pleased by Microsoft’s prolix effusions on this topic. The volume of text used to excuse their interoperability failure, exceeds, by an order of magnitude, the amount of code that would be required to fix the problem. The latest excuse is the paternalistic concern expressed by Doug Mahugh, saying that they are corrupting spreadsheets in order to protect the user. Using a contrived example, of a customer who tries to add cells containing text to those containing numbers, Doug observes that OpenOffice and Excel give different answers to the formula = 1+ “2”. Because all implementations do not give the same answer, Microsoft strips out formulas. Better to be the broken clock that reads the correct time twice a day, than to be unpredictable, or as Doug puts it:

If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing. But the behavior is predictable: nothing recalculates, and no erroneous results are created.

But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?

That will most definitely not be a predictable experience. And in actual fact, the unpredictable consequences of that sort of variation in spreadsheet behavior can be very consequential for some users. Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.

This bears a close resemblance to what is sometimes called “Ben Tre Logic”, after the Vietnamese town whose demise was excused by a U.S. General with the argument, “It became necessary to destroy the village in order to save it.”

Doug’s argument may sound plausible at first glance. There is that scary “unpredictable consequences”. We can’t have any of that, can we? Civilization would fall, right? But what if I told you that the same error with the same spreadsheet formula occurs when you exchange spreadsheets in OOXML format between Excel and OpenOffice? Ditto for exchanging them in the binary XLS format. In reality, this difference in behavior has nothing to do with the format, ODF or OOXML or XLS. It is a property of the application. So, why is Microsoft not stripping out formulas when reading OOXML spreadsheet files? After all, they have exactly the same bug that Doug uses as the centerpiece of his argument for why formulas are stripped from ODF documents. Why is Microsoft not concerned with “unpredictable consequences” when using OOXML? Why do users seem not to require “accurate, predictable results” when using OOXML? Or to be blunt, why is Microsoft discriminating against their own paying customers who have chosen to use ODF rather than OOXML? How is this reconciled with Microsoft’s claim that they are delivering “choice, interoperability and innovative solutions to the marketplace”?

Filed Under: Interoperability, ODF Tagged With: ODF, Office, OOXML, OpenFormula

A follow-up on Excel 2007 SP2’s ODF support

2009/05/07 By Rob 36 Comments

Wow. My previous post seems to have attracted some attention. When I woke up on Monday morning, made my coffee and logged into to my email, I found out that my geeky little analysis of Office 2007 SP2’s ODF support had sparked some interest. I did not intend it to be more than an update for the handful of the “usual suspects” who regularly follow ODF issues via various blogs, many of which you see listed to your right. If I had any foreknowledge or expectation that this post would end up being on SlashDot, GrokLaw, ZDnet, IDG, Reuters, CNet, etc., I would have done a better job spell checking, and maybe toned down the rhetoric a little (just a little).

But this widespread interest in the topic tells me one thing: ODF is important. People care about it. People want it to succeed, and when this success is threatened, whether for deliberate or accidental reasons, they are upset. Although Office 2007 SP2 also added PDF and XPS support, you don’t see many stories on that at all.

I’ve been trying to respond to the many comments by anonymous FUDsters and Fanboys on various web sites where my post is being discussed. However, it is getting rather laborious swatting all the gnats. They obviously breed in stagnant waters, and there is an awful lot of that on the web. Since all links lead back here anyways, it will be much simpler to do a recap here and address some of the more widespread errors.

The talking points from Redmond seem to be consistent, along the lines of:

We did a 100% perfect and conforming implementation of ODF 1.1 to the letter of the standard. If it is not interoperable, then it is the fault of the standard or the other applications or some guy we saw sneaking around back on the night of the fire. In any case, it is not our fault. We just design, write, test and sell software to users, businesses, governments and educational institutions. We have no influence over whether our products are interoperable or not. What effect SP2 has on users or the market — that’s not our concern. Come back in 50 years when you have a 100% perfect standard and maybe we’ll talk.

In other words, all of those Interoperability Directors and Interoperability Architects at Microsoft seem to have (hopefully temporarily) switched into Minimal Conformance Directors and Minimal Conformance Architects, and are gazing at their navels. I hope they did not suffer a reduction in salary commensurate with the reduction in their claimed responsibilities.

In any case, their argument might be challenged on several grounds. First up is the question of whether the ODF documents written by Excel 2007 SP2 indeed conform to the ODF 1.1 standard. This is not a hard question to answer, but please excuse this short technical diversion.

Let’s see what the ODF 1.1 standard says in section 8.1.3 (Table Cell):

Addresses of cells that contain numbers. The addresses can be relative or absolute, see section 8.3.1. Addresses in formulas start with a “[“ and end with a “]”. See sections 8.3.1 and 8.3.1 for information about how to address a cell or cell range.

And the referenced section 8.3.1 further says:

To reference table cells so called cell addresses are used. The structure of a cell address is as follows:

  1. The name of the table.
  2. A dot (.)
  3. An alphabetic value representing the column. The letter A represents column 1, B represents column 2, and so on. AA represents column 27, AB represents column 28, and so on.
  4. A numeric value representing the row. The number 1 represents the first row, the number 2 represents the second row, and so on.
  5. This means that A1 represents the cell in column 1 and row 1. B1 represents the cell in column 2 and row 1. A2 represents the cell in column 1 and row 2.

    For example, in a table with the name SampleTable the cell in column 34 and row 16 is referenced by the cell address SampleTable.AH16. In some cases it is not necessary to provide the name of the table. However, the dot must be present. When the table name is not required, the address in the previous example is .AH16

So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:

  • Symphony 1.3: =[.E12]+[.C13]-[.D13]
  • Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
  • KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
  • Google Spreadsheets: =[.E12]+[.C13]-[.D13]
  • OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
  • Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
  • Excel 2007 SP2: =E12+C13-D13

I’ll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.

Next is the question of the relationship between interoperability and conformance. So we are not building skyscrapers in the air, let’s start with a working definition of interoperability, say that given by ISO/IEC 2382-01, “Information Technology Vocabulary, Fundamental Terms”:

The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units

I think we probably have a better sense of what conformance is. Something conforms when it meets the requirements defined by a standard.

So let’s explore explore the relationship between conformance to a standard and interoperability.

First, does interoperability require a standard? No. There have been interoperable systems without formal standards. For example, there is a degree of interoperability among spreadsheet vendors on the basis of the legacy Excel binary file format (XLS), even though the binary format was never standardized and never defines spreadsheet formulas. Another example is the SAX XML parsing API. Widely implemented, but never standardized. We may call them informal or de facto standards.

Additionally, many standards start out as informal technical agreements and specifications that achieve interoperability among a small group of users, who then move it forward to standardization so that a broader audience can benefit. But the interoperability came first and the formal standard came second. See the history of the Atom syndication format for a good example.

Second, Is interoperability possible in the presence of non-conformance? Yes. For example, it is well known that the vast majority of web pages (93% by one estimate) on the web today do not conform to the HTML standard. But there is a not unsubstantial degree of interoperability on the web today in spite of this lack of conformance. Generally, interoperability does not require perfection. It requires good faith and hard work. If perfection were required, nothing would work in this world, would it?

Third, if a standard does not define something (like spreadsheet formulas) then I am allowed to do whatever I want, right? This is true. But further, even if ODF 1.1 did define spreadsheet formulas you would still be allowed to do whatever you want. Remember, these are voluntary standards. We can’t force you to do anything, whether we define it or not.

So what then is the precise relationship between conformance and interoperability? I’d state it as:

  • In general, conformance is neither necessary nor sufficient for to achieve interoperability.
  • But interoperability is most efficiently achieved by conformance to an open standard where the standard clearly states those requirements which must be met to achieve interoperability.

In other words, the relationship is due to the efficiency of this configuration to those who wish to interoperate. Conformance is neither necessary nor sufficient to achieve interoperability in general, but interoperability is most efficiently achieved when conformance guarantees interoperability. When I talk about “standards-based interoperability” I’m talking about the situation when you are in the neighborhood of that optimal point.

The inefficiency of other orientations is seen with HTML and Web browsers. Because of the historically low level of HTML conformance by authoring tools and users who hand-edit HTML, browsers today are much more complex then they would otherwise need to be. They need to handle all sorts of mal-formed HTML documents. This complexity extends to any tool that needs to process HTML. Sure, we have a pretty good grip on this now, with tools like HTML Tidy and other robust parsers, but this has come at a cost. Complexity eats up resources, both to coders and testers, but also runtime resources, memory and processing cycles. More complex code is harder to maintain and secure and tends to have more bugs. Greater conformance would have lead to a more efficient relationship between conformance and interoperability.

Similarly, the many years of non-conformance in browsers, most notably Internet Explorer, to the CSS2 standard has resulted in an inefficiency there. From the perspective of web designers, tool authors and competing browser vendors, the lack of conformance to the standards has increased the cost needed to achieve interoperability, a cost transferred from a dominate vendor who chose not to conform to the standards, to other vendors who did conform.

The efficiency of conformance to open standards in particular is the clarity and freedom it provides around access to the standard and the contingent IP rights needed to implement the standard.

So back to ODF 1.1. What is the relationship between conformance and interoperability there? Clearly, it is not yet at that optimal point (which few standards ever achieve) where interoperability is most-efficiently achieved. We’re working on it. ODF 1.2 will be better in that regard than ODF 1.1, and the next version will improve on that, and so on.

Does this mean that you cannot create interoperable solutions with ODF? No, it just means that, like most standards in IT today, you need to do some interoperability testing with other vendor’s products to make sure your product interoperates, and make conformant adjustments to your product in order to achieve real-world nteroperability. Most vendors who don’t have a monopoly would do this naturally and in fact have done this, as my chart indicated. Complaining about this is like complaining about gravity or friction or entropy. Sure, it sucks. Deal with it. Although it may not pay as much as being a professional mourner, work as a programmer is more regular. And giving value to customers will always bring more satisfaction than than standing there weeping about how code is hard.

In any case, this comes down to why do you implement a standard. What are your goals? If your goal is be interoperable, then you perform interoperability testing and make those adjustments to your product necessary to make it be both conformant and interoperable. But if your goal is to simply fulfill a checkbox requirement without actually providing any tangible customer benefit, then you will do as little as needed. However, if your goal is to destroy a standard, then you will create a non-conformant, non-interoperable implementation, automatically download it to millions of users and sow confusion in the marketplace by flooding it with millions of incompatible documents. It all depends on your goals. Voluntary standards do not force, or prevent, one approach or another.

To wrap this up, I stand on the table of interoperability results in the previous post. SP2 has reduced the level of interoperability among ODF spreadsheets, by failing to produce conforming ODF documents, and failing to take note of the spreadsheet formula conventions that had been adopted by all of the other vendors and which are working their way through OASIS as a standard.

If we note the arguments used by Microsoft in the recent past, they have argued that OOXML must be exactly what it is — flaws and all — in order to be compatible with legacy binary Office documents. Then they argued that OOXML can not be changed in ISO, because that would create incompatibility with the “new legacy” documents in Office 2007 XML format. But when it comes to ODF, they have disregarded all legacy ODF documents created by all other ODF vendors and take an aloof stance that looks with disdain on interoperability with other vendor’s documents, or even documents produced by their own ODF Add-in. The sacrosanctness of legacy compatibility appears to be reserved, for strategic reasons, for some formats but not others. We’ll redefine the Gregorian calender in ISO to be interoperable with one format if we need to, but we won’t deign, won’t stoop, won’t dirty ourselves to use the code we already have from the ODF Add-in for Microsoft Office, to make SP2 formulas interoperable with the other vendors’ products, to benefit our own users who are asking for ODF support in Office. As I said before, this ain’t right.

Filed Under: ODF

OpenDocument Format: The Standard for Office Documents

2009/05/05 By Rob Leave a Comment

A belated note that an article of mine on ODF was recently published in IEEE Internet Computing, called “OpenDocument Format: The Standard for Office Documents“. I think it is a good introduction to ODF, what it is, where it came from and why it is important. They allow authors to post a copy on their websites. So feel free to link to it, but any redistribution will need to be negotiated with the publisher.

At the same time I’ve taken the opportunity to put together a new web page of some of my other publications, workshop and conference presentations. I have few others that I want add, once I find them. But this is a start.

Filed Under: ODF

Update on ODF Spreadsheet Interoperability

2009/05/03 By Rob 33 Comments

[2009/05/07 — I’ve posted a follow up article on this topic which you may want to read]

A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then current ODF implementations, including:

  1. OpenOffice.org 2.4
  2. Google Spreadsheets
  3. KOffice KSpread 1.6.3
  4. IBM Lotus Symphony 1.1
  5. Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
  6. Microsoft Office 2003 with Sun’s ODF Plugin

I created a test document in each of those editors and then loaded each test document in each of the other editors. I showed what worked, what didn’t, and made some suggestions on how interoperability could be improved. I found only two notable failures, when the Microsoft/CleverAge Add-in for Excel loaded KSpread and Symphony documents. The other scenarios I tested were OK:

Created In
CleverAge Google KSpread Symphony OpenOffice Sun Plugin
Read In

 

CleverAge OK OK Fail Fail OK OK
Google OK OK OK OK OK OK
KSpread OK OK OK OK OK OK
Symphony OK OK OK OK OK OK
OpenOffice OK OK OK OK OK OK
Sun Plugin OK OK OK OK OK OK

I lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:

  • CleverAge has released version 3.0 of their Add-in.
  • OpenOffice 3.01 is now out and in wide use.
  • Symphony 1.3 is now in beta.
  • The Sun ODF Plugin is now at version 3.0.
  • Microsoft Office 2007 SP2 has been released, with integrated ODF support.
  • KOffice 2.0 RC 1 is now available.

I haven’t been able to get the release candidate of KOffice installed, so I’m still including KSpread 1.6.3 in my tests, but for the rest I have created new test files in each editing environment, saved them to ODF format and then loaded the resulting documents into each of the other editors. From these test documents I was able to perform 42 different test combinations.

I’ll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.

The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:

  • Enter the title in A1 “May’s Wedding Planner” and increased font size to 14 point.
  • Enter formula = TODAY() in B3 and set US style MM/DD/YY date format/
  • Enter the date of the wedding as a constant in cell B4, also setting date format.
  • Added simple calculations on cells B6-B8, to calculate days, weeks and months until the wedding.
  • A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits – debits) then you can simply copy down the formula to the new row for each new entry.

The resulting spreadsheet looks something like this:

Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.

Here is what I found when I tested the various scenarios:

Created In
Google KSpread Symphony OpenOffice Sun Plugin CleverAge MS Office 2007 SP2
Read In

 

Google OK OK OK OK Fail OK Fail
KSpread OK OK OK Fail Fail OK Fail
Symphony OK OK OK OK OK Fail Fail
OpenOffice OK OK OK OK OK OK Fail
Sun Plugin OK OK OK OK OK OK Fail
CleverAge Plugin OK OK OK OK Fail OK OK
MS Office 2007 SP2 Fail Fail Fail Fail Fail Fail OK

So what is happening here?

CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows awful lot of red. Let’s see if we can figure this out.

First, some combinations that worked previously, when I tested two months ago, are now not working:

  • Symphony 1.3 beta hangs when attempting to load the spreadsheet created with the CleverAge 3.0 ODF Add-in. Symphony 1.1 also hangs when trying to load that same spreadsheet. However both versions of Symphony work fine when loading the CleverAge 2.5 spreadsheet from two months ago. The CleverAge document appears to be valid, so my guess is that this is a bug in the Symphony 1.3 beta. I’ll pass this document on to the Symphony development team to see what they say.
  • KSpread 1.6.3 does not read formulas from OpenOffice 3.01 documents. KSpread had no problems with OO 2.4 documents. The problem appears to be that OpenOffice 3.01, by default, writes out documents according to the ODF 1.2 draft which puts formulas in the OpenFormula namespace. But KSpread is expecting them in the legacy namespace. The result is that spreadsheet formulas are dropped when the document is loaded in KSpread.
  • In a similar way, Sun’s new ODF Plugin writes out documents according to the ODF 1.2 draft. KOffice is unable to handle these files. This also causes problems for Google Spreadsheets and the Microsoft/CleverAge Plugin for Excel, which report errors “We were unable to upload this document” and “The converter failed to open this file”.

The new entry to the mix is Microsoft Office 2007 SP2, which has added integrated ODF support. Unfortunately this support did not fare well in my tests. The problem appears to be how it treats spreadsheet formulas in ODF documents. When reading an ODF document, Excel SP2 silently strips out formulas. What is left is the last value that cell had, when previously saved.

This can cause subtle and not so subtle errors and data loss. For example, in the test document I presented above, the current date is encoded using the TODAY() spreadsheet function. If the formulas are stripped, then this cell no longer updates, and will return the wrong value. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells from column E down a row, this will cause incorrect calculations, since there is no longer a formula to copy, so she would just be copying the prior balance. In general, SP2 converts an ODF spreadsheet into a mere “table of numbers” and any calculation logic is lost.

In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas but places them into an Excel namespace. This namespace is not what OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn’t even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. And the ones that do not check the namespace try and fail to load a formula since it is syntactically different than what they expected. The applications essentially display a corrupted document that is shows neither the formula nor the value correctly. For example, a SP2 document, loaded in MS Office using the Sun ODF Plugin looks like this:

Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn’t import the document at all.

I must admit that I’m disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don’t know or care. But I cannot fail to notice that the same application — Microsoft Excel 2007 — will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the “improved” integrated code in Office 2007 SP2. This ain’t right.

I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:

  1. Sun should write out formulas in ODF 1.1 format, using the legacy “oooc” namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO’s ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I’m not sure if the Sun Plugin has a similar configuration option, but I hope it does.
  2. In addition to writing out compatible formulas as per the above comments on the Sub Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor’s spreadsheet documents. Give the user a warning if you need to, but let them have the choice.

Finally, let me try to anticipate and debunk some of the counter-arguments which might be raised to argue against interoperability.

First, we might hear that ODF 1.1 does not define spreadsheet formulas and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This is certainly is true if your sole goal is to claim conformance. If your business model requires only conformance and not actually achieving interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive options. An application can be conformant to a standard and also be interoperable, if you use the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for not also being interoperable, or at least not a valid excuse. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and they workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. The difference between minimal conformance and interoperability is well illustrated in these tests.

Remember, it is not particularly difficult or clever to to take an adverse reading of a standard to make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser implementation that makes all default text be 4-point Zapf Dingbats, white text on a white background. It would conform with the standard, but it would be perfectly unusable by anyone. If you try hard enough you can create 100% conformant, but non-interoperable, implementations of almost most standards. Standards are voluntary, written to help coordinate multiple parties in their desires for interoperability. Standards are not written to compel interoperability by parties who do not wish to be interoperable.

(A side point is that SP2’s implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)

We might also hear concerns that supporting other vendors’ ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on that used by Excel, which itself was not fully documented when OpenOffice implemented it. So an argument, by Microsoft, not to support that language because it is not documented is rather hypocritical. Excel supports 1-2-3 files and formulas and legacy Excel versions (back to Excel 4.0) neither of which have standardized formula languages. Why are these supported? Also, the fact that the Microsoft/CleverAge add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. The inexplicable thing is why that code never made it into Excel 2007 SP2.

We’ll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users and that it is dangerous to try to save Excel formulas into interoperable ODF formulas because there is no guarantees that OpenOffice or any other ODF application will interpret them the same as Excel does. So one might try to claim that Microsoft is protecting their customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully-licensed Microsoft Office users have already been creating legacy documents in ODF format, using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, get corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here, the way in which users of the ODF Add-in are being sacrificed.

If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors’ formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This’ll teach ’em good.

Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2’s poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.

As I mentioned before, this is a step backwards. But it is just one step on the journey. Let’s look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact most of the code already exists for this. The only thing we need now is to actually go do it and not get too far ahead, or lag too far behind from the other implementations. This is more a question of timing and coordination than hard technical problems.

[2009/05/07 — For more on this topic, see my “A follow-up on Excel 2007 SP2’s ODF Support“]

Filed Under: Interoperability, ODF

Taking Control of Your Documents

2009/03/24 By Rob 12 Comments

How to free yourself from Microsoft Office dependency in three easy steps

The Objective

When you save a document in your word processor, your work is encoded in a particular file format. You often have a choice of formats that you can use, with names like DOC, DOCX, RTF, WPD or ODT. Your choice of format will influence whether others can easily read your document today, whether you yourself will be able to read your document ten years from now, and whether you will be able to migrate painlessly to another word processor or operating system if and when you choose to do so.

Although many users simply click “Save” and give no thought to which format is being used under the covers, this unthinking use of the word processor’s default settings is a recipe for vendor lock-in. In fact, several vendors intentionally set their default format to be ones which will only work well with their own software, fostering dependency on that vendor’s software and lessening the user’s ability to take advantage of other options in the market. The more documents you save and accumulate in a vendor’s proprietary format, the harder it will be for you to consider any other choices.

The objective of this paper is to show you, the user, how to extricate yourself from this cycle of dependency and take control of your documents. Specifically, we show how you can, in three easy steps, free yourself from a Microsoft Office dependency. In the end you may, of course, choose to remain on Microsoft Office. You may decide to migrate to an alternative word processor. That, in the end, is your choice. But by following the three steps outlined below, your freedom of action will be preserved, and your choice of word processor will be based on your priorities and your needs, and not forced on you by your current application vendor.

Step 1: Take control of the default format

The older versions of Microsoft Office, Office 97-Office 2003), by default save documents in a family of binary formats with the extensions DOC (Word), XLS (Excel) and PPT (PowerPoint). Although these formats are proprietary Microsoft formats, over the past decade 3rd party applications have developed the capability to read and write these formats.

However, starting in Office 2007 Microsoft suddenly switched the default format to something called Office Open XML (OOXML). This format is not widely supported outside of Office 2007. So if you save a document in the OOXML format you make it harder for anyone else to read your document unless they are also using Microsoft Office 2007. In almost all cases, the same document, if saved in the legacy DOC format will be more interoperable. Staying with the default choice, OOXML, only restricts your choices and make you more dependent on Microsoft Office. Of course, that is why Microsoft made OOXML the default format.

The first step to liberate yourself from Microsoft Office dependency is to change the default format in Microsoft Office 2007 away from OOXML and back to the early binary formats supported by Office 97-2003, which are widely supported by 3rd party applications. This is a neutral step that preserves the status quo. By making these changes you will still be able to read and edit any OOXML documents that are sent to you, but all new documents you create will be saved in the more widely supported DOC/XLS/PPT formats.

If you are using Microsoft Office 2003 or earlier, then you should skip this Step and move on to Step 2, since OOXML is not the default format in those earlier Office versions.

To change the defaults, you will need to load Word 2007, Excel 2007 and PowerPoint 2007 and follow the following steps.

Word 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “Word Options” at the bottom of the dialog.
  3. Go to the “Save” section.
  4. For the “Save files in this format” setting, choose “Word 97-2003 Document(*.doc)”.
  5. Click OK.

Excel 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “Excel Options” at the bottom of the dialog.
  3. Go to the “Save” section.
  4. For the “Save files in this format” setting choose “Excel 97-2003 Workbook (*.xls)”.
  5. Click OK.

PowerPoint 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “PowerPoint Options” at the bottom of the dialog.
  3. Go to the “Save” section
  4. For the “Save files in this format” setting, choose “PowerPoint Presentation 97-2003”.
  5. Click OK.

Administrators should also note that these settings may be made directly in the Windows Registry, and automatically pushed out to a work group via a login script or group policy. The registry settings corresponding to the above changes are:

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Word\Options
Add String DefaultFormat=Doc

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Options
Add DWORD DefaultFormat=38 (Hexadecimal)

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\PowerPoint\Options
Add DWORD DefaultFormat=0 (Hexadecimal)

Step 2: Enable OpenDocument Format Support

Now that you’ve made the first steps towards taking control of your documents by preventing the lock-in effects of the OOXML default, it is time to take further control. You’ll now want to enable OpenDocument Format (ODF=ISO/IEC 26300) support in Microsoft Office, so you can save and exchange documents using the free and open International Standard while remaining in the familiar Microsoft Office interface.

ODF is an XML-based, open document format standard, designed to be platform- and application-neutral and support interoperable use across applications, eliminating vendor lock-in. ODF is supported by many applications, including office suites from Sun, IBM, Novell and Google, as well as open source projects like OpenOffice, KOffice and AbiWord. Additional applications supporting ODF are listed on Wikipedia.

Microsoft Office does not currently support ODF “out of the box”, but you can enable ODF support in Office by installing a “plugin”, sometimes called an “add-in”. A plugin will add additional options or menu items to the Microsoft Office UI, allowing you to open and save documents in ODF format. In some cases you can even set ODF as the default format for new documents.

There are three main choices for adding ODF support to Microsoft Office:

  1. Sun Microsystems has published an “ODF Plugin for Microsoft Office” which supports Office 2000, XP, 2003 and 2007 SP1.
  2. Microsoft has sponsored an open source project on SourceForge for an “ODF Add-in for Microsoft Office”, which supports Office 2007, and also Office 2003 and Office XP if the Microsoft Office Compatibility Pack is also installed
  3. Microsoft has announced that Office 2007 Service Pack 2 (SP2) will enable ODF support in Office 2007, but this code is not yet available.

Step 2 is to evaluate and adopt a plugin to add ODF support to Microsoft Office. Start using ODF now, saving your documents in the open standard document format. This allows you to remain in Office, for now, while building your familiarity and comfort level with ODF.

Step 3: Exercise your Right to Choose a Native ODF Editor

The plug-in approach is a transitional approach. It allows you to continue working in Microsoft Office while you enable ODF support side-by-side. But at some point you will want to consider your options. Maybe you find that converting back and forth to ODF format in MS Office is slow. Maybe you are using Office 2003 currently, but want to avoid paying for an Office 2007 upgrade when mainstream support for Office 2003 comes to an end on April 14th, 2009. At some point you will want to move to an application that supports ODF natively. You are free at this point and have a wide variety of choices.

  • You can stay on Windows or consider moving to Linux or the Mac.
  • You can stay with a traditional client editor, or move to a web based editor.
  • You can use commercial software, or use open source software.

The important thing is that you have taken control of your documents. You are no longer dependent on Microsoft Office and its file format. You have broken free of the vendor lock-in. You are free to choose an alternative word processor when you want to and if you want to. Until then, be comfortable in knowing that you are keeping your options open while remaining in control of your documents.


This paper is also available in ODF and PDF formats.

Filed Under: ODF Tagged With: Microsoft Office, ODF, Open Standards, OpenDocument

  • « Go to Previous Page
  • Page 1
  • Interim pages omitted …
  • Page 7
  • Page 8
  • Page 9
  • Page 10
  • Page 11
  • Interim pages omitted …
  • Page 25
  • Go to Next Page »

Primary Sidebar

Copyright © 2006-2026 Rob Weir · Site Policies