Friday, June 26, 2009

ODF Plugfest



Although the term may be alien to some, "plugfests" have been around for about 20 years. A plugfest is when implementors of the same interface get together and test the interoperability of their products. In the beginning this was done with wired standards, USB, etc. (thus 'plug'). Over the years the term was applied to networking at all higher levels of the protocol stack. The concept is also applicable to document exchange formats like ODF.

We had an ODF Plugfest last week in the Hague. Although we've had interoperability workshops and camps before that attracted a handful of vendors, this was the first one that had nearly universal participation from ODF vendors. I'm not going to recap the details of the plugfest. Others have done that already. But I will share with you some of my conclusions, based on long discussions with other participants, from whose insights I have greatly benefited.

In an ideal world, specifications would be perfect and software applications would be bug-free and users would read the manuals and we would achieve perfect interoperability instantly by anointment of the salubrious unction of standardization. But to the extent this planet's population obdurately persists in imperfection, we are resigned to make additional efforts in pursuit of interoperability. We are not alone in this regard. The only standards that don't need to work on interoperability are those standards that no one implements.

We should use every licit technique at our disposal to give the user the best experience with ODF we can. In a competitive market you can not get away with telling your customer, "Sorry, your spreadsheet doesn't work because page 652, clause 23 says 'should' rather than 'shall'". If you did that you would not have that customer for long. (Unless, of course, you have a monopoly, in which case many seemingly irrational, anti-consumer actions can occur, seemingly without consequences.)

Further, I assert:
  1. Users want real-world interoperability, and not excuses
  2. Real-world interoperability is what users see and achieve in practice
  3. Where vendors have the will to interoperate, achieving interoperability is a known technical problem, with known engineering solutions, but where the will to interoperate is lacking, there are no technical means of compelling interoperability
  4. Interoperability lies at the intersection of technology, engineering standards, competition law, intellectual property and economics. There are no silver bullets, although there is an arsenal of proven techniques that can help to improve interoperability
  5. Achieving interoperability is facilitated by a variety of cooperative activities, including standardization, test case creation, implementation testing, online validators, plugfests, defect collection and reporting
Going forward there is a promising constellation of efforts converging around ODF interoperability:

So, we're moving in the right direction. The key thing will be to sustain the momentum from the Plugfest and transition it into an ongoing effort, a Perpetual and Virtual Plugfest where the effort and the progress are continuous.

[6/29/09: I've received some emails on the photo, so here are the details:

The picture was taken at 3:30PM on the 2nd day of the workshop.

The lens was a Pentax DA 10-17mm "fisheye" zoom at 10mm. So that explains the projection distortion. The graininess and B&W was from post-processing using Nik Software's Silver Efex Pro and Sharpener Pro.]


Tuesday, June 23, 2009

ODF TC timeline

I used a variation of this chart at the recent ODF Plugfest in the Netherlands. But the aspect ratio of a presentation slide doesn't suit this type of chart well, so here is a fuller version of what I showed there.

Those who are not familiar with standards development are sometimes amazed at how long it takes to develop a good standard. Perhaps the single-vendor, 6,000 page, 12-month escapade of OOXML in Ecma has skewed expectations. Fortunately, OOXML is the exception, not the rule. Achieving a multi-vendor consensus around a substantial technical standard will always be time-consuming, but it is time that is well spent.

OASIS standards go through several stages of development:

  1. Working Draft (WD)
  2. Committee Draft (CD)
  3. Public Review Draft
  4. Committee Specification
  5. OASIS Standard
Progressing from one step to another is by ballot. The first 4 stages are advanced by vote of the Technical Committee (TC), while the last stage (OASIS Standard) is by a ballot of all OASIS members. As a draft advances through stages 1-4, an increasing degree of consensus is required. So, a CD requires only simple majority, whereas a Committee Specification requires 2/3 approval, with no more than 1/4 disapproval. Some of these stages allow iteration. So we can, and typically do, have several WD's and several CD's.
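The two thresholds mentioned above can be written down as a short check. This is only a sketch: the function names are hypothetical, and it assumes the fractions are measured against the number of eligible voting members and that a simple majority means more yes votes than no votes. The OASIS process documents are the authority on how ballots are actually counted.

```python
from fractions import Fraction


def committee_draft_passes(yes: int, no: int) -> bool:
    """Simple majority. Assumption: abstentions are ignored."""
    return yes > no


def committee_spec_passes(yes: int, no: int, eligible: int) -> bool:
    """At least 2/3 approval and no more than 1/4 disapproval.

    Assumption: both fractions are taken over `eligible` voting members.
    """
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    approval = Fraction(yes, eligible)
    disapproval = Fraction(no, eligible)
    return approval >= Fraction(2, 3) and disapproval <= Fraction(1, 4)


if __name__ == "__main__":
    print(committee_draft_passes(yes=6, no=5))
    print(committee_spec_passes(yes=8, no=4, eligible=12))
```

Exact fractions are used instead of floating point so that a vote landing exactly on 2/3 or 1/4 is not misjudged by rounding.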

If you want the nitty-gritty details, here is a flow chart of the OASIS standards approval process.

I occasionally get a question along the lines of: "What has the ODF TC been doing for the past couple of years?" The following timeline should give you an idea. I've indicated the time spent developing ODF 1.0 and ODF 1.1, along with some other milestone activities, such as the PAS transposition of ISO/IEC 26300, the publication of ODF 1.0 Approved Errata 01 and the creation of the various ODF subcommittees. I've also indicated the dates of each of the ODF 1.2 WD's and CD's.

As you can see, we've been quite busy. After iterating on WD's during 2007 and 2008, we've now moved on to CD's. All of the planned feature work for ODF 1.2 is now completed. The remaining work is to address the various editorial and technical comments that have been submitted to our comment list, as well as comments from TC members and JTC1/SC34. The goal is to have no known defects in ODF 1.2 before we send it out for a Public Review. Of course, previously-unknown defects will likely be identified during the Public Review, and we have a process for handling these. I'll comment more on that process, and Public Reviews in general, when we get closer to that stage.


Tuesday, June 09, 2009

ODF Lies and Whispers

There is an interesting disinformation campaign being waged against ODF. You won't see this FUD splattered across the front pages of blogs or press releases. It is the kind of stuff that is spread by email and whispers, and you or I rarely will see it in the light of day. But occasionally some of it does cross my desk, and I'd like to share with you some recent examples.

First up is this instance, from a small Baltic republic, where a rather large US-based software company was recently arguing to the national standards committee for the adoption of OOXML instead of ODF. Here are some of the points made by this large company in a letter:

There is no software that currently implements ODF as approved by the ISO

(They then link to Alex Brown's comment from Wikipedia). I think this demonstrates the triangle-trade relationship among Microsoft, Alex Brown (and other bloggers) and Wikipedia, by which Microsoft FUD is laundered via intermediaries to Wikipedia for later reference as newly minted "facts". No wonder one of Microsoft's first actions during their OOXML push was to seize control of the Wikipedia articles on ODF and OOXML via paid consultants. In any case, Alex's claims were rebutted long ago.

ODF has a number (more than a hundred) of technical flaws which haven't been addressed for 3 years despite change requests addressed to OASIS by countries such as Japan and United Kingdom. There are discussions between OASIS and ISO/IEC JTC 1 SC 34 regarding true ownership of ISO ODF, which is a reason why the flaws in ISO ODF aren't being addressed. In a recent SC 34 meeting in Prague a new ISO ODF maintenance committee has been formed because ISO / IEC 26300: 2006 is not being presently maintained.

This is not true. First, the ODF TC has received zero defect reports from any ISO/IEC national body other than Japan. Second, we responded to the Japanese defect report last November. Amazingly, Alex Brown is implicated in this one as well. It was false then and it is false now. At the exact time Alex was quoted in the press as saying that the ODF TC was not acting on defect reports (October 8th, 2008), we had in fact already sent our response to the defect report out to public review (August 7th, 2008) and then completed that review (August 22nd), after quite a bit of active technical discussion with the submitter of the original defect report (Murata Makoto). How Alex translated that into "Their defect reports are being shelved" and "Oasis has not been acting on reports of defects" is beyond me. It must be particularly embarrassing that Murata-san wrote to the OASIS list, within days of Alex's FUD, "I am happy with the way that the errata has been prepared." How could Alex be ignorant of these facts? Why was he lying to the press? How is this conformant with his leadership role in JTC1/SC34 and his participation in BSI? Also observe the triangle-trade route of FUD in this case from Alex to Doug Mahugh to Wikipedia, this time for negative edits in the OASIS article.

IBM currently recommends not using OASIS ODF 1.1 and to instead use OASIS ODF 1.2 which is currently not complete and will not be complete and ISO certified before 2010/2011. OASIS on the other hand have started work on ODF 2.0 which will not be backwards compatible.

This is an odd one, demonstrably false. IBM Lotus Symphony supports ODF 1.1. We have no ODF 1.2 support at present. I wonder where they came up with this one? It is totally bizarre. Although we have started to gather requirements for "ODF-Next", the contents of that version, and to what degree it will be backwards compatible, have not even been discussed by the TC, let alone determined. So this is pure FUD, trying to make ODF sound risky to adopt, and then lying about IBM's support for it, and our position on ODF 1.2.

The list goes on, including claims that no one supports ODF 1.0 or ODF 1.1, etc., but you get the gist of it. The particulars are interesting, of course, but more so the reckless disregard for the truth, and the triangle-trade relationship between notable bloggers, Wikipedia, and Microsoft's whisper campaign.

Another current example is part of Microsoft's attempt to duck and cover from criticism over their interoperability-busting ODF support in Office 2007 SP2. I've heard variations on the following from three different people in three different countries, including from government officials. So it is getting around. It goes something like this:

We (Microsoft) wanted to be more interoperable with ODF. In fact we submitted 15 proposals to the ODF TC to improve interoperability, but IBM and Sun voted them down.

Nice story, but not true. Certainly Microsoft submitted 15 proposals. But they were never voted on by the TC, because Microsoft chose not to advance them for a vote. They opted not to have these proposals considered for ODF 1.2. It was their choice alone and their decision alone not to put these items up for a vote. I would have been fine with whatever decision Microsoft wanted to make in this situation. I'm not criticizing their decision. I'm just saying we need to be clear that the outcome was entirely due to their decision, and not to blame IBM or Sun for Microsoft's choice in this matter.

I think I can trace this FUD back to a May 13th blog post from Doug Mahugh where he wrote:

We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration. The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.

This certainly is an interesting statement. There is nothing I can point to that is false here. Everything here is 100% accurate. However, it seems to be reckless in how it neglects the most relevant facts, namely that the proposals did not make it into ODF 1.2 at Microsoft's sole election. It is as if Lee Harvey Oswald had written a note: "Went to Dallas and saw a parade today. Tried to see a movie, but had to leave early. Heard later on the radio that the President was shot". This would have been 100% accurate as well, but not the "whole truth". In any case, the rundown of the facts on this question is on the TC's mailing list.

So what is one to do? You obviously can't trust Wikipedia whatsoever in this area. This is unfortunate, since I am a big fan of Wikipedia. I want it to succeed. But since the day when Microsoft decided they needed to pay people to "improve" the ODF and OOXML articles, these articles have been a cesspool of FUD, spin and outright lies, seemingly manufactured for Microsoft's re-use in their whisper campaign. My advice would be to seek out official information on the standards, from the relevant organizations, like OASIS, the chairs of the relevant committees, etc. Ask the questions in public places and seek a public, on-the-record response. More people are willing to lie than to face the consequences of being caught lying. That is the ultimate weakness of lies. They cannot stand the light of public exposure. Sunlight is the best antiseptic.


Sunday, May 17, 2009

The Battle for ODF Interoperability

Last year, when I was socializing the idea of creating the OASIS ODF Interoperability and Conformance TC, I gave a presentation I called "ODF Interoperability: The Price of Success". The observation was that standards that fail never need to deal with interoperability. The creation of test suites, convening of multi-vendor interoperability workshops and plugfests is a sign of a successful standard, one which is implemented by many vendors, one which is adopted by many users, one which has vendor-neutral venues for testing implementations and iteratively refining the standard itself.

Failed standards don't need to work on interoperability because failed standards are not implemented. Look around you. Where are the OOXML test suites? Where are the OOXML plugfests? Indeed, where are the OOXML implementations and adoptions? Microsoft Office has not implemented ISO/IEC 29500 "Office Open XML", and neither has anyone else. In one of the great ironies, Microsoft's escapades in ISO have left them clutching a handful of dust, while they scramble now to implement ODF correctly. This is reminiscent of their expensive and failed gamble on HD DVD on the XBox, followed eventually by a quick adoption of Blu-ray once it was clear which direction the market was going. That's the way standards wars typically end in markets with strong network effects. They tend to end very quickly, with a single standard winning. Of course, the user wins in that situation as well. This isn't Highlander. This is economic reality. This is how the world works.

Although this may appear messy to an outside observer, our current conversation on ODF interoperability is a good thing, and further proof, to use the words of Microsoft's National Technology Director, Stuart McKee, that "ODF has clearly won".

Fixing interoperability defects is the price of success, and we're paying that price now. The rewards will be well worth the cost.

We've come very far in only a few years. First we had to fight for even the idea and acceptance of open standards, in a world dominated by a RAND view of exclusionary standards created in smoke-filled rooms, where vendors bargained about how many patents they could load up a standard with. We won that battle. Then we had to fight for ODF, a particular open standard, against a monopolist clinging to its vendor lock-in and control over the world's documents. We won that battle. But our work doesn't end here. We need to continue the fight, to ensure that users of document editors, you and I, get the full interoperability benefits of ODF. Other standards, like HTML, CSS, ECMAScript, etc., all went through this phase. Now it is our turn.

With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.

Of course, the standard itself, the ink on the pages, does not accomplish this by itself. A standard is not a holy relic. I cannot take the ODF standard and touch it to your forehead, say "Be thou now interoperable!" and have it happen. If a vendor wants to achieve interoperability, they need to read (and interpret) the standard with an eye to interoperability. They need to engage in testing with other implementations. And they need to talk to their users about their interoperability expectations. This is not just engineering. Interoperability is a way of doing business. If you are trying to achieve interoperability by locking yourself in a room with a standard, then you'll have as much luck as trying to procreate while locked in a room with a book on human reproduction. Interoperability, like sex, is a social activity. If you're doing it alone then you're doing it wrong.

Standards are written documents -- text -- and as such they require interpretation. There are many schools of textual interpretation: legal, literary, historic, linguistic, etc. The most relevant one, from the perspective of a standard, is what is called "purposive" or "commercial" interpretation, commonly applied by judges to contracts. When interpreting a document using a purposive view, you look at the purpose, or intent, of a document in its full context, and interpret the text harmoniously with that intent. Since the purpose of a standard is to foster interoperability, any interpretation of the text of a standard which is used to argue in favor of, or in defense of, a non-interoperable implementation, has missed the mark. Not all interpretations are equal. Interpretations which are incongruous with the intent of standardization can easily be rejected.

Standards can not force a vendor to be interoperable. If a vendor wishes deliberately to withhold interoperability from the market, then they will always be able to do so, and, in most cases, devise an excuse using the text of the standard as a scapegoat.

Let's work through a quick example, to show how this can happen.

OpenFormula is the part of ODF 1.2 that defines spreadsheet formulas. The current draft defines the addition operator as:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds numbers together.

I think most vendors would manage to make an interoperable implementation of this. But if you wanted to be incompatible, there are certainly ways to do so. For example, given the expression "1+1" I could return "42" and still claim to be interoperable. Why? Because the text says "adds numbers together", but doesn't explicitly say which numbers to add together. If you decided to add 1 and 41 together, you could claim to be conformant. OK, so let's correct the text so it now reads:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds Left to Right.

So, this is bullet-proof now, right? Not really. If I want to, I can say that 1+1 =10, if I want to claim that my implementation works in base 2. We can fix that in the standard, giving us:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right, both in base 10 representations
Returns: Number, in base 10
Constraints: None
Semantics: Adds Left to Right.

Better, perhaps. But if I want I can still break compatibility. For example, I could say 1+1=0, and claim that my implementation rounds off to the nearest multiple of 5. Or I could say that 1+1 = 1, claiming that the '+' sign was taken as representing the logical disjunction operator rather than arithmetic addition. Or I could do addition modulo 7, and say that the text did not explicitly forbid that. Or I could return the correct answer some times, but not other times, claiming that the standard did not say "always". Or I could just insert a sleep(5000) statement in my code, and pause 5 seconds every time an addition operation is performed, making a useless, but conformant, implementation. And so on, and so on.
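To make the point concrete, here is a small Python sketch of those hostile readings, run against a tiny shared test suite of the kind a plugfest might agree on. Every function name and test case here is hypothetical and purely illustrative; none of it comes from OpenFormula or from any real implementation.

```python
# Each function is a hypothetical "implementation" of the "+" operator.
# All can be argued to "add numbers"; only the first one interoperates.

def add_plain(left: int, right: int) -> str:
    """Ordinary base-10 addition."""
    return str(left + right)


def add_base2(left: int, right: int) -> str:
    """Adds correctly, but reports the result in base 2."""
    return format(left + right, "b")


def add_round5(left: int, right: int) -> str:
    """Adds, then rounds off to the nearest multiple of 5."""
    return str(5 * round((left + right) / 5))


def add_logical_or(left: int, right: int) -> str:
    """Reads '+' as logical disjunction instead of arithmetic addition."""
    return str(int(bool(left) or bool(right)))


def add_mod7(left: int, right: int) -> str:
    """Adds modulo 7."""
    return str((left + right) % 7)


IMPLEMENTATIONS = {
    "plain": add_plain,
    "base 2": add_base2,
    "round to 5": add_round5,
    "logical or": add_logical_or,
    "modulo 7": add_mod7,
}

# Shared test cases: (left, right, expected result as displayed to the user).
TEST_CASES = [(1, 1, "2"), (5, 4, "9")]


def failures(impl):
    """Return the test cases on which impl disagrees with the expected result."""
    return [(l, r, impl(l, r)) for l, r, expected in TEST_CASES
            if impl(l, r) != expected]


if __name__ == "__main__":
    for name, impl in IMPLEMENTATIONS.items():
        bad = failures(impl)
        print(name, "passes" if not bad else f"fails: {bad}")
```

Note that the modulo 7 variant gets "1+1" right and is only caught by the second test case. Words in a specification can always be reread; a shared set of expected results is much harder to argue with.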

The old adage holds, "It is impossible to make anything foolproof because fools are so ingenious." A standard cannot compel interoperability from those who want to resist it. A standard is merely one tool, which when combined with others, like test suites and plugfests, facilitates groups of cooperating parties to achieve interoperability.

Now is the time to achieve interoperability among ODF implementations. We're beyond kind words and empty promises. When Microsoft first announced, last May, that it would add ODF support to Office 2007 SP2, they did so with many fine words:
So the words are there, certainly. But what was delivered fell far, far short of what they promised. Excel 2007 SP2 strips out spreadsheet formulas when it reads ODF spreadsheets from every other vendor's spreadsheets, and even from spreadsheets created by Microsoft's own ODF Add-in for Excel. No other vendor does this. Spreadsheet formulas are the very essence of a spreadsheet. To fail to achieve this level of interoperability calls into question the value and relevance of what was touted as an impressive array of interoperability initiatives. What value is an Interoperability Executive Council, an Interop Vendor Alliance, a Document Interoperability Initiative, etc., if they were not able to motivate the most simple act: taking spreadsheet formula translation code that Microsoft already has (from the ODF Add-in for Office) and using it in SP2?

The pretty words have been shown to be hollow words. Microsoft has not enabled choice. Their implementation is not robust. They have, in effect, taken your ODF document, written by you by your choice in an interoperable format, with demonstrated interoperability among several implementations, and corrupted it, without your knowledge or consent.

There is no shortage of excuses from Redmond. If customers wanted excuses more than interoperability they would be quite pleased by Microsoft's prolix effusions on this topic. The volume of text used to excuse their interoperability failure exceeds, by an order of magnitude, the amount of code that would be required to fix the problem. The latest excuse is the paternalistic concern expressed by Doug Mahugh, saying that they are corrupting spreadsheets in order to protect the user. Using a contrived example, of a customer who tries to add cells containing text to those containing numbers, Doug observes that OpenOffice and Excel give different answers to the formula = 1+ "2". Because all implementations do not give the same answer, Microsoft strips out formulas. Better to be the broken clock that reads the correct time twice a day, than to be unpredictable, or as Doug puts it:

If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing. But the behavior is predictable: nothing recalculates, and no erroneous results are created.

But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?

That will most definitely not be a predictable experience. And in actual fact, the unpredictable consequences of that sort of variation in spreadsheet behavior can be very consequential for some users. Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.

This bears a close resemblance to what is sometimes called "Ben Tre Logic", after the Vietnamese town whose demise was excused by a US General with the argument, "It became necessary to destroy the village in order to save it."

Doug's argument may sound plausible at first glance. There is that scary "unpredictable consequences". We can't have any of that, can we? Civilization would fall, right? But what if I told you that the same error with the same spreadsheet formula occurs when you exchange spreadsheets in OOXML format between Excel and OpenOffice? Ditto for exchanging them in the binary XLS format. In reality, this difference in behavior has nothing to do with the format, ODF or OOXML or XLS. It is a property of the application. So, why is Microsoft not stripping out formulas when reading OOXML spreadsheet files? After all, they have exactly the same bug that Doug uses as the centerpiece of his argument for why formulas are stripped from ODF documents. Why is Microsoft not concerned with "unpredictable consequences" when using OOXML? Why do users seem not to require "accurate, predictable results" when using OOXML? Or to be blunt, why is Microsoft discriminating against their own paying customers who have chosen to use ODF rather than OOXML? How is this reconciled with Microsoft's claim that they are delivering "choice, interoperability and innovative solutions to the marketplace"?
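A small Python sketch shows why the difference belongs to the application and not to the file format. The two coercion policies below are hypothetical and are not attributed to any particular product. The point is only that once the formula has been read, from ODF, OOXML or XLS alike, the answer depends on which policy the evaluating application applies.

```python
def numeric_text_as_number(value):
    """Policy A (hypothetical): text that looks like a number is converted."""
    return float(value) if isinstance(value, str) else value


def text_as_zero(value):
    """Policy B (hypothetical): any text operand counts as zero."""
    return 0 if isinstance(value, str) else value


def evaluate_sum(left, right, policy):
    """Evaluate left + right after applying the application's coercion policy."""
    return policy(left) + policy(right)


if __name__ == "__main__":
    # The operands of = 1 + "2" are the same whatever file format carried them.
    operands = (1, "2")
    print(evaluate_sum(*operands, numeric_text_as_number))
    print(evaluate_sum(*operands, text_as_zero))
```

Nothing in the file changes between the two calls; only the policy argument does.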


Thursday, May 07, 2009

A follow-up on Excel 2007 SP2's ODF support

Wow. My previous post seems to have attracted some attention. When I woke up on Monday morning, made my coffee and logged in to my email, I found out that my geeky little analysis of Office 2007 SP2's ODF support had sparked some interest. I did not intend it to be more than an update for the handful of the "usual suspects" who regularly follow ODF issues via various blogs, many of which you see listed to your right. If I had any foreknowledge or expectation that this post would end up being on SlashDot, GrokLaw, ZDnet, IDG, Reuters, CNet, etc., I would have done a better job spell checking, and maybe toned down the rhetoric a little (just a little).

But this widespread interest in the topic tells me one thing: ODF is important. People care about it. People want it to succeed, and when this success is threatened, whether for deliberate or accidental reasons, they are upset. Although Office 2007 SP2 also added PDF and XPS support, you don't see many stories on that at all.

I've been trying to respond to the many comments by anonymous FUDsters and Fanboys on various web sites where my post is being discussed. However, it is getting rather laborious swatting all the gnats. They obviously breed in stagnant waters, and there is an awful lot of that on the web. Since all links lead back here anyways, it will be much simpler to do a recap here and address some of the more widespread errors.

The talking points from Redmond seem to be consistent, along the lines of:
We did a 100% perfect and conforming implementation of ODF 1.1 to the letter of the standard. If it is not interoperable, then it is the fault of the standard or the other applications or some guy we saw sneaking around back on the night of the fire. In any case, it is not our fault. We just design, write, test and sell software to users, businesses, governments and educational institutions. We have no influence over whether our products are interoperable or not. What effect SP2 has on users or the market -- that's not our concern. Come back in 50 years when you have a 100% perfect standard and maybe we'll talk.

In other words, all of those Interoperability Directors and Interoperability Architects at Microsoft seem to have (hopefully temporarily) switched into Minimal Conformance Directors and Minimal Conformance Architects, and are gazing at their navels. I hope they did not suffer a reduction in salary commensurate with the reduction in their claimed responsibilities.

In any case, their argument might be challenged on several grounds. First up is the question of whether the ODF documents written by Excel 2007 SP2 indeed conform to the ODF 1.1 standard. This is not a hard question to answer, but please excuse this short technical diversion.

Let's see what the ODF 1.1 standard says in section 8.1.3 (Table Cell):
Addresses of cells that contain numbers. The addresses can be relative or absolute, see section 8.3.1. Addresses in formulas start with a “[“ and end with a “]”. See sections 8.3.1 and 8.3.1 for information about how to address a cell or cell range.

And the referenced section 8.3.1 further says:

To reference table cells so called cell addresses are used. The structure of a cell address is as follows:

  1. The name of the table.

  2. A dot (.)

  3. An alphabetic value representing the column. The letter A represents column 1, B represents column 2, and so on. AA represents column 27, AB represents column 28, and so on.

  4. A numeric value representing the row. The number 1 represents the first row, the number 2 represents the second row, and so on.

  5. This means that A1 represents the cell in column 1 and row 1. B1 represents the cell in column 2 and row 1. A2 represents the cell in column 1 and row 2.

    For example, in a table with the name SampleTable the cell in column 34 and row 16 is referenced by the cell address SampleTable.AH16. In some cases it is not necessary to provide the name of the table. However, the dot must be present. When the table name is not required, the address in the previous example is .AH16
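The addressing rules quoted above are simple enough to check mechanically. The following Python sketch is my own illustration, not code from any ODF implementation: the function names are hypothetical, and it deliberately ignores absolute-address markers and table names that contain dots or need quoting.

```python
import re

# table name (may be empty), a mandatory dot, column letters, row number
_ADDRESS = re.compile(r"^(?P<table>[^.\[\]]*)\.(?P<col>[A-Z]+)(?P<row>[1-9][0-9]*)$")


def column_number(letters: str) -> int:
    """A -> 1, B -> 2, ..., Z -> 26, AA -> 27, AB -> 28, ..."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def parse_cell_address(address: str):
    """Return (table or None, column, row); raise ValueError if malformed."""
    match = _ADDRESS.match(address)
    if match is None:
        raise ValueError(f"not a cell address of the quoted form: {address!r}")
    return (match["table"] or None, column_number(match["col"]), int(match["row"]))


def addresses_in_formula(formula: str):
    """Extract the bracketed addresses, which start with '[' and end with ']'."""
    return re.findall(r"\[([^\]]+)\]", formula)


if __name__ == "__main__":
    print(parse_cell_address("SampleTable.AH16"))
    print(parse_cell_address(".AH16"))
    print(addresses_in_formula("=[.A1]+[.B2]"))
```

An address written without the dot, such as AH16, is rejected with a ValueError, which is the requirement the quoted text spells out with "the dot must be present".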

So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:
I'll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.

Next is the question of the relationship between interoperability and conformance. So we are not building skyscrapers in the air, let's start with a working definition of interoperability, say that given by ISO/IEC 2382-01, "Information Technology Vocabulary, Fundamental Terms":

The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units

I think we probably have a better sense of what conformance is. Something conforms when it meets the requirements defined by a standard.

So let's explore the relationship between conformance to a standard and interoperability.

First, does interoperability require a standard? No. There have been interoperable systems without formal standards. For example, there is a degree of interoperability among spreadsheet vendors on the basis of the legacy Excel binary file format (XLS), even though the binary format was never standardized and never defines spreadsheet formulas. Another example is the SAX XML parsing API. Widely implemented, but never standardized. We may call them informal or de facto standards.

Additionally, many standards start out as informal technical agreements and specifications that achieve interoperability among a small group of users, who then move it forward to standardization so that a broader audience can benefit. But the interoperability came first and the formal standard came second. See the history of the Atom syndication format for a good example.

Second, is interoperability possible in the presence of non-conformance? Yes. For example, it is well known that the vast majority of web pages (93% by one estimate) on the web today do not conform to the HTML standard. But there is a not unsubstantial degree of interoperability on the web today in spite of this lack of conformance. Generally, interoperability does not require perfection. It requires good faith and hard work. If perfection were required, nothing would work in this world, would it?

Third, if a standard does not define something (like spreadsheet formulas) then I am allowed to do whatever I want, right? This is true. But further, even if ODF 1.1 did define spreadsheet formulas you would still be allowed to do whatever you want. Remember, these are voluntary standards. We can't force you to do anything, whether we define it or not.

So what then is the precise relationship between conformance and interoperability? I'd state it as:
In other words, the relationship is due to the efficiency of this configuration to those who wish to interoperate. Conformance is neither necessary nor sufficient to achieve interoperability in general, but interoperability is most efficiently achieved when conformance guarantees interoperability. When I talk about "standards-based interoperability" I'm talking about the situation when you are in the neighborhood of that optimal point.

The inefficiency of other orientations is seen with HTML and Web browsers. Because of the historically low level of HTML conformance by authoring tools and users who hand-edit HTML, browsers today are much more complex than they would otherwise need to be. They need to handle all sorts of malformed HTML documents. This complexity extends to any tool that needs to process HTML. Sure, we have a pretty good grip on this now, with tools like HTML Tidy and other robust parsers, but this has come at a cost. Complexity eats up resources: not only the time of coders and testers, but also runtime resources such as memory and processing cycles. More complex code is harder to maintain and secure and tends to have more bugs. Greater conformance would have led to a more efficient relationship between conformance and interoperability.
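To make the cost of tolerance a little more concrete, here is a minimal Python sketch that feeds the same malformed fragment to a strict XML parser and to a lenient HTML tokenizer from the standard library. The sample markup and the helper names are my own illustrations, and the lenient side only collects tags; a real browser does far more recovery work than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative input: the <b> element is never closed.
MALFORMED = "<p>Unclosed <b>bold text</p>"


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient tokenizer: records start tags and never checks nesting."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def lenient_tags(markup):
    """Return the start tags seen, even when the markup is malformed."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags
```

The strict parser simply refuses the document, which is cheap to implement. The lenient one accepts it, and everything downstream then has to decide what the author probably meant.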

Similarly, the many years of non-conformance in browsers, most notably Internet Explorer, to the CSS2 standard have resulted in an inefficiency there. From the perspective of web designers, tool authors and competing browser vendors, the lack of conformance to the standards has increased the cost needed to achieve interoperability, a cost transferred from a dominant vendor who chose not to conform to the standards, to other vendors who did conform.

The efficiency of conformance to open standards in particular is the clarity and freedom it provides around access to the standard and the contingent IP rights needed to implement the standard.

So back to ODF 1.1. What is the relationship between conformance and interoperability there? Clearly, it is not yet at that optimal point (which few standards ever achieve) where interoperability is most-efficiently achieved. We're working on it. ODF 1.2 will be better in that regard than ODF 1.1, and the next version will improve on that, and so on.

Does this mean that you cannot create interoperable solutions with ODF? No, it just means that, like most standards in IT today, you need to do some interoperability testing with other vendors' products to make sure your product interoperates, and make conformant adjustments to your product in order to achieve real-world interoperability. Most vendors who don't have a monopoly would do this naturally and in fact have done this, as my chart indicated. Complaining about this is like complaining about gravity or friction or entropy. Sure, it sucks. Deal with it. Although it may not pay as much as being a professional mourner, work as a programmer is more regular. And giving value to customers will always bring more satisfaction than standing there weeping about how code is hard.

In any case, this comes down to why you implement a standard. What are your goals? If your goal is to be interoperable, then you perform interoperability testing and make those adjustments to your product necessary to make it both conformant and interoperable. But if your goal is to simply fulfill a checkbox requirement without actually providing any tangible customer benefit, then you will do as little as needed. However, if your goal is to destroy a standard, then you will create a non-conformant, non-interoperable implementation, automatically download it to millions of users and sow confusion in the marketplace by flooding it with millions of incompatible documents. It all depends on your goals. Voluntary standards do not force, or prevent, one approach or another.

To wrap this up, I stand on the table of interoperability results in the previous post. SP2 has reduced the level of interoperability among ODF spreadsheets, by failing to produce conforming ODF documents, and failing to take note of the spreadsheet formula conventions that had been adopted by all of the other vendors and which are working their way through OASIS as a standard.

If we note the arguments used by Microsoft in the recent past, they have argued that OOXML must be exactly what it is -- flaws and all -- in order to be compatible with legacy binary Office documents. Then they argued that OOXML cannot be changed in ISO, because that would create incompatibility with the "new legacy" documents in Office 2007 XML format. But when it comes to ODF, they have disregarded all legacy ODF documents created by all other ODF vendors and taken an aloof stance that looks with disdain on interoperability with other vendors' documents, or even documents produced by their own ODF Add-in. The sacrosanctness of legacy compatibility appears to be reserved, for strategic reasons, for some formats but not others. We'll redefine the Gregorian calendar in ISO to be interoperable with one format if we need to, but we won't deign, won't stoop, won't dirty ourselves to use the code we already have from the ODF Add-in for Microsoft Office, to make SP2 formulas interoperable with the other vendors' products, to benefit our own users who are asking for ODF support in Office. As I said before, this ain't right.


Tuesday, May 05, 2009

OpenDocument Format: The Standard for Office Documents

A belated note that an article of mine on ODF was recently published in IEEE Internet Computing, called "OpenDocument Format: The Standard for Office Documents". I think it is a good introduction to ODF, what it is, where it came from and why it is important. They allow authors to post a copy on their websites. So feel free to link to it, but any redistribution will need to be negotiated with the publisher.

At the same time I've taken the opportunity to put together a new web page of some of my other publications, workshop and conference presentations. I have a few others that I want to add, once I find them. But this is a start.


Sunday, May 03, 2009

Update on ODF Spreadsheet Interoperability

[2009/05/07 -- I've posted a follow up article on this topic which you may want to read]

A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then current ODF implementations, including:

  1. OpenOffice.org 2.4
  2. Google Spreadsheets
  3. KOffice KSpread 1.6.3
  4. IBM Lotus Symphony 1.1
  5. Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
  6. Microsoft Office 2003 with Sun's ODF Plugin
I created a test document in each of those editors and then loaded each test document in each of the other editors. I showed what worked, what didn't, and made some suggestions on how interoperability could be improved. I found only two notable failures, when the Microsoft/CleverAge Add-in for Excel loaded KSpread and Symphony documents. The other scenarios I tested were OK:



Columns: Created In. Rows: Read In.

Read In       CleverAge  Google  KSpread  Symphony  OpenOffice  Sun Plugin
CleverAge     OK         OK      Fail     Fail      OK          OK
Google        OK         OK      OK       OK        OK          OK
KSpread       OK         OK      OK       OK        OK          OK
Symphony      OK         OK      OK       OK        OK          OK
OpenOffice    OK         OK      OK       OK        OK          OK
Sun Plugin    OK         OK      OK       OK        OK          OK


A lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:
I haven't been able to get the release candidate of KOffice installed, so I'm still including KSpread 1.6.3 in my tests, but for the rest I have created new test files in each editing environment, saved them to ODF format and then loaded the resulting documents into each of the other editors. From these test documents I was able to perform 42 different test combinations.

I'll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.

The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:

The resulting spreadsheet looks something like this:




Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.

Here is what I found when I tested the various scenarios:



Columns: Created In. Rows: Read In.

Read In             Google  KSpread  Symphony  OpenOffice  Sun Plugin  CleverAge  MS Office 2007 SP2
Google              OK      OK       OK        OK          Fail        OK         Fail
KSpread             OK      OK       OK        Fail        Fail        OK         Fail
Symphony            OK      OK       OK        OK          OK          Fail       Fail
OpenOffice          OK      OK       OK        OK          OK          OK         Fail
Sun Plugin          OK      OK       OK        OK          OK          OK         Fail
CleverAge Plugin    OK      OK       OK        OK          Fail        OK         OK
MS Office 2007 SP2  Fail    Fail     Fail      Fail        Fail        Fail       OK
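For readers who want to slice these results themselves, here is a small Python sketch that encodes the table above and tallies the cross-editor failures. The dictionary layout and function names are just my own illustration; the OK/Fail values are copied from the table, with the "CleverAge Plugin" row keyed as "CleverAge" to match its column.

```python
EDITORS = [
    "Google", "KSpread", "Symphony", "OpenOffice",
    "Sun Plugin", "CleverAge", "MS Office 2007 SP2",
]

# Keys are the "Read In" editor; values list results per "Created In"
# editor, in EDITORS order.
RESULTS = {
    "Google":             "OK OK OK OK Fail OK Fail",
    "KSpread":            "OK OK OK Fail Fail OK Fail",
    "Symphony":           "OK OK OK OK OK Fail Fail",
    "OpenOffice":         "OK OK OK OK OK OK Fail",
    "Sun Plugin":         "OK OK OK OK OK OK Fail",
    "CleverAge":          "OK OK OK OK Fail OK OK",
    "MS Office 2007 SP2": "Fail Fail Fail Fail Fail Fail OK",
}


def cross_results():
    """Yield (reader, creator, passed) for every pair of distinct editors."""
    for reader in EDITORS:
        cells = RESULTS[reader].split()
        for creator, cell in zip(EDITORS, cells):
            if reader != creator:
                yield reader, creator, cell == "OK"


def failures_by(role):
    """Count failed combinations per editor; role is 'reader' or 'creator'."""
    counts = {name: 0 for name in EDITORS}
    for reader, creator, passed in cross_results():
        if not passed:
            counts[reader if role == "reader" else creator] += 1
    return counts
```

Seven editors give 7 x 6 = 42 cross-editor combinations, and 16 of them fail in this table. Calling `failures_by("reader")` and `failures_by("creator")` shows which side of the exchange each failure falls on.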


So what is happening here?

CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows an awful lot of red. Let's see if we can figure this out.

First, some combinations that worked previously, when I tested two months ago, are now not working:

The new entry to the mix is Microsoft Office 2007 SP2, which has added integrated ODF support. Unfortunately this support did not fare well in my tests. The problem appears to be how it treats spreadsheet formulas in ODF documents. When reading an ODF document, Excel SP2 silently strips out formulas. What is left is the last value that cell had, when previously saved.

This can cause subtle and not so subtle errors and data loss. For example, in the test document I presented above, the current date is encoded using the TODAY() spreadsheet function. If the formulas are stripped, then this cell no longer updates, and will return the wrong value. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells from column E down a row, this will cause incorrect calculations, since there is no longer a formula to copy, so she would just be copying the prior balance. In general, SP2 converts an ODF spreadsheet into a mere "table of numbers" and any calculation logic is lost.
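To see what "stripping formulas" means at the file level, here is a rough Python sketch. It assumes a hand-written fragment in the style of an ODF spreadsheet row, where each cell carries both a formula attribute and the last calculated value; the sample cells, the cell references and the function name are my own illustrations, not taken from the test files.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Illustrative row: each cell stores a formula plus its cached value.
SAMPLE_ROW = f"""
<table:table-row xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}">
  <table:table-cell table:formula="oooc:=TODAY()"
                    office:value-type="date" office:date-value="2009-05-03"/>
  <table:table-cell table:formula="oooc:=[.E2]-[.D3]"
                    office:value-type="float" office:value="4500"/>
</table:table-row>
"""


def read_cells(row_xml, keep_formulas=True):
    """Return a (formula, cached_value) pair for each cell in the row.

    With keep_formulas=False the formula is dropped and only the cached
    value survives, which mimics the behavior described above.
    """
    row = ET.fromstring(row_xml)
    cells = []
    for cell in row.iter(f"{{{TABLE_NS}}}table-cell"):
        formula = cell.get(f"{{{TABLE_NS}}}formula") if keep_formulas else None
        cached = (cell.get(f"{{{OFFICE_NS}}}value")
                  or cell.get(f"{{{OFFICE_NS}}}date-value"))
        cells.append((formula, cached))
    return cells
```

With `keep_formulas=False` the date cell is frozen at 2009-05-03 forever, and the balance cell is just the number 4500 with no memory of how it was computed.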

In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas but places them into an Excel namespace. This namespace is not what OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn't even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. And the ones that do not check the namespace try and fail to load a formula since it is syntactically different than what they expected. The applications essentially display a corrupted document that shows neither the formula nor the value correctly. For example, a SP2 document, loaded in MS Office using the Sun ODF Plugin looks like this:




Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn't import the document at all.
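One way a loader could handle an unfamiliar formula namespace without corrupting the cell is sketched below in Python. This is only one possible policy, not what any of the tested applications actually do. It assumes the formula attribute looks like `prefix:=expression`, treats "oooc" (the legacy convention) and "of" (which I am assuming is the prefix used for the ODF 1.2 draft formulas) as known, and uses a made-up prefix "xyz" to stand in for anything else. A real loader would resolve the prefix to its namespace URI rather than compare prefix strings.

```python
# Prefixes this sketch treats as understood (an assumption, see above).
KNOWN_PREFIXES = {"oooc": "legacy convention", "of": "ODF 1.2 draft"}


def split_formula(attr):
    """Split a formula attribute into (prefix, expression).

    Returns (None, attr) when there is no 'prefix:=' at the front.
    """
    prefix, sep, rest = attr.partition(":=")
    if sep and prefix.isalnum():
        return prefix, "=" + rest
    return None, attr


def load_formula(attr, cached_value):
    """Keep the formula when the prefix is known; otherwise fall back to
    the cached value and tell the user why."""
    prefix, expr = split_formula(attr)
    if prefix is None or prefix in KNOWN_PREFIXES:
        return {"formula": expr, "value": cached_value, "warning": None}
    return {
        "formula": None,
        "value": cached_value,
        "warning": f"unrecognized formula prefix '{prefix}'; "
                   "showing last saved value",
    }
```

For example, `load_formula("oooc:=[.A1]+[.A2]", "3")` keeps the formula, while `load_formula("xyz:=A1+A2", "3")` returns the saved value together with a warning instead of displaying garbage.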

I must admit that I'm disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don't know or care. But I cannot fail to notice that the same application -- Microsoft Excel 2007 -- will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the "improved" integrated code in Office 2007 SP2. This ain't right.

I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:
  1. Sun should write out formulas in ODF 1.1 format, using the legacy "oooc" namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO's ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I'm not sure if the Sun Plugin has a similar configuration option, but I hope it does.
  2. In addition to writing out compatible formulas as per the above comments on the Sun Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor's spreadsheet documents. Give the user a warning if you need to, but let them have the choice.
Finally, let me try to anticipate and debunk some of the counter-arguments which might be raised to argue against interoperability.

First, we might hear that ODF 1.1 does not define spreadsheet formulas and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This certainly is true if your sole goal is to claim conformance. If your business model requires only conformance and not actually achieving interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive options. An application can be conformant to a standard and also be interoperable, if you use the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for not also being interoperable, or at least not a valid excuse. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and their workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. The difference between minimal conformance and interoperability is well illustrated in these tests.

Remember, it is not particularly difficult or clever to take an adverse reading of a standard to make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser implementation that makes all default text be 4-point Zapf Dingbats, white text on a white background. It would conform with the standard, but it would be perfectly unusable by anyone. If you try hard enough you can create 100% conformant, but non-interoperable, implementations of almost any standard. Standards are voluntary, written to help coordinate multiple parties in their desires for interoperability. Standards are not written to compel interoperability by parties who do not wish to be interoperable.

(A side point is that SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)

We might also hear concerns that supporting other vendors' ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on that used by Excel, which itself was not fully documented when OpenOffice implemented it. So an argument, by Microsoft, not to support that language because it is not documented is rather hypocritical. Excel supports 1-2-3 files and formulas and legacy Excel versions (back to Excel 4.0) neither of which have standardized formula languages. Why are these supported? Also, the fact that the Microsoft/CleverAge add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. The inexplicable thing is why that code never made it into Excel 2007 SP2.

We'll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users and that it is dangerous to try to save Excel formulas into interoperable ODF formulas because there are no guarantees that OpenOffice or any other ODF application will interpret them the same as Excel does. So one might try to claim that Microsoft is protecting their customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully-licensed Microsoft Office users have already been creating legacy documents in ODF format, using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, get corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here, the way in which users of the ODF Add-in are being sacrificed.

If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors' formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This'll teach 'em good.

Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2's poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.

As I mentioned before, this is a step backwards. But it is just one step on the journey. Let's look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact most of the code already exists for this. The only thing we need now is to actually go do it, and not get too far ahead of, or lag too far behind, the other implementations. This is more a question of timing and coordination than hard technical problems.

[2009/05/07 -- For more on this topic, see my "A follow-up on Excel 2007 SP2's ODF Support"]


Tuesday, March 24, 2009

Taking Control of Your Documents

How to free yourself from Microsoft Office dependency in three easy steps

The Objective

When you save a document in your word processor, your work is encoded in a particular file format. You often have a choice of formats that you can use, with names like DOC, DOCX, RTF, WPD or ODT. Your choice of format will influence whether others can easily read your document today, whether you yourself will be able to read your document ten years from now, and whether you will be able to migrate painlessly to another word processor or operating system if and when you choose to do so.

Although many users simply click “Save” and give no thought to which format is being used under the covers, this unthinking use of the word processor's default settings is a recipe for vendor lock-in. In fact, several vendors intentionally set their default formats to be ones which will only work well with their own software, fostering dependency on that vendor's software and lessening the user's ability to take advantage of other options in the market. The more documents you save and accumulate in a vendor's proprietary format, the harder it will be for you to consider any other choices.

The objective of this paper is to show you, the user, how to extricate yourself from this cycle of dependency and take control of your documents. Specifically, we show how you can, in three easy steps, free yourself from a Microsoft Office dependency. In the end you may, of course, choose to remain on Microsoft Office. You may decide to migrate to an alternative word processor. That, in the end, is your choice. But by following the three steps outlined below, your freedom of action will be preserved, and your choice of word processor will be based on your priorities and your needs, and not forced on you by your current application vendor.

Step 1: Take control of the default format

The older versions of Microsoft Office (Office 97-Office 2003) by default save documents in a family of binary formats with the extensions DOC (Word), XLS (Excel) and PPT (PowerPoint). Although these formats are proprietary Microsoft formats, over the past decade 3rd party applications have developed the capability to read and write these formats.

However, starting in Office 2007 Microsoft suddenly switched the default format to something called Office Open XML (OOXML). This format is not widely supported outside of Office 2007. So if you save a document in the OOXML format you make it harder for anyone else to read your document unless they are also using Microsoft Office 2007. In almost all cases, the same document, if saved in the legacy DOC format will be more interoperable. Staying with the default choice, OOXML, only restricts your choices and makes you more dependent on Microsoft Office. Of course, that is why Microsoft made OOXML the default format.

The first step to liberate yourself from Microsoft Office dependency is to change the default format in Microsoft Office 2007 away from OOXML and back to the early binary formats supported by Office 97-2003, which are widely supported by 3rd party applications. This is a neutral step that preserves the status quo. By making these changes you will still be able to read and edit any OOXML documents that are sent to you, but all new documents you create will be saved in the more widely supported DOC/XLS/PPT formats.

If you are using Microsoft Office 2003 or earlier, then you should skip this Step and move on to Step 2, since OOXML is not the default format in those earlier Office versions.

To change the defaults, you will need to load Word 2007, Excel 2007 and PowerPoint 2007 and follow these steps.

Word 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “Word Options” at the bottom of the dialog.
  3. Go to the “Save” section.
  4. For the “Save files in this format” setting, choose “Word 97-2003 Document (*.doc)”.
  5. Click OK.


Excel 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “Excel Options” at the bottom of the dialog.
  3. Go to the “Save” section.
  4. For the “Save files in this format” setting choose “Excel 97-2003 Workbook (*.xls)”.
  5. Click OK.


PowerPoint 2007

  1. Click the Office Button (the unlabeled logo button in the upper left of the program).
  2. Click “PowerPoint Options” at the bottom of the dialog.
  3. Go to the “Save” section.
  4. For the “Save files in this format” setting, choose “PowerPoint Presentation 97-2003”.
  5. Click OK.


Administrators should also note that these settings may be made directly in the Windows Registry, and automatically pushed out to a work group via a login script or group policy. The registry settings corresponding to the above changes are:

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Word\Options
Add String DefaultFormat=Doc

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Options
Add DWORD DefaultFormat=38 (Hexadecimal)


HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\PowerPoint\Options
Add DWORD DefaultFormat=0 (Hexadecimal)
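For administrators who would rather distribute these as a single file, here is a minimal Python sketch that renders the three settings above into the text of a .reg file. The function and variable names are my own, and the script only builds the text; saving it, importing it with the Registry Editor, or pushing it via a login script is left to you. Please test on a non-production machine first.

```python
BASE = r"HKEY_CURRENT_USER\Software\Microsoft\Office\12.0"

# (application, value type, value) as listed in the settings above.
SETTINGS = [
    ("Word", "string", "Doc"),
    ("Excel", "dword", 0x38),
    ("PowerPoint", "dword", 0x0),
]


def render_reg(settings=SETTINGS):
    """Return .reg file text that sets DefaultFormat for each application."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for app, kind, value in settings:
        lines.append(f"[{BASE}\\{app}\\Options]")
        if kind == "string":
            lines.append(f'"DefaultFormat"="{value}"')
        else:
            # DWORD values are written as eight hexadecimal digits.
            lines.append(f'"DefaultFormat"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)
```

Printing `render_reg()` shows one bracketed key per application followed by its `DefaultFormat` line, for example `"DefaultFormat"=dword:00000038` for Excel.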


Step 2: Enable OpenDocument Format Support

Now that you've made the first steps towards taking control of your documents by preventing the lock-in effects of the OOXML default, it is time to take further control. You'll now want to enable OpenDocument Format (ODF=ISO/IEC 26300) support in Microsoft Office, so you can save and exchange documents using the free and open International Standard while remaining in the familiar Microsoft Office interface.

ODF is an XML-based, open document format standard, designed to be platform- and application-neutral and support interoperable use across applications, eliminating vendor lock-in. ODF is supported by many applications, including office suites from Sun, IBM, Novell and Google, as well as open source projects like OpenOffice, KOffice and AbiWord. Additional applications supporting ODF are listed on Wikipedia.

Microsoft Office does not currently support ODF “out of the box”, but you can enable ODF support in Office by installing a “plugin”, sometimes called an “add-in”. A plugin will add additional options or menu items to the Microsoft Office UI, allowing you to open and save documents in ODF format. In some cases you can even set ODF as the default format for new documents.

There are three main choices for adding ODF support to Microsoft Office:

  1. Sun Microsystems has published an “ODF Plugin for Microsoft Office” which supports Office 2000, XP, 2003 and 2007 SP1.
  2. Microsoft has sponsored an open source project on SourceForge for an “ODF Add-in for Microsoft Office”, which supports Office 2007, and also Office 2003 and Office XP if the Microsoft Office Compatibility Pack is also installed.
  3. Microsoft has announced that Office 2007 Service Pack 2 (SP2) will enable ODF support in Office 2007, but this code is not yet available.

Step 2 is to evaluate and adopt a plugin to add ODF support to Microsoft Office. Start using ODF now, saving your documents in the open standard document format. This allows you to remain in Office, for now, while building your familiarity and comfort level with ODF.

Step 3: Exercise your Right to Choose a Native ODF Editor

The plug-in approach is a transitional approach. It allows you to continue working in Microsoft Office while you enable ODF support side-by-side. But at some point you will want to consider your options. Maybe you find that converting back and forth to ODF format in MS Office is slow. Maybe you are using Office 2003 currently, but want to avoid paying for an Office 2007 upgrade when mainstream support for Office 2003 comes to an end on April 14th, 2009. At some point you will want to move to an application that supports ODF natively. You are free at this point and have a wide variety of choices.

The important thing is that you have taken control of your documents. You are no longer dependent on Microsoft Office and its file format. You have broken free of the vendor lock-in. You are free to choose an alternative word processor when you want to and if you want to. Until then, be comfortable in knowing that you are keeping your options open while remaining in control of your documents.





Creative Commons License

Taking Control of Your Documents by Rob Weir is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

This paper is also available in ODF and PDF formats.


Monday, March 23, 2009

Introducing Planet ODF

I have an early Document Freedom Day present for you. Planet ODF is a feed aggregator based on Sam Ruby's Planet Venus, which itself is a refactoring of Planet 2.0.

Planet ODF aggregates several blogs, news sources, discussion forums and other online services related to ODF. I've tried to be semi-intelligent so you don't get random stories about the Oregon Department of Forestry or non-ODF blog posts by me. I'll tune the feeds over time, but the hope is to make it 100% ODF relevant content.

If you have a blog, discussion forum or any other ODF-related content with an Atom or RSS feed and want it included, then please let me know. It doesn't need to be 100% ODF. You can discuss your cats 90% of the time and ODF 10% of the time and I can set up a filter to bring in the relevant content.

Also, I've set up an OpenDocument Format group on the social bookmarking site Diigo. (I abandoned del.icio.us when the Microsoft/Yahoo takeover rumors started.) Even if you don't have a blog or a web site with a feed, you can use Diigo to bookmark any articles you think are relevant to ODF. If you send those links to the OpenDocument Format group, then they will automatically be included in the Planet ODF feed.

Enjoy, and pass on the good news.


Wednesday, March 04, 2009

From the Statute of Frauds to WYSIWYS: Document Format Implications

I'd like to explore the topic of electronic documents, digital signatures, and what properties are required of them to be considered accurate and reliable written records. Since this is as much a social question as it is a technical one, we'll start with some history.

"An Act for prevention of Frauds and Perjuryes" 29 Carol. II (1677), commonly called "The Statute of Frauds", begins:

For prevention of many fraudulent Practices which are commonly endeavoured to be upheld by Perjury and Subornation of Perjury Bee it enacted by the Kings most excellent Majestie by and with the advice and consent of the Lords Spirituall and Temporall and the Commons in this present Parlyament assembled and by the authoritie of the same That from and after the fower and twentyeth day of June which shall be in the yeare of our Lord one thousand six hundred seaventy and seaven All Leases Estates Interests of Freehold or Termes of yeares or any uncertaine Interest of in to or out of any Messuages Mannours Lands Tenements or Hereditaments made or created by Livery and Seisin onely or by Parole and not putt in Writeing and signed by the parties soe makeing or creating the same or their Agents thereunto lawfully authorized by Writeing, shall have the force and effect of Leases or Estates at Will onely and shall not either in Law or Equity be deemed or taken to have any other or greater force or effect, Any consideration for makeing any such Parole Leases or Estates or any former Law or Usage to the contrary notwithstanding.

Or, to loosely paraphrase in modern English: "We've noticed that verbal agreements are being abused. So in certain specific important agreements you better put it in writing and sign it, otherwise don't bother to bring any dispute to court."

A few things to note about the Statute and its context:
  1. As the preface notes, frauds were being perpetrated, involving oral contracts and perjury. Before this Statute, oral testimony, even without any evidence of a written agreement, could be used to deprive a person of real or personal property.
  2. The Statute is concerned with private agreements. Although it was already well-established practice by this time for official acts, writs, etc., to be recorded in written form and sealed, literacy, even among tradesmen, was not high, and private agreements were made only orally.
  3. The imposition of a stamp duty, or tax to seal official documents, followed this Statute a few years later, ostensibly to raise funds to fight a war against France. But like all forms of taxation, it has outlived its original intent and exists even to the present day, even though England apparently is now at peace with France.
This Statute spread to the American Colonies, where in modified form it lives on in various state laws, and in the Uniform Commercial Code (UCC) today, in §2-201:

A contract for the sale of goods for the price of $5,000 or more is not enforceable by way of action or defense unless there is some record sufficient to indicate that a contract for sale has been made between the parties and signed by the party against which enforcement is sought or by the party's authorized agent or broker.

I'd like to look a little at what it is about a written agreement that gives it its particular value. Why did they require it to be written? Why not just require witnesses to an oral agreement?

A few salient properties of a written agreement:

  1. A written agreement states the parties to the agreement, the terms of the agreement and is signed by the parties.
  2. Once signed, the agreement may not be altered but by mutual consent of the parties. In the judgement of Brett v. Ridgen, Plowd. Comm., 345, Lord Dyer wrote that "...men's deeds and wills, by which they settle their estates, are the laws which private men are allowed to make, and they are not to be altered, even by the King in his court of law or conscience. We must take it as we find it."
  3. The "mirror image" rule applies. Both parties must agree to the same terms. If party A makes an offer, and party B says they accept, but in fact adds or qualifies the terms of the offer, then this is properly treated as a counter-offer. The agreement is not made until both parties agree to the same terms.
  4. The underlying mechanics and notation of the agreement are flexible, unless otherwise specified. Whether scribbled with a crayon on a napkin, sent by telegram, teletype, fax or email, these may all be considered written agreements.
The affordances of paper and ink, which lend themselves particularly well to the above concerns, include:

  1. Paper/ink expresses symmetric information. What you see is what I see and is what will be seen in court if we end up there some day.
  2. There is no invisible ink, no hidden pages. The text of the agreement does not say something different under the fluorescent lights at the court house versus the sunlight at the construction site. Although these things in theory could be done, via special inks and papers, the use of these techniques in an agreement would be prima facie evidence of fraud.
  3. Certainly, if it is poorly written, the terms of the agreement could be ambiguous and subject to various interpretations. Paper/ink cannot make you or your lawyer smarter. It only makes the agreement an accurate and reliable record. If a particular word is smudged or a number is crudely written, I can see this flaw and you can see this flaw and either of us can require the flaw to be fixed before we sign the agreement. If there is text that is unclear in meaning, I can ask my lawyer to explain it. I am able to understand the document perfectly should I take care to do so.
  4. Paper/ink is accurate and reliable over the time scale of personal and commercial contracts.
  5. A person's signature or mark on an agreement, absent evidence of fraud or coercion, clearly indicates their assent to the terms of the agreement. We do not commonly write our signature unless we intend to express assent.

Jump ahead to the present day, with the increasing use of electronic documents and digital signatures. Digital signatures offer some of the same affordances we traditionally had with paper/ink. Provided that the chain of certificates and keys has not been compromised, that the underlying applications have not been compromised and that the act of signing requires an affirmative and unambiguous action by the signer, a digital signature is evidence of:

  1. What was signed
  2. Who signed it
  3. The intention to sign, i.e., give validity to the agreement
However, there is a weakness in electronic agreements, even when digitally signed. The weakness is in what is signed. When you sign an electronic document, you are signing the stream of bits and bytes that comprise that document in a particular document format. The average person lacks the ability to directly inspect or understand the underlying representation of an electronic document. They can only see what a particular software application running on a particular operating system running on a particular computer shows when loading the document. Will that signed document appear the same on a different computer, to a different person using a different software application or a different operating system? That is the critical question. Unfortunately, the affordances of paper/ink for symmetrical information, lack of hidden information, invariability over time and venue changes, etc., are not necessarily guaranteed with electronic documents.

The digital signature guys call out an additional requirement needed for digital signatures to give the same guarantees as paper/ink agreements. It goes by the acronym WYSIWYS, or "What You See is What You Sign".

So what is required for electronic documents to have the same affordances as paper/ink for use as accurate and reliable records? I suggest the following:
  1. The format used by the electronic document must be specified in an open standard.
  2. The format standard must define the characteristics of semantically equivalent documents and specify the format sufficiently so that implementations of the standard can display semantically equivalent renderings of the document. Semantic equivalence is not broken by minute differences in layout, so it should be possible to have semantically equivalent renderings on different devices, e.g., a laptop versus a smart phone versus a screen reader.
  3. The application used to view and sign the electronic document must conform to the standard, specifically those stated parts of the standard necessary to render a semantically equivalent document.
  4. The document must be strictly conformant to the standard, with no extensions. Just as you would not physically sign a paper document that contained interpolated text in a language that you do not understand, you should not sign an electronic document that contains unknown extensions. Otherwise semantic equivalence is not guaranteed between the two parties and a "mirror image" problem arises.
  5. Semantic equivalence must not rely on graphics. Although graphical content is permissible, such content must be redundant with respect to the text. Otherwise the "mirror image" problem is unresolvable between sighted and blind persons.
Further, I believe these criteria are of more general applicability. Although the Statute of Frauds may have been intended for marriage contracts and the like, the need to have accurate, reliable written records is a ubiquitous requirement for business and public administrations today. Wherever misunderstanding would be a liability, where it is particularly important for multiple parties to be "on the same page" with respect to the contents and meaning of a document, these considerations apply.

For editable formats like ODF, I think it points out the need to describe a formal content model that describes the semantic content of a document, aside from its formatting and layout. So text + lists + tables + headers + footers + footnotes + images + captions, etc. Visual appearance is nice to have as well, but it is less robust when rendered on different devices, different operating systems, and is less likely to be robust when rendered on OpenOffice 10.0 in 2015. But the equivalence of the semantic content of an unextended ODF document should provide the same ability to have an accurate and reliable record in an electronic document as we have had traditionally with paper and ink.


Tuesday, March 03, 2009

Low-Fat ODF

Jack Sprat could eat no fat.
His wife could eat no lean.
And so betwixt them both, you see,
They licked the platter clean!

Is dietary fat good? Or is it bad? Without getting into a discussion of saturated versus unsaturated fats, or the virtues of omega-3 oils, let me make a few basic, reasonable observations:
  1. Individuals differ in their preferences and requirements for fat intake. There is no single answer for all people at all times.
  2. Experts differ in their recommendations for fat intake.
  3. Standards exist for how to measure and report the fat contents of food products.
  4. Standards also exist for the specific conditions under which a vendor may call their food products "low fat" or "light" or "fat-free". For example, "low fat" products must have 3g or less fat per serving.
  5. The government requires vendors of retail packaged food to label the fat content in accordance with standards #3 and make only claims regarding fat content that conform with standards #4.
The above system generally works. Food vendors have the freedom to add as much fat as they want to their products. If they want to sell deep-fried bacon-wrapped cheese, then fine. No problem. It is a free country. But this is balanced by the consumer's ability to know the fat content of the products that they purchase. This gives control to the consumer, allowing informed choice.

But take away the standards, take away the reporting requirements, and the manufacturer has all of the control. Let's imagine a world where there were no such fat content standards. Medical research would still progress and the long-term dangers of high-fat diets would still be known. But the consumer's ability to control their fat intake would be vastly reduced. There would be no informed choice.

Imagine further that Company A, observing the medical research and consumer interest in healthy food, decides to offer a low-fat cheese. But if Company A sells their low-fat cheese, the label "low fat" itself would have no formal meaning. In this hypothetical, there are no standards. Nothing prevents Company B and Company C from also advertising their existing cheeses as "low fat". Without standards there is no differentiation. Since consumers have no effective way to test the fat content of cheese on their own, they are at the mercy of the non-verifiable claims of vendors and the advertising agencies. Because there are no acknowledged standards for fat content, the market for low-fat cheese is stunted. The consumer does not benefit and the innovative Company A does not benefit. No one wins.

This is a general concern for markets where the consumer cannot directly verify the quality of the goods, because they are packaged and inaccessible to inspection, or because the consumer lacks the technical ability to determine the quality themselves. From fat content to auto gas mileage efficiency, this leads to standards for measuring and reporting qualities of interest to consumers.

So back to reality. We do have fat content standards, for measurement and reporting. Suppose that Company A sells its low-fat cheese and it is very popular, because it is what the consumer wants. Company B is envious of the higher margins on low-fat products, but it would take too long for them to revamp their production line to make a cheese with 3g or less fat per serving. They can only get it down to 5g per serving. What can they do? Well, they can hire a lobbyist, go to Washington, DC, and spread some influence around. They could try to get the FDA to change their definition of "low-fat" so it includes their higher-fat products as well. If you can't change your product to meet the standards that consumers want, then dumb down the standards!

Sound far-fetched? This is actually happening all the time with certified organic food in the United States. Non-organic ingredients are routinely being allowed in organic food products based on requests from big food manufacturers. The consumer has very little visibility or voice in this process.

So what does this all have to do with ODF? Fair question. The analogy is to extensions of ODF, a topic currently being hotly debated on the OASIS ODF Technical Committee. Extensions are additions to an ODF document which are not defined by the ODF standard. They may be proprietary vendor extensions, or extensions using other open standards. But regardless, since their use in an ODF document is not defined by the ODF standard, they are difficult or impossible to use in an interoperable fashion, at least by those who do not know the secret details of the extension. However, such extended documents may be immensely useful in some situations.

So are extensions good? Are they bad? Are you more concerned with interoperability? Or with a particular use that requires the extension? There is no single answer for all people at all times. Because of this, it is important to put control firmly in the hand of the consumer of ODF products, so they can make the appropriate choice for themselves.

Similar to the mechanism of food labeling, putting control in the consumer's hands requires that we:

  1. Have a formal definition of what an extended ODF document is versus an unextended ODF document.
  2. Have something like a reporting requirement, so it is clear to the consumer whether a particular document is extended or not.
The proper place to address these points is in the conformance clause of the ODF Standard. To that end, the current draft of ODF 1.2 defines two conformance classes, one for extended documents and one for unextended documents. The aim, in the end, is to give the consumer greater control and allow them to make a more intelligent choice. We can't force vendors to implement one or the other conformance class. And we can't force consumers to use one or the other. But we can formally define what an extended document is and let the free market operate based on the additional information made available.
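To give a feel for what a "reporting requirement" could look like in software, here is a rough sketch that lists the namespaces in a document's XML that fall outside a known set. This is only an illustration under simplifying assumptions: the function name `foreign_namespaces` and the short allow-list are my own, and the actual conformance language in the ODF 1.2 draft is more detailed than a simple namespace check.

```python
import xml.etree.ElementTree as ET

# All namespaces defined by OASIS for ODF share this URN prefix.
ODF_URN_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"

# A few non-OASIS namespaces that ODF itself references.
# Illustrative only; this is not a complete list.
OTHER_KNOWN = {
    "http://www.w3.org/1999/xlink",
    "http://purl.org/dc/elements/1.1/",
    "http://www.w3.org/1998/Math/MathML",
    "http://www.w3.org/2002/xforms",
}


def foreign_namespaces(xml_text):
    """Return the set of namespace URIs used by elements or attributes
    that are neither ODF namespaces nor in the small allow-list above."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # ElementTree spells qualified names as "{uri}local".
        for name in [element.tag] + list(element.attrib):
            if name.startswith("{"):
                uri = name[1:].split("}", 1)[0]
                if not uri.startswith(ODF_URN_PREFIX) and uri not in OTHER_KNOWN:
                    found.add(uri)
    return found


def label(xml_text):
    """A one-word 'label' for the document, in the spirit of food labeling."""
    return "extended" if foreign_namespaces(xml_text) else "unextended"
```

An editor could surface the result of something like `label()` when a document is opened or saved, so the user knows which kind of document they are dealing with before they rely on it.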

This is a small step and I know it doesn't sound like much, but even this modest step provoked such paroxysms on the ODF TC that you would have thought I was splashing holy water at an exorcism. I suspect this means that I must be doing something right!


Sunday, March 01, 2009

ODF Spreadsheet Interoperability: Theory and Practice

This is a follow up to some work we did at the ODF Interoperability Workshop in Beijing last November. We had good participation there: IBM, Sun, Google, Novell and Redflag from the big vendor side, as well as a good number of users. It was a full-day workshop and we covered a number of topics. One of them was spreadsheet formulas. I gave a short presentation on spreadsheet interoperability, specifically on the work we've done on OpenFormula for ODF 1.2. We also did a short exercise to look for spreadsheet formula bugs.

As many of you know, neither ODF 1.0 nor ODF 1.1 defines a spreadsheet formula language. They leave it implementation-defined. The specification makes only a few broad statements, such as a recommendation that formula attributes be qualified by namespace, that formulas begin with '=' , that cell addresses be surrounded by '[' and ']' and that formula parameters be delimited by ';'. So in theory, this is a mess. But in practice it has worked out quite well, since implementations have played "follow the leader" and have nearly converged on interoperable spreadsheet formulas. With ODF 1.2, we'll standardize the consensus on spreadsheet formulas, giving even greater certainties.
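To make those broad conventions concrete, here is a small sketch that takes a formula attribute value written in that style and pulls it apart. The sample formula and the helper names (`split_formula`, `cell_references`, `split_arguments`) are illustrative assumptions of mine, not anything defined by the ODF specification.

```python
import re


def split_formula(value):
    """Split a formula attribute value into (prefix, expression).

    The prefix is None when the formula is not namespace-qualified.
    The expression always starts with '='.
    """
    match = re.match(r"^(?:([A-Za-z_][\w.-]*):)?(=.*)$", value)
    if match is None:
        raise ValueError("not a formula: %r" % value)
    return match.group(1), match.group(2)


def cell_references(expression):
    """Return the contents of every [...] cell or range reference."""
    return re.findall(r"\[([^\]]+)\]", expression)


def split_arguments(argument_text):
    """Split a flat (non-nested) argument list on the ';' delimiter."""
    return [part.strip() for part in argument_text.split(";")]


prefix, expression = split_formula("oooc:=SUM([.A1:.A3];[.B1])")
print(prefix)                       # oooc
print(expression)                   # =SUM([.A1:.A3];[.B1])
print(cell_references(expression))  # ['.A1:.A3', '.B1']
```

Note that `split_arguments` deliberately ignores nested function calls; a real formula reader needs a proper parser, which is exactly the kind of detail that is left implementation-defined in ODF 1.0 and 1.1.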

Let's see how this works in practice. I created a simple spreadsheet document in several ODF-supporting applications, including Microsoft Office using the various plugins. Here is what I tested:

  1. Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
  2. Google Spreadsheets
  3. KOffice's KSpread 1.6.3
  4. Lotus Symphony 1.1
  5. OpenOffice 2.4
  6. Microsoft Office 2003 with Sun's ODF Plugin

I used what I had installed on my two machines, Windows and Ubuntu. There may be updates to some of these applications that do even better.

I created the same basic spreadsheet from scratch in each editor and saved it as ODF format. I then looked at each document to see how formulas were being stored in the XML:

  1. CleverAge stores it in the OpenOffice namespace (xmlns:oooc="http://openoffice.org/2004/calc")
  2. Google also uses the OpenOffice namespace.
  3. KSpread doesn't use namespace-qualified formula attributes.
  4. Symphony also doesn't use namespace-qualified formula attributes.
  5. OpenOffice uses the OpenOffice namespace.
  6. Sun's Plugin also uses the OpenOffice namespace.
OK. So there is some variation in how the formulas are stored, with two approaches in use. How does this then impact interoperability? In theory it is horrible. In practice it works out pretty well.

I took each of the 6 spreadsheet documents and opened each one in each of the other 5 applications -- 30 interoperability tests -- to see whether the formulas were loaded and calculated correctly. Here is what I saw:



                                   Created In
Read In       CleverAge   Google   KSpread   Symphony   OpenOffice   Sun Plugin
CleverAge     OK          OK       Fail      Fail       OK           OK
Google        OK          OK       OK        OK         OK           OK
KSpread       OK          OK       OK        OK         OK           OK
Symphony      OK          OK       OK        OK         OK           OK
OpenOffice    OK          OK       OK        OK         OK           OK
Sun Plugin    OK          OK       OK        OK         OK           OK

So the formulas came through OK, in almost all instances. The only exception was the CleverAge add-in, which failed to process formulas from KSpread and Symphony. For example, loading the Symphony spreadsheet into Office 2003 results in cells with contents containing errors such as "=#REF!+#REF!-#REF!" which is tantamount to data loss.

I think we can do better than this with a few simple changes.

The Law of Robustness as stated in RFC 1122 is "Be liberal in what you accept, and conservative in what you send." Adapting that principle to ODF spreadsheets, I recommend the following practice for ensuring interoperability using ODF 1.0 and ODF 1.1:
  1. When writing ODF 1.0 or ODF 1.1 spreadsheet documents, write formula attribute values using a prefix bound to the OpenOffice namespace: "http://openoffice.org/2004/calc". All ODF spreadsheet applications I have tested accept and correctly process formulas in that namespace. Note that the CleverAge add-in is not doing the namespace checks in an XML-correct fashion. It compares only the text of the prefix, rather than resolving the prefix to a namespace URI and comparing the URIs. So you should be sure to also use "oooc" as the namespace prefix.
  2. When reading ODF 1.0 or ODF 1.1 spreadsheet documents, be prepared to handle formulas with no namespace qualification as well as those with the OpenOffice namespace.
Specifically, Symphony and KSpread should consider making changes to accommodate #1 and CleverAge should consider changes needed to do #2. In the CleverAge case, a trivial, one-line change to OdfConditionalPostProcessor.cs will quickly restore compatibility with Symphony and KSpread documents.
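The two recommendations can be sketched in a few lines. This is a minimal illustration, not code from any of the applications tested: the function names are hypothetical, and I assume the caller has already collected the in-scope xmlns declarations into a prefix-to-URI dictionary.

```python
OOOC_URI = "http://openoffice.org/2004/calc"
OOOC_PREFIX = "oooc"


def read_formula(value, in_scope_namespaces):
    """Liberal reader: accept an unqualified formula, or one whose prefix
    resolves to the OpenOffice namespace URI, whatever the prefix text is.

    in_scope_namespaces maps prefix -> URI (assumed gathered by the caller).
    Returns the bare expression, starting with '='.
    """
    if value.startswith("="):
        return value  # no namespace qualification at all
    prefix, separator, expression = value.partition(":")
    if not separator or not expression.startswith("="):
        raise ValueError("not a formula: %r" % value)
    # Compare the resolved URI, not the prefix text.
    if in_scope_namespaces.get(prefix) != OOOC_URI:
        raise ValueError("unsupported formula namespace for prefix %r" % prefix)
    return expression


def write_formula(expression):
    """Conservative writer: always qualify, and always use the 'oooc' prefix.

    The caller must also declare xmlns:oooc="http://openoffice.org/2004/calc"
    on an enclosing element.
    """
    if not expression.startswith("="):
        raise ValueError("expression must start with '='")
    return OOOC_PREFIX + ":" + expression
```

The reader accepts `calc:=[.A1]+[.B1]` as long as `calc` is bound to the OpenOffice URI, while the writer sticks to the literal `oooc` prefix so that readers which compare only prefix text still work.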

Now, if you are entirely satisfied with what I have said above, and have no lingering doubts, then you are not thinking enough. It is not enough to merely bring the spreadsheet formulas over intact. Interoperability also requires that we interpret the formulas in the same way.

So let's look at that side of the equation (no pun intended). Fortunately, we are all quite close to what is being defined in ODF 1.2's OpenFormula specification. This is not so surprising, since OpenFormula was based on actual spreadsheet practice, looking at a variety of spreadsheet applications. I did a quick test of the 6 ODF spreadsheet applications to see how well they fared against a test suite of 509 core tests that OpenFormula defines for spreadsheet functions. The results were:
So, we're not yet perfect, but we're getting pretty close. Interestingly, the lowest scores (CleverAge) and highest scores (Sun Plugin) are both for the same calculation engine (Excel).

Looking forward, we'll continue to edit and refine OpenFormula and its test cases. You might look for it when it comes out for public review, hopefully in a couple of months. Unlike other parts of ODF 1.2, OpenFormula is essentially XML-free. It is a mini-expression language, defined by a BNF grammar and accompanied by hundreds of spreadsheet functions from mathematics, finance, engineering, statistics, etc. So review by subject matter experts in these disciplines is especially needed, even if they have zero XML experience. If you want to see the current OpenFormula Working Draft, currently in its 71st revision, take a look. Comments may be submitted to the ODF TC's comment list.

I'm also looking forward to testing Office 2007 SP2's ODF support when it comes out, to see how their ODF support is improving. Anything less than the 500/509 results that Excel 2003 gives with the Sun Plugin will be a disappointment. KOffice has a 2.0 version in beta I should look at. OpenOffice has their 3.0 update. Sun also has an updated ODF Plugin. I'll lean on the Symphony team as well, and see if we can beat 500/509. Game on!


Wednesday, February 25, 2009

Whither ODF?

Whether ODF will wither or weather
depends on us as we work together.

The question is where we should go: whither?
The answer is clear at once.
The question of "whither" is not so dense,
and is easy to answer when we start with "whence?".

Of the topic today
I will no longer delay nor dither to say
whether we will whither or weather
but will now give you my 2-cents.


Rob's ODF-Next Rant



  1. The word processor and spreadsheet, as we have them today, are relics of the 1980's, designed when the web did not exist and collaboration occurred predominantly by exchanging paper documents. If we were designing a document authoring and collaboration system to meet modern circumstances and capabilities, it would likely bear little resemblance to Word. So the question is how much do we let the sunk costs of yesterday continue to determine our future? How much longer do we paint speed stripes on a horse and pretend that it is a racing car?
  2. Products like Word and Excel have evolved via the uncritical accretion of functionality over the past decades to a point where the products are overly complex resource gluttons with a knack for having a critical security flaw reported in them every other week.
  3. Increasingly users are getting work done via email, wikis and blogs rather than using heavy-weight document editing solutions. Why is this so? Why is the modern word processor losing users rather than gaining them?
  4. WYSIWYG is a fine paradigm if you are doing all of your work targeting printed output. But it is a sub-optimal approach for creating documents for almost any other use.
  5. The revered Bold, Italics and Underline icons, along with the font selection drop down list, which define the modern editor GUI, should be forcibly removed from the user interface, stripped of rank, and put on trial for crimes against productivity. You are writing a document, not decorating a cake. You need to ask yourself "Why should this text be italics?" Is it a book title, a foreign phrase, a name of a movie, the name of a legal case? Then choose a named style that indicates why that text is special. Let the named style take care of how it is displayed.
  6. Unless you are designing a poster for a modern art gallery you should stick to the named styles in your template. Power users might define additional named styles. But direct application of random attributes to random text selections should be considered a form of data corruption.
  7. Few documents today are ever printed. They are born, live and die entirely in digital form. We should be optimizing for the most common cases, not just for what our parents or grandparents did with WordPerfect 1.0.
  8. The most common sources of reused content come from other documents and from PDF and from HTML. Current cut & paste mechanisms today make a mess of styles. Paste in the content with the styles of the source document? According to the styles of the destination document? Mapping to the nearest local style? All are wrong answers. The only correct answer is to give me the choice.
  9. PowerPoint is pure evil. It has elevated form over substance and turned every form of business communication into a "pitch".
  10. I should be able to call spreadsheet functions using named parameters, like PV(rate=1%,periods=12,payment=$1000.00) rather than PV(0.01,12,1000) so my model is self-documenting and avoids errors from incorrect ordering of parameters.
  11. Security needs to be designed into the document authoring environment, including the format, not patched on as an afterthought.
  12. I want Greasemonkey for my word processor and my spreadsheet.
  13. Connections between documents may be as important as the documents themselves.
  14. The less control the user asserts over the appearance of a document during editing, the more flexibility he or she has over the final published appearance. In today's multi-modal, multi-device world, it is essential that we do not prematurely commit our documents to a particular rendering. We need late binding of presentation to content, not early binding. If we had done this for the past decade, we would have perfect interoperability today between all word processors. If we start doing it now, we will have perfect interoperability among word processors going forward.
  15. Spreadsheets should have functions that access web-based data stores for common financial, economic, political and scientific data sets. Mathematica does something similar, presumably using local caching.
  16. Presentation should be a mode of displaying another document, not just a document type itself. For example, I should be able to take a report and push a button to enter a slide-show mode, where all images are shown as slides, with their captions, and each top level section header becomes a slide with 2nd level headers as bullet items. During the presentation I should be able to seamlessly drill down into the real document.
  17. I want to be able to share data ranges, text ranges and presentation slides with others and to subscribe to theirs via feeds. I rarely write a document from scratch. Reuse, reuse, reuse. But the tools only support this at a scavenger level.
  18. We lack high level support for compositing or assembling a document from fragments. Once I cut & paste, my new document has lost all knowledge of the document I copied from. This is great if I am a professional plagiarist. But it is bad if I am a CIA analyst and my report has copied information claiming uranium production in Africa, and I never know when that information is repudiated, and I pass my flawed report onto the President. Very bad. When I cite an authority for an argument, my argument is only as good as the authority. I owe it to myself and my readers to make it easy to know whether the information I cited is still accurate and vouched for by that authority.
  19. Current tools are impoverished when it comes to the social side of documents. Review/comment reflects old, hierarchical thinking and doesn't scale to the network. How can I have 100 people comment on my document? What if I want 100 people to jointly author a document? The Wiki knows where Word cannot go...
  20. Most user woes in modern word processors are caused by our attempts to remain compatible with the design choices made by Microsoft Office developers 15 years ago. It is time to move on and learn from past mistakes, not perpetuate them.
  21. I want to use the same text editor to edit documents, web pages, emails, blog posts, discussion forums and wikis. Why do I need a different brand hammer for every nail?
  22. I want a spreadsheet function that can call a web service. It might lookup a book title by ISBN, do currency conversions, or geocode data. There should be thousands of such spreadsheet functions, backed by web services, interoperable based on standard protocols. Some might be free, others fee-based. Some might be both, e.g., 20-minute delayed quotes for free, real-time for a fee.
  23. Spreadsheet functions express a core analytic function and should be usable in all tables, in word processors and presentations, not just in spreadsheets. They should also be usable in fields in forms and in text passages.
  24. The inability of word processors to output clean, readable and valid HTML or XHTML should be an embarrassment to all vendors.
  25. HTML + JS + XHR + HTML DOM = AJAX. ODF + JS + XHR + ODF DOM = ?
  26. We must define power as in "power user" based on results, on productivity. Power is as much about what a system allows you to ignore as what it allows you to control.
  27. Today trust is based on digital signatures and classical questions of authentication, integrity and non-repudiation, all backed by a chain of trust traceable back to some well-known certification authority. In some contexts, this hierarchical, binary view of trust is adequate. But the network sees trust based on reputation, rating, scoring, voting, reverse citation counts and other non-hierarchical values. How do we account for these?
  28. Spreadsheets are unnecessarily dangerous, based on a muddled view of data types which leads to silent errors and inconsistencies. This might have made sense in the memory and processor constrained systems of the 1980's. But today, with our better sense of the errors and the cost of errors, we need a spreadsheet system that is type-safe, aware of measurement units, and which enforces consistency and accuracy. We obviously can't prevent someone from making a stupid spreadsheet model for subprime mortgages, but we can at least ensure that they don't make stupid cut & paste errors when creating that model.
  29. Spreadsheets should have intrinsic support for image, sound and geographic data. Not just embedded media, but as an intrinsic data type, so a function could take an image as input, or return an audio clip as a result.
  30. A grid in a spreadsheet provides a logical addressing scheme as well as a visual layout scheme. But what if I want the former without the latter? Why can't I do a spreadsheet calculation in a text document? Why am I always stuck in a grid?
  31. Spreadsheets should have built-in support for sensitivity and risk analysis, perhaps via Monte Carlo methods. Yes, I know support is available via 3rd party plugins, but this should be a core feature in the repertoire of every user. We might not be in the global financial mess we're in now if spreadsheet users all could easily "stress test" their models.
  32. The Holy Trinity of Word, Excel and PowerPoint is only a convention, mainly enforced by Microsoft's definition of their office suite. It is not a law of nature. Other application types should be considered to be part of the core desktop authoring environment, such as project management and mind maps.
  33. Outliners and other pre-draft tools have lagged far behind the core editing functions of a word processor. And what is the equivalent of an outliner for a spreadsheet?
  34. Microsoft is as much a prisoner to the predominant model of end user productivity as the user is. Their need to support legacy documents constrains their freedom of action and has contributed to the relative lack of innovation in Microsoft Office over the past decade.
  35. An editor should allow a user to verify interoperability as easily as it lets them do a print preview.
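As an illustration of item 10, here is roughly what a named-parameter PV could look like if spreadsheet functions behaved like keyword-only functions in a programming language. This is a sketch of the idea only: the function name and parameter names are my own, it uses the ordinary-annuity formula with end-of-period payments, and it returns a positive value, which differs from the sign convention many spreadsheets use.

```python
def present_value(*, rate, periods, payment):
    """Present value of a level stream of end-of-period payments.

    All arguments are keyword-only, so they cannot be passed in the
    wrong order by accident.
    """
    if rate == 0:
        return payment * periods
    return payment * (1 - (1 + rate) ** -periods) / rate


# Self-documenting call: the order of the arguments no longer matters.
value = present_value(payment=1000.00, rate=0.01, periods=12)
print(round(value, 2))  # 11255.08

# present_value(0.01, 12, 1000.00) would raise TypeError, because
# positional arguments are rejected.
```

The same call with the arguments shuffled gives the same answer, which is the whole point: the model documents itself and a swapped rate and period count cannot slip through silently.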


Sunday, February 22, 2009

Looking for Good Ideas for ODF-Next

A typical team project, whether software, standards, bridge construction or what have you, has a slow start dominated by planning and scheduling, a middle period of execution, and a finish with a final frantic rush of activity to complete the project. Then everyone takes a few days off and we start again.

One thing I learned early in my career was how wasteful this kind of project cycle is. The problem is that not everyone is involved in every part of the project. Some only work on planning, some only on execution, and some mainly come in at the end. This leads to suboptimal allocation of resources. People are standing around waiting.

One solution, not necessarily the only one, is to work on multiple versions of a project at once. For example, when working on a software application, you can take 25% of your team and have them start the planning phase of version N+1 while the remaining 75% of the team completes the final QA stage of version N.

We have a similar issue with standards development. Both the OASIS and the JTC1 PAS processes involve a lot of standing around waiting: at least two months of public review in OASIS, and 6 months of review in JTC1. And even now, as we complete the editing work on ODF 1.2, the wider ODF community is standing around waiting. It is too late to make feature proposals for ODF 1.2, but too early for a full public review of the ODF 1.2 draft.

What is to be done?

The ODF TC has decided to begin activities on the next version of ODF, called for now "ODF-Next", even before we have ODF 1.2 approved. Although we obviously won't be spending a large amount of time on that effort quite yet, since we really are all busy with ODF 1.2, we have come up with a way to engage the broader community and have you help us gather requirements for ODF-Next now, which we can then consider during the downtime when ODF 1.2 is under review in OASIS and JTC1. The Call for Proposals for ODF-Next went out on Friday.

So put on your thinking cap. ODF 1.1 and ODF 1.2 were incremental releases. Maybe ODF-Next will be bolder, maybe something that shifts the paradigm, pushes the envelope, breaks out of the box. Is the dominant WYSIWYG word processing paradigm the final word in user productivity? Or are we overdue for a change, for a different set of priorities? As Thomas Paine wrote, "We have it in our power to begin the world over again."

Now is the time to start collecting the ideas, big or small, and submit them to the ODF TC according to the instructions in the Call for Proposals linked to above.

We'll be collecting ideas at least until March 31st. The Requirements Subcommittee will then sort through the ideas, categorize and prioritize them, and generally try to make sense of it all, and then write up an ODF-Next Requirements document with their recommendations.

This is a good chance to get your ideas in early and have a real impact on where we go with ODF in the next major release. But please, do not give me ideas via blog comments. We can only accept ideas sent through the above linked OASIS comment submission procedure, which is necessary to ensure that ODF remains an open standard that anyone can implement. IANAL, but I believe an added benefit is that any idea you submit, even if speculative, even if not added to ODF-Next, will be permanently archived in the ODF comment list, and thus will establish prior art which could scuttle attempts to secure patents in this area. So by contributing your ideas publicly in this way, you help to establish an intellectual commons that will benefit free and open source applications in this area.

Please pass along the word. We're hoping to get hundreds of ideas for ODF-Next. Bring it on!

Saturday, February 21, 2009

Strange corners of the Web

Back in the 1980's, when I was a student, I was also an avid shortwave listener (SWL). This was in the days before the web, satellite TV or 24-hour international cable news coverage. I had an upper floor room in Cabot Hall, and each night I would surreptitiously dangle out the window a 40-foot wire antenna attached to a small weight.

At first I listened only to the big broadcasters like the BBC World Service, Deutsche Welle, Radio Moscow, and then moved on to smaller ones: Tirana, Malta, South Africa, etc. It was a great way to get a global perspective beyond the 2 minutes allotted to international news on a typical US-based evening news program.

Eventually I started writing the broadcasters and received many QSL cards. Some of my letters were read on the air. I'm sure I ended up on some FBI watch list for those letters to Radio Prague and Radio Havana. My subscription to Soviet Life magazine and a Cambridge address probably didn't help either.

But you don't go far as a SWL before you notice that there are a lot of strange things going on in the aether. Some were easily explained -- the Soviet Union jamming broadcasts of Voice of America or Cuba jamming broadcasts of Radio Martí. And then there were the commercial voice broadcasts, ship-to-shore, international aviation, time signals, etc. Then the various data services, radio teletype, weather fax, etc. And then there were the mysterious coded transmissions, which were rumored to be SAC transmissions, "Sky King, Sky King, Do not answer", followed by various authentication codes, which were either recall or go ahead codes for nuclear attack. It was an eerie feeling, in the hotter days of the Cold War, to lie awake at night, listening to the radio and wondering whether the sun would rise in the morning. Now I just wonder if my 401(k) will still be there.

Stranger yet were the cryptic transmissions of the "numbers stations", which would transmit on a semi-regular schedule and merely read off a large list of numbers for 10 minutes. For months I transcribed one particular woman's transmissions, trying to find out the pattern. I did some computer analysis, but the numbers were random in frequency, with no discernible patterns. Presumably they were encoded against a one-time pad.

And then there were the "pirate" radio stations like "The Voice of the Purple Pumpkin".

Although most people knew about the BBC World Service, I don't think many appreciated that a large portion of the shortwave universe was strange, that the fringe was everywhere.

I'm starting to have a similar view of the web. There are major content providers, minor content providers, even individual content providers like me. And then there is the weirdness, the strange corners of the web, the space between the channels, where you are not even sure whether you are listening to signal or noise.

Here are a few random examples of web sites with no discernible purpose. They appear to be garbled republications of news stories.

Let's start with the "Wet Paint Body Notes" blog, newly created, with only three posts. One is called "Microsoft Gets Foot in Mass. Office Door". It starts:

In what could be a coup inwardly favour of Microsoft (Nasdaq: MSFT) and a biff to the friendly wellspring league, the stipulate of Massachusetts personal added Microsoft's Office Open XML norm to its document of give your declaration standards it will allow for elected representatives exploit.

This is a strange kind of English. It almost seems like a poor translation, or even a poor machine translation, of a document written in another language. But if you poke around a little, you find that this blog post is an unattributed garbled derivation of a 2007 article in Linux Insider. Not only was the original article in English, the reposted version truncates the article, posting only the first few paragraphs.

So what's up with that? There are no banner ads or other obvious sources of revenue on the garbled version of the article. It is not a link farm. In fact it has no outgoing links. So why did someone bother?

Another example. The blog "75Software-News48" has a new article "Microsoft shows support for ODF", posted just two weeks ago, with the intro:

Amid organization hassle surrounded by wish of interoperability, Microsoft (Nasdaq: MSFT) protected Thursday announced the discovery of the Open XML Translator Project. The overhang will fry in the air permitted software to allow Word, Excel and PowerPoint to knob documents in contrary technology format.

Again, this reads like it is a poor translation from another language. But look further and you can find that the original article is actually in English, from a 2006 TechNewsWorld article.

Again, no obvious intent here. It isn't a link farm, and there is no evident source of revenue. It isn't informative and it certainly isn't timely. So why did they do it?

One more example, this time a LiveJournal blog called "All Microsoft", again newly created, with a post called "Ecma Approves MS Office Format, IBM Dissents". It opens:

Microsoft's (Nasdaq: MSFT) Open XML bureau software format, broad of via the tech giant to chase near the Open Document Format (ODF), cleared a standards hurdle this week, successful approbation from the Ecma global standards article.

Same modus operandi here. Original source, unattributed, is from a 2006 Linux Insider article.

I have dozens of examples of this kind of thing, all within the last couple of months, mainly articles about Microsoft and ODF. Something new is afoot. But what? Anyone have any idea of what this is and who benefits from it? Is this just a contest between Blogger and LiveJournal to see who can claim the most hosted blogs? Or is it some SEO ploy? It has me stumped.

Tuesday, February 17, 2009

ODF 1.2 Committee Draft 01

It is not the end of the end, nor the end of the beginning, but more like the beginning of the end for the development of ODF 1.2. The Committee Draft 01 of ODF 1.2, Part 1 was approved by the OASIS ODF TC yesterday in a 9-2-2 vote. You can download it here.

A Committee Draft (CD) is the first step toward finalizing ODF 1.2. The TC will likely approve further CD iterations before voting to approve one as a Public Review Draft. The Public Review Draft, as the name suggests, will be what we send out for a public review of at least 60 days. We can then make changes based on review comments and hold additional public reviews if we make non-trivial changes to the Public Review Draft. The ODF TC can then vote to approve the draft as a Committee Specification. We then hold a further vote to send the Committee Specification out for an OASIS-wide ballot (not just the ODF TC, but all OASIS members) on whether to approve ODF 1.2 as an OASIS Standard. Once that is done, we can then start the PAS approval cycle in JTC1.

Although there are a lot of votes and process steps remaining, the major technical work is just about done. What remains is a period of review, perfecting the text, gaining implementation experience and feedback, etc. Some may call this a "death march", but I see this pace as consonant with the importance of our activity and our deliverables. Work in OASIS might not be as fast as Ecma, where you can evidently create a 6,000 page standard in less than a year. Our process calls for a bit more than the IETF's "rough consensus and running code." But neither are we the slowest process in the standards development landscape. We're some place in the middle. And when we're talking about revising an open document format, already adopted and used by governments around the world, I am not ashamed to say that we're working deliberately and carefully.

We also need to socialize and grow consensus around ODF 1.2, not only among implementers but also among adopters and consumers of ODF. There is still work to be done here. For example, the TC vote on the Committee Draft 01 was not unanimous. We did not have the support of Microsoft or Novell. There are still disagreements over how we define conformance in the standard. We obviously need to continue discussing this topic. Since the final TC vote to request an OASIS Standard ballot requires 2/3 approval of TC members with no more than 25% disapproving, we'll need a high level of consensus in the TC to move forward, including, hopefully, the support of Microsoft and Novell.
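
To make those two thresholds concrete, here is a minimal sketch in Python. It assumes both percentages are taken over all votes cast, abstentions included; the OASIS process rules, not this sketch, define the real denominator. The function name and the tallies are hypothetical.

```python
from fractions import Fraction


def ballot_passes(yes: int, no: int, abstain: int = 0) -> bool:
    """Check a tally against the two thresholds quoted above.

    Assumption (not taken from the OASIS rules): both fractions are
    computed over every vote cast, abstentions included.
    """
    total = yes + no + abstain
    if total == 0:
        return False
    # At least two thirds must approve...
    enough_approval = Fraction(yes, total) >= Fraction(2, 3)
    # ...and no more than one quarter may disapprove.
    few_objections = Fraction(no, total) <= Fraction(1, 4)
    return enough_approval and few_objections


if __name__ == "__main__":
    # Hypothetical tallies, for illustration only.
    for tally in [(9, 2, 2), (8, 3, 2), (8, 4, 0)]:
        print(tally, ballot_passes(*tally))
```

Under this reading, a handful of "no" votes in a small committee is enough to block the ballot even when approval clears two thirds, which is why the remaining disagreements matter.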

Implementation experience is important in OASIS. I know some have criticized OpenOffice for having support of draft ODF 1.2. But this support is a good thing, in my opinion. We need implementers to validate the design decisions we've made in the standard, to ensure that our choices are reasonable, that we haven't missed anything. We're working in an engineering discipline. We're not making abstract standards for the mind alone. Engineers build, test and refine. It is what we do. In fact, OASIS requires that before a Committee Specification can be nominated for an OASIS Standard ballot, the TC must certify that there are three conforming implementations of the Committee Specification. So not only are early implementations a good idea, they are required as part of the process.

If you are asking, "How can I help?", then here are a few ideas:
  1. If you are an implementor of ODF 1.0 or ODF 1.1, then now is a good time to start looking at what is required to add ODF 1.2 support. Download the CD of ODF 1.2, but also look at this page for a summary of changes. We'll formalize that list of changes and put it into an appendix of the draft, but this wiki page should give you a good feel for what areas have been touched.
  2. Although we have not yet approved a Public Review Draft specifically for public review, we welcome comments at any time. You can send comments on ODF 1.2 CD 01 according to the instructions on this page. Download the draft, pick a chapter of interest and send us any errors you find.
  3. We should start thinking ahead to how we can encourage a thorough review of the eventual Public Review Draft. I want to avoid the OOXML-fiasco where Ecma approved and sent to JTC1 a half-baked, deeply-flawed text. What can we do to give ODF 1.2 a really hard scrub in the OASIS review period, so what comes out meets the high standards we should expect from an international standard? I think we've done a good job in drafting ODF 1.2 and I want to encourage scrutiny, not shy from it. But let's have this scrutiny earlier rather than later.

Friday, February 06, 2009

The 21st ODF Toolkit Scenario

Back in 2006 I gave a short talk at a KDE conference in Dublin on the topic of "A Standard ODF Object Model", essentially laying out my thoughts on why we needed an "ODF Toolkit". As part of that presentation I listed "20 Prototypical App Dev Scenarios", my attempt to enumerate all the fundamental patterns of use for ODF. I did a blog post on this list later that year.

I'd like to augment that list with a new pattern of use, a clever idea suggested to me by Jomar Silva in an email quite a while ago, but an idea which I just recently warmed up to. I believe this technique could be quite powerful and should take its place as the 21st scenario for any ODF Toolkit.

It goes something like this:

If you have a toolkit written in a language, say Java, and the toolkit has APIs which you can use to both read and write ODF documents, then you can write a program that will read an ODF document and write out the Java code that would be needed to re-create that same ODF document. So it is a code generation pattern. Java code reads ODF and writes source code for a Java program that can then be compiled to write ODF.

This is very useful in a number of situations. For example, you can design your document in a familiar tool, like your word processor. Get all of the styles and layout correct and then run the code generator to generate the Java source file. Then hand-edit the source code to make changes, such as substitutions, insertions, looping to copy content down a row, etc. You could even adopt a place-holder convention in your original document, to make it easier to find the areas that you wanted to replace. For example "REPLACE-FNAME" and "REPLACE-LNAME" might be good place-holders.
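
Here is a rough sketch of the pattern, in Python rather than Java and using only the standard library's XML module in place of a real ODF toolkit. It works on a small made-up XML fragment, not an actual ODF package, and the names `generate_source`, `rebuild`, `SAMPLE` and the `urn:example:text` namespace are all hypothetical. The generator walks the parsed tree and writes out source code that, when run, rebuilds the same tree.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for document content; not real ODF markup.
SAMPLE = (
    '<text xmlns:t="urn:example:text">'
    '<t:p t:style="Body">Dear REPLACE-FNAME REPLACE-LNAME,</t:p>'
    '<t:p t:style="Body">Welcome.</t:p>'
    '</text>'
)


def generate_source(root):
    """Return Python source text that rebuilds the tree rooted at `root`."""
    lines = ["import xml.etree.ElementTree as ET", ""]
    counter = 0

    def emit(elem, parent_var):
        nonlocal counter
        var = f"e{counter}"
        counter += 1
        attrs = dict(elem.attrib)
        if parent_var is None:
            lines.append(f"{var} = ET.Element({elem.tag!r}, {attrs!r})")
        else:
            lines.append(
                f"{var} = ET.SubElement({parent_var}, {elem.tag!r}, {attrs!r})"
            )
        if elem.text is not None:
            lines.append(f"{var}.text = {elem.text!r}")
        if elem.tail is not None:
            lines.append(f"{var}.tail = {elem.tail!r}")
        for child in elem:
            emit(child, var)
        return var

    root_var = emit(root, None)
    lines.append(f"root = {root_var}")
    return "\n".join(lines) + "\n"


def rebuild(source):
    """Run generated source and return the tree it builds."""
    namespace = {}
    exec(source, namespace)
    return namespace["root"]


if __name__ == "__main__":
    source = generate_source(ET.fromstring(SAMPLE))
    # The "hand-edit" step, done here as a simple substitution.
    edited = source.replace("REPLACE-FNAME", "Ada").replace("REPLACE-LNAME", "Lovelace")
    print(ET.tostring(rebuild(edited), encoding="unicode"))
```

In practice the generated source would be saved to a file and edited by hand; the string substitution above only stands in for that step. A real toolkit would emit calls to its own document API rather than raw element constructors.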

Of course, this idea is of general applicability, not just limited to ODF. It could be applied, and for all I know has been applied to HTML, etc.

Thursday, February 05, 2009

I love the smell of ODF in the morning

I have a short ODF trio to share with you today.

First up, Jomar Silva brings us the happy news that Venezuela now mandates the use of ODF, joining Uruguay, Brazil and 14 other national governments that have adopted the International Standard for office documents.

BrowserShots.org has been part of my web design toolkit for some time now. It allows me to easily test a web page to see how it renders on a wide range of browsers and platforms, without having to personally maintain a dozen different machines and configurations on my desk. You enter a URL and click off which of 50+ different browser versions you want your page rendered on. The system then queues up your requests, farms them out to various machines that render the pages and return screen shot images (PNG format) of the results. You get some results almost immediately, while others might take 30 minutes.

I've recently received news that this same concept is now being applied to ODF documents in a new project called OfficeShots. Funded by the Dutch government and the OpenDoc Society, this project (not quite yet ready for beta) will:

[H]elp you make a better choice by letting you compare the output and other behavior of a wide variety of applications. Does your corporate style - the technical basis for many documents - actually look consistent across the board of applications - from OpenOffice.org 3.0, Adobe Buzzword and Symphony 1.2 to Microsoft Office 2000 with the ODF addin from Microsoft - or the one from Sun Microsystems? And how does it look on Mac OS X in iWork? When you are in an acquisition phase, officeshots.org will help you do a reality check if that fancy new open source suite or that productivity package you can get a bargain deal at - actually does what it says. On the spot.

This is a great idea and I look forward to seeing it in operation.

Finally, if you also have some ODF project ideas, then be sure to note that the NLnet Foundation has named ODF as one of its two focus areas for 2009 and that they are accepting project proposals for funding. So get out that digital pencil and start writing down ideas.

Monday, January 26, 2009

The State of ODF in OASIS

The year 2008 was a great year for ODF. The ODF Alliance has published their 2008 Annual Report [PDF] which is well worth reading, especially for its coverage of ODF adoption. The date has long passed since I last could keep up with all the news stories related to ODF, so it is good to read over the report and see some of the accomplishments which I failed to note at the time.

It was a good year in OASIS as well, for ODF. The ODF TC, which I co-chair, created a new Subcommittee to investigate ODF-Next requirements, and we created a new OASIS TC, to join with the existing ODF TC and ODF Adoption TC, to work on "Interoperability and Conformance". We also saw a substantial increase in participation in the ODF activities, spurred by the increased demand for ODF and the increased maturity of ODF implementations.

A few statistics you might find interesting on the level of participation in OASIS TC's related to ODF, based on a tally I did this morning:
So 2008 was a good year, with robust participation from a wide range of stakeholders in the development, maintenance and promotion of ODF in OASIS. I'm hoping for even greater participation and accomplishment in 2009, in spite of less-than-rosy economic conditions.

Monday, January 12, 2009

ODF 1.0 Errata 01

Back in March 2007 JTC1/SC34 issued the following statement:

Liaison Statement from JTC1/SC34 to OASIS ODF TC

Defects have been identified in ISO/IEC 26300 and defect reports will be submitted to the OASIS ODF TC.

SC 34 requests that the OASIS ODF TC respond to these defect reports in a timely fashion and publish errata in accordance with OASIS procedures.

SC 34 requests that the Project Editor of ISO/IEC 26300 submit draft technical corrigenda consistent with OASIS approved errata conforming to ISO requirements for SC 34 ballot.

However, a defect report was not submitted by SC34 until seven months later, when a formal defect report (N0942) was eventually submitted.

I'm pleased to report that the OASIS ODF TC has created and approved a response to this defect report. The official announcement is here.

You won't find any substantive changes to the standard. The document mainly addresses trivial editorial errors. No implementation will need to change because of these errata. Some might argue that it is a complete and utter waste of time to make editorial changes to a standard when they can have no effect on implementations. And this is true, up to a point. But there is always the possibility that a minor grammatical or spelling error might, when the ODF standard is translated into another language, be transformed into a more substantive error. So, perfecting the text of a standard, even 4 years after publication, does serve a minor purpose and deserves proportionate attention.

Since several members of JTC1/SC34 have expressed a strong desire to keep the OASIS and ISO/IEC versions of ODF in sync, I'm sure they will be eager to turn this errata document into technical corrigenda for approval by SC34, now that OASIS has done what was asked of it, i.e., "published errata in accordance with OASIS procedures." The ball is in their court now.

Friday, October 31, 2008

ODF Update

Nothing of interplanetary significance to report, but I'm glad to report steady progress on all fronts.

As many of you already know, standards maintenance consists of two main activities:
  1. Defect removal through the issuance of corrections to published standards (variously called "errata" or "corrigenda", depending on your zodiacal sign)
  2. Revision, through the issuance of updated (and presumably improved) versions of the standard.
The OASIS ODF TC has been active in both maintenance activities, with some notable milestones in the past week or so on both fronts.

On the maintenance side, Wednesday 29 October saw the start of a 15-day public review for draft 3 of the ODF 1.0 Errata document. The official OASIS announcement has more information on the public review, including links to the errata document itself, as well as how the public may submit comments. JTC1/SC34, through their Secretariat, has also been invited to participate in this review.

Once the public review has concluded, and assuming that no new issues surface in the review, the ODF TC may approve and publish it as "OASIS Approved Errata" as well as transmit the text to JTC1/SC34 for application to ISO/IEC 26300.

On the revision front, the TC continues to work to complete ODF 1.2. But while finishing that revision, we decided that we also want to initiate a new activity related to the next version of ODF, the one after ODF 1.2. We did not have immediate agreement on what that version would be called (ODF 1.3? ODF 2.0?) so we started calling it "ODF-Next". We voted to create a new Subcommittee of the ODF TC, called the ODF-Next Subcommittee to start preliminary background work on this next version, in parallel with the TC's foreground task of completing ODF 1.2. The charter of the new subcommittee reads:

Statement of purpose
--------------------
As the ODF TC completes its work on ODF 1.2, it is desirable to instantiate a parallel effort to gather requirements and define a vision for the next major revision of the standard.

It is the purpose of the ODF-Next Requirements Subcommittee to gather requirements, to categorize these requirements by theme, to prioritize these requirements, and to submit a report to the ODF TC on a recommended set of work items for the next major version of ODF, which will have the working name of "ODF-Next".

Scope of work
-------------
In accordance with the above Purpose, the ODF-Next Requirements SC would undertake the following activities:

To collect requirements for ODF-Next from TC members, from the OASIS ODF Adoption TC, from implementors, from users, from the public, and from other stakeholders;

To ensure that all requirements collected have been formally submitted as contributions to the ODF TC, either as TC member contributions or via the Feedback License;

To categorize these comments according theme;

To prioritize the themes and the requirements within the themes;

To produce and submit to the ODF TC a report on a recommended set of work items for ODF-Next

Bob Jolliffe, from the Department of Science and Technology, South Africa, has agreed to chair the Subcommittee. We had our first meeting last Tuesday.

I think this is going to be exciting. ODF 1.0 and ODF 1.1 were mainly about encoding, in an open standard, the output of conventional productivity applications. If you are a conventional person, running a conventional business, with conventional ideas looking for a conventional profit, then great, don't let me wake you up. But I think we need to do more than that. Achieving the merely conventional doesn't get me out of bed in the morning. If I wanted to just replicate what others were doing, I'd join the Mono project.

ODF 1.2 starts to break away from that conventional view with its richer view of metadata. But with ODF-Next, we can pull significantly ahead and move into uncharted territory. As Thomas Paine wrote, "We have it in our power to begin the world over again."

As you can tell from reading the charter, our primary initial task will be to collect feedback for feature ideas for the next release of ODF. When we formally put out the call for comments, I expect a huge response. So our initial TC meeting was mainly spent discussing ways in which we can handle a large volume of public comments, in terms of collection, categorizing and prioritizing. Once we agree on a tool to use, and set up some infrastructure to handle the load, expect to hear more on this blog, and elsewhere, about how you can submit your ideas, and help define the capabilities of the next version of ODF.

Next, I'd like to note that the OASIS ODF Interoperability and Conformance TC (OIC TC) met for the first time last week (and a second time again this week). We elected Bart Hanssens of Belgium as Chair of the technical committee. Bart works for Fedict, the Belgian federal ICT agency, one of the early adopters of ODF. Companies represented on the TC include IBM, Sun, Novell, Google, Oracle, Red Hat, Sursen, Ars Aperta, and the US Department of Defense. We also have a number of individual members.

The greatest difficulty in our initial call was determining a schedule for future meetings. With participants spread out from California to Boston, Paris, Hamburg and Beijing, there is no time which is going to be easy for all of us. The best we could come up with was to meet at 1430UTC, corresponding to 0930 EST, 1530 CET, 2230 China, but 0630 PST (ouch).
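
For anyone who wants to check the arithmetic, here is a small Python sketch that converts a UTC meeting time to the four zones mentioned. It uses fixed standard-time offsets (EST = UTC-5, CET = UTC+1, China = UTC+8, PST = UTC-8), so it ignores daylight saving time, and the calendar date is an arbitrary placeholder; the function and table names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets in hours from UTC (daylight saving ignored).
OFFSETS = {"EST": -5, "CET": 1, "China": 8, "PST": -8}


def local_times(hour: int, minute: int) -> dict:
    """Map each zone name to the local HHMM for a given UTC time."""
    # The date is an arbitrary placeholder; only the time of day matters here.
    utc = datetime(2008, 11, 4, hour, minute, tzinfo=timezone.utc)
    return {
        name: utc.astimezone(timezone(timedelta(hours=offset))).strftime("%H%M")
        for name, offset in OFFSETS.items()
    }


if __name__ == "__main__":
    for zone, hhmm in local_times(14, 30).items():
        print(f"{zone}: {hhmm}")
```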

In any case, the OIC TC discussions are flowing well, as we start to discuss how we engineer test cases, what data to collect for them, how to encode test metadata, etc. You can follow the discussion in the public archives of the TC's mailing list, or even better, consider joining OASIS ($300 for an individual membership) and participate in this or any other OASIS Technical Committee.

Finally, the ODF Adoption TC has been busily preparing to host a panel discussion and workshop related to ODF interoperability at the OpenOffice.org Conference in Beijing next week. In fact, I should now stop procrastinating and get back to completing my presentations!

If you add up the three ODF-related TCs (ODF TC, ODF Adoption TC, ODF Interoperability and Conformance TC), we have a combined 79 members, of which 68 represent 25 different OASIS corporate or organizational entities, and the remaining 11 are individual members.

-Rob

Sunday, October 12, 2008

ODF @ OOoCon 2008

Ah, the relief. I can miss the silly season this year. I can turn off the TV, turn off the talk radio, turn the newspaper straight to the sports page, and altogether ignore the last month of the campaign.

Why? Because I'm attending the OpenOffice.org 2008 Conference in Beijing, November 5th-7th. Since I'll miss election day, I'm submitting an absentee ballot, and in fact I've just filled it out. I predict a great increase in personal productivity from being able to sit out the remainder of the minute-by-minute saturation campaign coverage.

This will be my third OOoCon. After Barcelona last year and Lyon in 2006, the organizers this year have a tough act to follow. But from what I can see, this year is shaping up to be the "best ever", with opening ceremonies at the Diaoyutai State Guesthouse (former residence of Madame Mao) and conference sessions at Peking University.

Although the focus of the conference is OpenOffice.org, the program, the developers, the translators, promoters and users, there is also a natural overlapping interest in OpenDocument Format (ODF). Because of this, OOoCon typically is also the largest ODF conference of the year, at least based on number of ODF-related sessions.

In particular I'll draw your attention to the following ODF-related sessions:
Full details are in the conference program. My pride in seeing so many good ODF-related sessions is slightly offset by the sadness that interest in ODF has grown so much that I cannot possibly attend all of these sessions.

I hope to see many old and new friends in Beijing. This is a great opportunity to continue spreading the message of open source and open standards around the globe.

Thursday, September 25, 2008

Introducing the ODF Interoperability and Conformance TC

A short tale, all true, to relate. The names have been changed to protect the guilty.

Years ago, but not so very long ago, when XML still had that new car smell, two companies, let's call them Red and Blue, decided to make a new XML-based standard. This new standard would be, they claimed, a huge step forward and would increase interoperability, especially in complex heterogeneous environments, with multiple operating systems, multiple vendors and applications, etc. Their activities received much fanfare in the press. Everyone was pleased that Red and Blue were cooperating together to make this new standard.

This wonderful new standard was eventually completed, and Red and Blue both went and implemented the standard in two implementations which I'll call RedLib and BlueLib. But when they tried running their RedLib and BlueLib implementations against each other, to demonstrate interoperability, it didn't work. It was a total failure. There was zero interoperability.

So what did Red and Blue do? They realized that interoperability is not guaranteed merely by the existence of a standard. You also need high quality implementations, implementations that accurately and completely implement the standard. For any non-trivial standard, implementation errors will dominate the list of causes of interoperability problems. So Red and Blue worked together, with other vendors, to create an interoperability lab for the new standard, and created test suites to test interoperability, and held interoperability demonstrations at conferences, and tested and iterated on this until the implementations provided a high level of interoperability.

Today billions of dollars are transacted every day using this XML-based standard.

With ODF we find ourselves in a similar, though more complex, situation. There are more vendors involved than just Red and Blue. We are starting with many commercial and open source implementations. In some cases, with some editors, interoperability is quite good. In other cases it is rather poor. But when a user loads a document, which they may have downloaded on the web, or received via email, they have no idea where that document came from, what application, what operating system. And when you create an ODF document, you may not know who will eventually read it. It isn't enough to have good interoperability between some ODF implementations. We need good interoperability among all ODF implementations.

From a technical perspective, this is a goal we all know how to achieve. It has been done over and over again throughout the history of technology standards, especially network standards. You develop test suites, you test your implementations against these test suites, you have interoperability workshops (or plug-fests as they are sometimes called). You iterate until you have a high level of interoperability.

For the past 6 months I've been talking to my peers at a number of ODF vendor companies, to fellow standards professionals in OASIS, to ODF adopters, as well as to people who have gone through interoperability efforts like this before. I've given a few presentations on ODF interoperability at conferences and led a workshop on the topic. I led a 90-day mailing list discussion on ODF interoperability. Generally, I've been trying to find the best place and set of activities needed to bring the interested parties together and achieve the high level of interoperability we all want to see with ODF.

The culmination of these efforts is the creation of a new Technical Committee in OASIS, called the ODF Interoperability and Conformance TC, or OIC TC for short. The official 30-day OASIS Call for Participation went out last Friday. You can read the full charter there, but you can get a good idea by just reading the "Scope of Work":

  1. Initially and periodically thereafter, to review the current state of conformance and interoperability among a number of ODF implementations; To produce reports on overall trends in conformance and interoperability that note areas of accomplishment as well as areas needing improvement, and to recommend prioritized activities for advancing the state of conformance and interoperability among ODF implementations in general without identifying or commenting on particular implementations;
  2. To collect the provisions of the ODF standard, and of standards normatively referenced by the ODF standard, and to produce a comprehensive conformity assessment methodology specification which enumerates all collected provisions, as well as specific actions recommended to test each provision, including definition of preconditions, expected results, scoring and reporting;
  3. To select a corpus of ODF interoperability test documents, such documents to be created by the OIC TC, or received as member or public contributions; To publish the ODF interoperability test corpus and promote its use in interoperability workshops and similar events;
  4. To define profiles of ODF which will increase interoperability among implementations in the same vertical domain, for example, ODF/A for archiving;
  5. To define profiles of ODF which will increase interoperability among implementations in the same horizontal domain, for example ODF Mobile for pervasive devices, or ODF Web for browser-based editors.
  6. To provide feedback, where necessary, to the OASIS Open Document Format for Office Applications (OpenDocument) TC on changes to ODF that might improve interoperability;
  7. To coordinate, in conjunction with the ODF Adoption TC, Interop Workshops and OASIS InterOp Demonstrations related to ODF;
  8. To liaise on conformance and interoperability topics with other TC's and bodies whose work is leveraged in present or future ODF specifications, and with committees dealing with conformance and interoperability in general.
We have a broad set of co-proposers of this new TC, representing ODF vendors, ODF adopters, private sector and government:


The OIC TC will have its first meeting, via teleconference, on October 22nd. At that point members will elect their chairman.

I'd like to see broader representation in this TC's important work. In particular, I'd like to see:
  1. Additional vendors that support ODF, such as Corel and Microsoft (and yes, before you ask, I have already extended a direct and personal invitation to Doug Mahugh at Microsoft)
  2. A representative from KOffice
  3. A representative from the OpenDocument Fellowship, which has already done some work on an ODF test suite. Wouldn't it be good to combine our efforts?
  4. Representatives from non-desktop ODF implementations, e.g., web-based and device-based.
  5. Broader geographic participation.
  6. Participation with specialized skills to help define and review test cases in areas such as: Accessibility, East Asian languages, Bidi text, etc.
  7. People with an interest in archiving, to help to define an ODF/A profile.
So, if you fall into one of those categories, I hope you'll consider joining the new TC. Heck, even if you are outside of those categories you are welcome to join. The only prerequisite is that you are an OASIS member. OASIS membership is $300 for individuals, and for companies has a sliding scale according to company size. More information on OASIS membership is here.

We have a lot of work to do, but now we finally have a place where we can get the work done. This is big. This is important, both for ODF vendors and ODF users. I hope you'll join us as we all work to improve interoperability among ODF implementations!

[Update: On 12 November Doug Mahugh accepted my invite and announced that Microsoft would join the TC.]


Sunday, September 21, 2008

ODF: Translations and Errata

Although the ODF 1.0 standard was approved several years ago (by OASIS in 2005 and by ISO/IEC in 2006), work on the standard does not cease. Of course, we have work on technical revisions of ODF, in the form of ODF 1.1 and the current work on ODF 1.2. New releases make the news and are talked about at conferences, etc. But also important, though not talked about as much, is the ongoing work on the text of ODF 1.0, in the form of translation and error correction. Even after ODF 1.1 and ODF 1.2 are created, ODF 1.0 continues to be maintained.

Why is translation important? Aside from increasing the number of developers who can read the standard in their native language, translation is a prerequisite in several countries in order to make ODF into a national standard. So translation increases the number of places where ODF support can be an official requirement. So far the ODF 1.0 standard has been translated into Russian, Chinese, Spanish and Portuguese. (There may be others — Let me know if I've missed any.)

(Interesting to note the size advantage of ODF compared to OOXML. I've heard from one reliable source that to translate OOXML would cost $500,000. This will certainly hamper its ability to be adopted in some parts of the world. ODF, by reusing existing standards, is only 1/10 the size.)

Also in progress is a translation of ODF 1.0 into Japanese. From what I understand, a JISC committee has completed an initial pass of the translation and then passed the translation off to a second committee. This second committee is reviewing the translation and raising any issues where the text is unclear. In some cases this may be caused by a faulty translation. But in other cases errors may be found which were present in the original English text.

That's the second ongoing activity related to ODF 1.0 — error correction. Although we received most of our comments during the mandated 60-day public review prior to approval as an OASIS Standard, we do continue to get a trickle of comments months and years after publication. Each OASIS TC has their own mailing list for receiving comments. For the ODF TC, the mailing list archives are here. Anyone can subscribe to the comment list and post using the instructions here. The additional complexity in the sign-up procedure compared to your average mailing list is to ensure that all feedback submitted by the public to the list is in accordance with OASIS IPR rules. This helps ensure that ODF remains an open standard, unencumbered by patents.

Although we are only obligated to address comments received during the pre-approval public review period, around a year ago the ODF TC decided to formally record and process all comments received, regardless of when they arrived. So far, from May 2005 to the present, we've received around 250 comments. We note each comment in a spreadsheet, along with what ODF versions it pertains to (ODF 1.0, ODF 1.1 or ODF 1.2 draft), what section number the comment concerns, and whether the comment is reporting an editorial error, a technical error, or proposing a new feature. My estimate is that 50% of the comments are feature proposals, 40% are reporting editorial errors, and 10% reporting technical errors.

The preeminent source of comments on ODF 1.0 has been Murata Makoto, of the Japanese SC34 mirror committee. Murata-san relays to us the defects found during the Japanese translation of ODF. The vast majority of these are editorial errors, mainly typographical or grammatical. But there are a handful of more significant issues found, and we are especially pleased to receive reports of these.

You may recall the old saying, "Every new class of users finds a new set of defects". Translation of a standard is a laborious process, especially when combined with the additional review step that JISC is engaging in. This has subjected the text of ODF 1.0 to more scrutiny, at a more detailed level, than any typical technical review could provide. So I am appreciative of the detailed comments from JISC, and of the effort made in this translation by them.

My personal aim is to ensure that all of the reported editorial errors are fixed in the ODF 1.2 text, and that any technical flaws are addressed via errata. An errata document (That's what we call it in OASIS. Others, e.g., ISO, call it "corrigenda") allows us to make small changes to the ODF 1.0 text to address defects.

But this goal is certainly debatable. Why not aim to fix every reported error in ODF 1.0 via published errata? Why knowingly leave even the smallest typographical error in the text? What relative priority should be placed on fixing typographical errors (and others) in ODF 1.0 versus completing work on ODF 1.2?

This is entirely at the will of the ODF TC. The combined priorities of the vendors and other interests represented on the committee determine the direction we take. My perception of the expressed interests is that we should address the JISC comments via an errata document, but that the overall priority is on completing the work on ODF 1.2, and not attempting to fix every last instance of subject/verb disagreement or misuse of "A" for "An" in ODF 1.0.

And so our work on the ODF TC follows that priority. I'd estimate that we spend 80% of our time on ODF 1.2 topics and 20% on processing public comments on ODF 1.0/1.1, including those from JISC. We are nearing completion of an official Errata document for ODF 1.0, consisting of fixes to defects reported by JISC. Expect to see a call for public review soon. After that, the TC will continue to review and process public comments from the comment mailing list. If warranted, we are able to issue an updated errata document in the future, to address additional issues as they are reported.


Thursday, July 17, 2008

What is Rick smoking?

Former Microsoft consultant Rick Jelliffe has posted his own particular brand of science fiction/fantasy, this time in his favorite subgenre, a parody of a drug-induced psychosis, where after uneasy slumber Rick awakes in some alternate parallel universe and finds that JTC1/SC34 is open and transparent and OASIS is closed, and decides to write a rambling blog post about it.

If you like unintentional humor, you will enjoy reading Rick's over-the-top post.

Rick suggests that organizationally JTC1/SC34 is a more participatory environment for developing standards than OASIS.

JTC1's process, based on National Body voting is both effective ... and more genuinely open, because it is impossible to stack either directly or indirecty.

Let's test that proposition. Let's compare OASIS and JTC1/SC34.

Who can participate? In OASIS, anyone can participate, from any company, organization, government agency, non-profit corporation in the world. Or you can join as an unaffiliated individual, as many have. You don't need your government's permission to join. You just do it. Most join with a nominal membership fee ($300 for individuals) but membership grants are available in some cases, when the fee would be a burden for active individual contributors.

What about participation in JTC1/SC34? First, you must be a member of your NB. How do you become a member of your NB? In the US the price is $1,200 and you must be representing a company or organization. Individuals? Sorry, you are not allowed to participate. In other countries the rules vary. In some cases membership is not available at all at any price. You are essentially wait-listed until an opening becomes available. (Sorry, we don't have enough seats, we heard in Portugal). In some countries, like China, membership is forbidden to native citizens who are employees of foreign subsidiaries in China. In other countries you can't join at all. It is entirely a government decision. So, good luck joining the NB of Syria, where the constitution has been suspended under emergency rule since 1963. (But somehow they managed to make time to vote on the OOXML ballot. Zimbabwe as well, that paragon of open participation.)

Now, it is entirely possible for a standards organization to appear open, but in practice to be inaccessible. So we must look at the complete cost of participation, not just the initial membership fees.

The OASIS ODF TC does its work entirely on an email list, a wiki, and via weekly phone calls, which are toll-free calls for most participants. I don't recall there ever being a face-to-face meeting, certainly not so long as I've been a member. This use of technology lowers the barrier to participation, so anyone can be effective on the TC if they wish. In particular it makes it easier for those who have day jobs and can only contribute to the mailing list during non-work hours.

What about JTC1/SC34? To participate effectively requires attendance at several international meetings each year (plenaries, WGs, ad hocs, BRMs, etc.), as well as participation at NB meetings. Since many of the participants are representatives of large corporations or government agencies, a junket mentality prevails and the meetings are often held in some of the most expensive places in the world: Geneva, Granada, London, Kyoto, Jeju Island, etc.

JTC1 does not allow meeting participation by telephone. Since important votes are held at these meetings, and no provision is made for remote participation, one cannot effectively participate in JTC1/SC34 without a substantial budget for international travel. Attendance at a single meeting (the DIS 29500 BRM) cost me $3687.52, and I flew coach and ate cheap. How many standards meetings like that can you as an individual or your small company afford per year?

Further, note the nature of your membership — what can you actually do? Can you vote? In OASIS, it is one person/one vote. In the TC, your vote as an individual with a $300 membership fee is counted exactly the same as my vote representing an OASIS Foundational Sponsor. At the organizational level, it is one company/one vote, and the smallest OASIS member organization has exactly the same vote as the largest.

In JTC1/SC34, however, you typically can't vote at all. NBs vote, not individuals, not companies. So your opinion and your wishes are subject to the will of your NB. If your opinion varies from your NB's, you may not be accredited to attend an international meeting, and even if you are able to attend you may not be allowed to speak your opinions. This extra level of indirection and censorship means that you, as an individual, can do little. And to the extent your NB's committee is stacked by a single vendor and their partner community, or your NB decides to overrule or ignore its technical committee, or Microsoft calls your head of state to change the NB's vote, or any of the dozens of other documented shenanigans that recently occurred, your membership fee and participation will be a complete waste of time, money and effort.

Membership in OASIS is far more open and inclusive. You join. You discuss. You vote. Period. In JTC1/SC34, you are mired in layers of bureaucracy at the national and international level, in a system crafted by and for the big boys to cut back room deals and manipulate the process to the benefit of large corporations.

(Now that isn't to say that there are not some individual consultants out there who thrive in the JTC1 environment by mastering its dark, dusty, demon-haunted hallways. Even the largest corporations occasionally have need of this expertise, as Rick and others are quite aware. If JTC1/SC34 were truly open and transparent, such skills would not be needed. You certainly don't see anyone selling their services to help companies navigate OASIS, do you?)

What about transparency? As Rick demonstrates, OASIS meeting minutes and agendas are all posted and public. So is our mailing list. So are all of our drafts. So are our member and public comments.

But in JTC1/SC34, most of the documents are private, only accessible to SC34 members by password. And then occasionally JTC1 will step in to prevent SC34 from releasing their own work, suppressing documents even from their own SC members. There are no public comments to speak of, and member comments on draft standards are secret.

So when you are back from your "trip", Rick, please let us know again, who wins on openness, participation and transparency?




And for the record, a couple of outright deceptions in Rick's post:


Tuesday, May 13, 2008

Spreadsheet file format performance

I've been doing some performance timings of file format support, comparing MS Office and OpenOffice. Most of the results are as expected, but some are surprising, and one in particular is quite disappointing.

But first, a few details of my setup. All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches. The machine is a Lenovo T60p, dual-core Intel 2.16 GHz with 2 GB of RAM. I took all the standard precautions -- disk was defragmented, and test files were confirmed as defragmented using contig. No other applications were running and background tasks were all shut down.

For test files, I went back to an old favorite, George Ou's (at the time with ZDNet) monster 50MB XLS file from his series of tests back in 2005. This file, although very large, is very simple. There are no formulas, indeed no formatting or styles. It is just text and numbers, treating a spreadsheet like a giant data table. So tests of this file will emphasize the raw throughput of the applications. Real world spreadsheets will typically be worse than this due to additional overhead from processing styles, formulas, etc.

A test of a single file is not really that interesting. We want to see trends, see patterns. So I made a set of variations on George's original file, converting it into ODF, XLS and OOXML formats, as well as making scaled down versions of it. In total I made 12 different sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats. I then tested how long it took to load each file in each of the applications. In the case of MS Office, I installed the current versions of the translators for those formats, the Compatibility Pack for OOXML, and the ODF Add-in for the ODF support.

I find it convenient to report numbers per 100,000 spreadsheet cells. You could equally well use the original XLS spreadsheet size, or the number of rows of data, or any other correlated variable as the ordinate, but values per 100K cells is simple for anyone to understand.

I'll spare you all the pretty pictures. If you want to make some, here is the raw data (CSV format). But I will give some summary observations.


For document sizes, the results are as follows:
So the XML formats are far smaller than the legacy binary format. This is due to the added Zip compression that both XML formats use. Also, note that the ODF files are significantly smaller than the OOXML files, less than 1/4 the size on average. Upon further examination, the XML document representing the ODF content is larger than the corresponding XML in OOXML, as expected, due to its use of longer, more descriptive markup tags. However the ODF XML compresses far better than the OOXML version, enough to overcome its greater verbosity and result in files smaller than OOXML. The compression ratio (original/zipped) for ODF's content.xml is 87, whereas the compression ratio for OOXML's sheet1.xml is only 12. We could just mumble something about entropy and walk away, but I think this area could bear further investigation.

Any ideas?
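For anyone who wants to look into this, here is a minimal Python sketch of how the per-file compression ratio (original/zipped) can be read out of a Zip-based package such as an ODF or OOXML spreadsheet. The sample package it builds is a made-up stand-in with repetitive markup, not one of the test files from this post, so the number it prints says nothing about the real files.

```python
import io
import zipfile


def compression_ratios(package):
    """Return {member name: uncompressed size / compressed size} for a Zip package."""
    ratios = {}
    with zipfile.ZipFile(package) as zf:
        for info in zf.infolist():
            # Skip directories and empty members, which have no compressed payload.
            if info.compress_size > 0:
                ratios[info.filename] = info.file_size / info.compress_size
    return ratios


def make_sample_package():
    """Build a tiny in-memory stand-in for a spreadsheet package (hypothetical content)."""
    buffer = io.BytesIO()
    row = b"<table:table-row><table:table-cell/></table:table-row>\n"
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", row * 5000)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, ratio in compression_ratios(make_sample_package()).items():
        print(f"{name}: {ratio:.1f}")
```

Pointing compression_ratios at a real .ods or .xlsx file path should report the ratio for content.xml or sheet1.xml respectively, which is the comparison discussed above.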

For load time, the times for processing the binary XLS files were:
Not too surprising. These binary formats are optimized for the guts of MS Office. We would expect them to load faster in their native application.

So what about the new XML formats? There has been recent talk about the "Angle Bracket Tax" for XML formats. How bad is it?
For typical sized documents, you probably will not notice the difference. However with the largest documents, like the 16-page, 3-million cells monster sheet, the OOXML document took 40 seconds to load in Office, the ODF sheet took 90 seconds to load in OpenOffice, whereas the XLS binary took less than 2 seconds to load in MS Office.
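To make the per-100K-cells normalization concrete, here is a small Python sketch using only the two load times quoted above for the largest sheet, taking its size as a round 3 million cells (an approximation, since the exact cell count is in the raw data rather than in the text).

```python
def seconds_per_100k_cells(load_seconds, cell_count):
    """Normalize a load time to seconds per 100,000 spreadsheet cells."""
    return load_seconds * 100_000 / cell_count


# Figures quoted above for the largest sheet; the cell count is rounded.
CELLS = 3_000_000
timings = {
    "OOXML in MS Office": 40,
    "ODF in OpenOffice": 90,
}

if __name__ == "__main__":
    for label, seconds in timings.items():
        rate = seconds_per_100k_cells(seconds, CELLS)
        print(f"{label}: {rate:.2f} s per 100K cells")
```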

OK. So what are we missing? Ah, yes, ODF format in MS Office, using their ODF Add-in.
Yup. You read that right. To put this in perspective, let's look at a single test file, a 600K cells file, as we load it in the various formats and editors:

Can someone explain to me why Microsoft Office needs almost 10 minutes to load an ODF file that OpenOffice can load in 14 seconds?

(I was not able to test files larger than this using the ODF Add-in since they all crashed.)

(Update: Since it is the question everyone wants to know, the beta version of OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds and Sun's ODF Plugin for Microsoft Office loads this file in 30.03 seconds.)

This is one reason why I think file format translation is a poor engineering approach to interoperability. When OpenOffice wants to read a legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file. Instead they simply load the XLS file, via a file filter, into the internal memory model of OpenOffice.

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is doing it now, since they discovered that the Microsoft approach is doomed to performance hell.
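The difference can be shown with a toy Python sketch. Everything in it is invented for illustration: the "legacy" format is comma-separated lines, the "native" format is JSON text, and the memory model is a list of rows. It bears no resemblance to the real XLS or ODF code paths in any of these products; it only shows that the translator route does the filter's work and then adds a serialize-and-reparse round trip on top.

```python
import json


def parse_legacy(text):
    """File filter: read the toy legacy format straight into the memory model."""
    return [line.split(",") for line in text.splitlines() if line]


def translate_legacy_to_native(text):
    """Translator: convert the toy legacy format into native-format text (JSON)."""
    return json.dumps(parse_legacy(text))


def load_native(text):
    """The application's ordinary loader for its native format."""
    return json.loads(text)


def load_via_filter(text):
    # One pass: legacy text -> memory model.
    return parse_legacy(text)


def load_via_translator(text):
    # Extra work: legacy text -> native text -> memory model.
    return load_native(translate_legacy_to_native(text))


if __name__ == "__main__":
    sample = "a,1\nb,2\n"
    print(load_via_filter(sample) == load_via_translator(sample))
```

Both routes end with the same model in memory; the translator simply pays for an intermediate document that is written out only to be parsed again.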

So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure. It may look great on paper, but it is an engineering disaster. What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes 10 minutes to load a file.

The astute engineer will agree with the above, but will also feel some discomfort at the numbers. There is more here than can be explained simply by the use of translators versus import filters. That choice might explain a 2x difference in performance. A particularly poor implementation might explain a 5x difference. But none of this explains why MS Office is almost 40x slower in processing ODF files. Being that much slower is hard to do accidentally. Other forces must be at play.

Any ideas?


Wednesday, May 07, 2008

Achieving the impossible



Unadulterated copy of James Clark's Relax NG validator jing. Unadulterated copy of Kohsuke Kawaguchi's Sun Multi-Schema Validator msv. Unadulterated copy of the ODF 1.0 Relax NG schema. Unadulterated copy of the ODF 1.0 Standard, in ODF format.

No errors from either validator.

msv is so good as to tell us "the document is valid". But jing indicates success with only silence. So will I.


Monday, May 05, 2008

The Challenge

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p>Dear Alex Brown. Please prove that I am invalid ODF 1.0 (ISO 26300:2006). I do not think that I am. In fact I think that your statement that there are no valid ISO ODF documents in the world, and that there cannot be, is a brash, irresponsible and indefensible piece of bombast that you should retract.</text:p>
<text:p>(Please note that this document contains no ID, IDREF or IDREFS attributes. Nor does it contain custom content.)</text:p>
</office:text>
</office:body>
</office:document-content>


Friday, May 02, 2008

ODF Validation for Dummies

[Updated 4 May 2008, with additional rebuttal at the end]

Alex Brown has a problem. He can't figure out how to validate ODF documents. Unfortunately, when he couldn't figure it out, he didn't ask the OASIS ODF TC for help, which would have been the normal thing to do. Indeed, the ODF TC passed a resolution back in February 2007 that said, in part:
That the ODF TC welcomes any questions from ISO/IEC JTC1/SC34 and
member NB's regarding OpenDocument Format, the functionality it
describes, the planned evolution of this standard, and its relationship
to other work on the technical agenda of JTC1/SC34. Questions and
comments can be directed to the TC chair and secretary whose email
addresses are given at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

or through the comments facility at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

So it is rather uncollegial of Alex to refuse such an open, transparent way of getting his questions answered. But Alex didn't avail himself of that avenue. He just assumed if he couldn't figure out how to validate ODF then it simply couldn't be done, and that ODF was to blame. This is presumptuous. Does he think that in the three years since ODF 1.0 became a standard, that no one has tried to validate a document?

Alex is so sure of himself that he publicly exults on the claimed significance of his findings:

  • For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.
  • Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).
I think you agree that these are bold pronouncements, especially coming from someone so prominent in SC34, the Convenor of the ill-fated OOXML BRM, someone who is currently arguing that SC34 should own the maintenance of OOXML and ODF, indeed someone who would be well served if he could show that all consortia standards are junk, and that only SC34 (and he himself) could make them good.

Of course, I've been known to pontificate as well. There is nothing necessarily wrong with that. The difference here is that Alex Brown is totally wrong.

But let's see if we can help show Alex, or anyone else similarly confused, the correct way to validate an ODF document.

First start with an ODF document. When Alex tested OOXML, he used the Ecma-376 OOXML specification. Let's do the analogous test and validate the ODF 1.0 text. You can download it from the OASIS ODF web site. You'll want this version of the text, ODF 1.0 (second edition), which is the source document for the ISO version of ODF.

You'll also want to download the Relax NG schema files for OASIS ODF 1.0, which you can download in two pieces: the main schema, and the manifest schema.

Next you'll need to get a Relax NG validator. Alex recommends James Clark's jing, so we'll use that. I downloaded jing-20030619.zip, the main distribution for use with the Java Runtime Environment. Unzip that to a directory and we're almost there.

Since jing operates on XML files and knows nothing about the Zip package structure of an ODF file, you'll need to extract the XML contents of the ODF file. There are many ways to do this. My preference, on Windows, is to associate WinZip with the ODF file extensions (ODT, ODS and ODP) so I can right-click on these files and unzip them. When you unzip you will have a set of XML files (the ones validated below are content.xml, styles.xml, meta.xml, settings.xml and META-INF/manifest.xml), along with directories for image files and other non-XML resources you can ignore.
So now we're ready to validate! Let's start with content.xml. The command line for me was:

java -jar c:/jing/bin/jing.jar OpenDocument-schema-v1.0-os.rng content.xml

(Your command may vary, depending on where you put jing, the ODF schema files and the unzipped ODF files)

The result is a whole slew of error messages:

C:\temp\odf\OpenDocument-schema-v1.0-os.rng:17658:18: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
C:\temp\odf\OpenDocument-schema-v1.0-os.rng:10294:22: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"


Oh no! Emergency, emergency, everyone to get from street!

I wonder if this is one of the things that tripped Alex up? Take a deep breath. These in fact are not Relax NG (ISO/IEC 19757-2) errors at all, but errors generated by jing's default validation of a different set of constraints, defined in the Relax NG DTD Compatibility specification which has the status of a Committee Specification in OASIS. It is not part of ISO/IEC 19757-2.

Relax NG DTD Compatibility provides three extensions to Relax NG: default attribute values, ID/IDREF constraints and a documentation element. The Relax NG DTD Compatibility specification is quite clear in section 2 that "Conformance is defined separately for each feature. A conformant implementation can support any combination of features." And in fact, ODF 1.0, in section 1.2 does just that: "The schema language used within this specification is Relax-NG (see [RNG]). The attribute default value feature specified in [RNG-Compat] is used to provide attribute default values".

It is best to simply disable the checking of Relax NG DTD Compatibility constraints by using the documented "-i" flag in jing. If you want to validate ID/IDREF cross-references, then you'll need to do that in application code, and not using jing in Relax NG DTD Compatibility mode. Note that jing was not complaining about any actual ID/IDREF problem in the ODF document.

So, false alarm. You can walk safely on the streets now.

(That said, if we can make some simple changes to the ODF schemas that will allow it to work better with the default settings of jing, or other popular tools, then I'm certainly in favor of that. Alex's proposed changes to the schema are reasonable and should be considered.)

So, let's repeat the validation with the -i flag:

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng styles.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng meta.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng settings.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-manifest-schema-v1.0-os.rng META-INF/manifest.xml

Zero errors, zero warnings.
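
If you would rather not type those five command lines by hand, a few lines of Python can assemble them. This is only a sketch: the jar path and schema file names are the ones from my commands above, while the function name jing_commands is invented here for illustration.

```python
def jing_commands(jar, schema, manifest_schema,
                  parts=("content.xml", "styles.xml", "meta.xml", "settings.xml")):
    """Build the jing command lines (as argument lists) for one unpacked ODF document."""
    # -i disables the Relax NG DTD Compatibility checks, as discussed above
    commands = [["java", "-jar", jar, "-i", schema, part] for part in parts]
    # The manifest has its own schema
    commands.append(["java", "-jar", jar, "-i", manifest_schema, "META-INF/manifest.xml"])
    return commands


commands = jing_commands(
    "c:/jing/bin/jing.jar",
    "OpenDocument-schema-v1.0-os.rng",
    "OpenDocument-manifest-schema-v1.0-os.rng",
)
```

Each argument list can then be handed to subprocess.run, assuming Java and jing are installed at that path and you are in the directory where the document was unpacked.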

So, there you have it, an example that shows that there is at least one document in the universe that is valid to the ODF 1.0 schema, disproving Alex's statement that "there are no XML documents in existence which are valid to ISO ODF."

The directions are complete and should allow anyone to validate the ODF 1.0 specification, or any other ODF 1.0 document. Now that we have the basics down, let's work on some more advanced topics.

First, the reader should note that there are two versions of the ODF schema, the original 1.0 from 2005, and the updated 1.1 from 2007. (There is also a third version underway, ODF 1.2, but that needn't concern us here.)

An application, when it creates an ODF document, indicates which version of the ODF standard it is targeting. You can find this indication if you look at the office:version attribute on the root element of any ODF XML file. The only values I would expect to see in use today would be "1.0" and "1.1". Eventually we'll also see "1.2".
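
Reading that attribute programmatically is straightforward. Here is a small sketch using Python's standard library; the function name odf_version is my own, and it assumes you have already extracted an XML file such as content.xml from the document and read it into a string.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "office" prefix in ODF 1.x documents
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_version(xml_text):
    """Return the office:version value on the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.get(f"{{{OFFICE_NS}}}version")
```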

It is important to use the appropriate version of the ODF schema to validate a particular document. Our goal, as we evolve ODF, is that an application that knows only about ODF 1.0 should be able to adapt and "degrade gracefully" when given an ODF 1.1 document, by ignoring the features it does not understand. But an application written to understand ODF 1.1 should be able to fully understand ODF 1.0 documents without any additional accommodation.

Put differently, from the document perspective, a document that conforms to ODF 1.0 should also conform to ODF 1.1. But the reverse direction is not true.

To accomplish this, as we evolve ODF, within the 1.x family of revisions, we try to limit ourselves to changes that widen the schema constraints, by adding new optional elements, or new attribute values, or expanding the range of values permitted. Constraint changes that are logically narrowing, like removing elements, making optional elements mandatory, or reducing the range of allowed values, would break this kind of document compatibility.
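
The widening rule can be illustrated with a toy model. In the sketch below a "schema" is just a dictionary mapping a name to "optional" or "required". That is a drastic simplification I made up for illustration, nothing like real Relax NG, but it captures the three cases named above.

```python
def is_widening(old, new):
    """Return True if every document valid under old stays valid under new.

    old and new are toy schemas: dicts mapping a name to "optional" or "required".
    """
    for name, status in old.items():
        if name not in new:
            return False  # removing an element is a narrowing change
        if status == "optional" and new[name] == "required":
            return False  # making an optional element mandatory is narrowing
    for name, status in new.items():
        if name not in old and status == "required":
            return False  # a new element must be optional to stay compatible
    return True
```

Adding an optional element passes this check; the same change read in the opposite direction is a removal and fails it.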

Now of course, at some point we may want to make bolder changes to the schema, but this would be in a major release, like a 2.0 version. But within the ODF 1.x family we want this kind of compatibility.

The net of this is, an ODF 1.1 document should only be expected to be valid to the ODF 1.1 schema, but an ODF 1.0 document should be valid to the ODF 1.0 and the ODF 1.1 schemas.
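
Expressed as code, the rule is simply "the declared version and everything after it in the same family". A minimal sketch, with names of my own choosing and only the two versions in use today:

```python
# Versions of the ODF 1.x family, oldest first
KNOWN_VERSIONS = ["1.0", "1.1"]


def schemas_expected_valid(doc_version):
    """List the schema versions a document declaring doc_version should validate against."""
    start = KNOWN_VERSIONS.index(doc_version)  # raises ValueError for unknown versions
    return KNOWN_VERSIONS[start:]
```

Validating against any schema version not in the returned list tells you nothing about the quality of the document.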

That's enough theory! Let's take a look now at the test that Alex actually ran. It is a rather curious, strangely biased kind of test, but the bad thinking is interesting enough to devote some time to examine in some detail.

When he earlier tested OOXML, Alex used the OOXML standard itself, a text on which Microsoft engineers had lavished many person-years of attention for the past 18 months, and he validated it with the current version of the OOXML schema. That is pretty much the best case, testing a document that has never been out of Microsoft's sight for 18 months and testing it with the current version of the schema. I would expect that this document would have been a regular test case for Microsoft internally, and that its validity has been repeatedly and exhaustively tested over the past 18 months. I know that I personally tested it when Ecma-376 was first released, since it was the only significant OOXML document around. So, essentially Alex gave OOXML the softest of all soft pitches.

I think Microsoft's response, that the validity errors detected by Alex are due to changes made to the schema at the BRM, is a reasonable and accurate explanation. The real story on OOXML standardization is not how many changes were made that were incompatible with Office 2007, but how few. It appears that very few changes, perhaps only one, will be required to make Office 2007's output be valid OOXML.

So when testing ODF, what did Alex do? Did he use the ODF 1.0 specification as a test case, a document that the OASIS TC might have had the opportunity to give a similar level of attention to? No, he did not, although that would have validated perfectly, as I've demonstrated above. Instead, Alex uses the OOXML specification, a document which by his own testing is not valid OOXML, then converts it into the proprietary .DOC binary format, then translates that binary format into ODF and then tries to validate the results with the ODF 1.0 schema (i.e., the wrong version of the ODF schema since OpenOffice 2.4.0's output is clearly declared as ODF 1.1), and then applies a non-applicable, non-standard DTD Compatibility constraint test during the Relax NG validation.

Does anyone see something else wrong with this testing methodology?

Aside from the obvious bias of using an input document that Microsoft has spent 18 months perfecting, and using the wrong schemas and validator settings, there is another, more subtle problem.

Alex's tests of OOXML and ODF are testing entirely different things. With OOXML, he took a version N (Ecma-376) OOXML document and tried to validate it with a version N+1 (ISO/IEC 29500) version of the OOXML schema.

But what he did with ODF was take a version N+1 (ODF 1.1) document and try to validate it with a version N (ODF 1.0) of the ODF schema.

These are entirely different operations. One test is testing the backwards compatibility of the schema, the other is testing the backwards compatibility of document instances. It takes no genius to figure out that if ODF 1.1 adds new elements, then an ODF 1.1 document instance will not validate with the ODF 1.0 schema. We don't ordinarily expect backwardly compatible validity of document instances. Again, Alex's tests are biased in OOXML's favor, giving ODF a much more difficult, even impossible task, compared to the versions run for OOXML.

If we want to compare apples to apples, it is quite easy to perform the equivalent test with ODF. I gave it a try, taking a version N document (the ODF 1.0 standard itself, per above) and validated it with the version N+1 schema (ODF 1.1 in this case). It worked perfectly. No warnings, no errors.

In any case, in his backwards test Alex reports 7,525 errors, "mostly of the same type (use of an undeclared soft-page-break element)" when validating the OOXML text with ODF 1.0 schema. Indeed, all but 39 of these errors are reports of soft-page-break.

Soft page breaks are a new feature introduced in ODF 1.1. The feature has two primary advantages for accessibility. First, it allows easier collaboration between people using different technologies to read a document. Not all documents are deeply structured, with formal divisions like section 3.2.1, etc. Most business documents are loosely structured, and collaboration occurs by referring to "2nd paragraph on page 23" or "the bottom of page 18". But when using different assistive technologies, from larger fonts, to braille, to audio renderings, the page breaks (if the assistive technology even has the concept of a page break) are usually located differently from the page breaks in the original authoring tool. This makes collaboration difficult. So, ODF 1.1 added the ability for applications to write out "soft" page breaks, indicating where the page breaks occurred when the original source document was saved.

Although this feature was added for accessibility reasons, like curb cuts, its likely future applications are more general. We will all benefit. For example, a convertor for translating from ODF to HTML would ordinarily only be able to calculate the original page breaks by undertaking complex layout calculations. But with soft page breaks recorded, even a simple XSLT script can use this information to insert indications of page breaks, or to generate accurate page numbering, etc. Although the addition of this feature hinders Alex's idiosyncratic attempt to validate ODF 1.1 documents with the ODF 1.0 schema, I think the fact that this feature helps blind and visually impaired users, and generally improves collaboration makes it a fair trade-off.

Wouldn't you agree?
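
To show how little machinery the soft page break information needs, here is a sketch that estimates a page count from content.xml without doing any layout at all. It is in Python rather than XSLT, the function name is my own, and it deliberately ignores hard page breaks (which ODF expresses through styles), so treat the result as an approximation.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "text" prefix in ODF 1.x documents
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def page_count(content_xml):
    """Estimate the page count as one more than the number of soft page breaks."""
    root = ET.fromstring(content_xml)
    breaks = sum(1 for _ in root.iter(f"{{{TEXT_NS}}}soft-page-break"))
    return breaks + 1
```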

That leaves 39 validation errors in Alex's test. Twelve of them are reports of invalid xlink:href attribute values. This appears to be an error in the original DOCX file. Garbage In, Garbage Out. For example, in one case the original document has a HYPERLINK field that contains a link to content in Microsoft's proprietary CHM format (Compiled HTML). The link provided in the original document does not match the syntax rules required for an XML Schema anyURI (the URL ends with "##" rather than "#"). Maybe it is correct for markup like this, with non-standard, non-interoperable URIs, to give validation errors. This is not the first time that OOXML has been found polluting XML with proprietary extensions. But realize that OpenOffice 2.4.0 did not create this error. OpenOffice is just passing the error along, as Office 2007 saved it. It is interesting to note that this error was not caught in MS Office, and indeed is undetectable with OOXML's lax schema. But the error was caught with the ODF schema. This is a good thing, yes? It might be a good idea for OpenOffice to add an optional validation step after importing Microsoft Office documents, to filter out such data pollution.
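
The particular problem here, a doubled "#", is easy to screen for. The sketch below is not a full anyURI validator, just a check of the one rule that matters in this case: the generic URI syntax uses "#" only as the delimiter before the fragment, so it can appear at most once. The function name and the example links in the note below are hypothetical.

```python
def fragment_delimiter_ok(uri):
    """Return True if the URI reference contains at most one '#' character."""
    return uri.count("#") <= 1
```

A made-up link such as "help.chm#topic" passes, while "help.chm##" does not.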

The remaining validation errors are 27 instances of style:with-tab. Honestly, I have no explanation for this. This attribute does not exist in ODF 1.0 or ODF 1.1. That it is written out appears to be a bug in OpenOffice. Maybe someone there can tell us what the story is on this? But I don't see this problem in all documents, or even most documents.
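
As a sanity check on the arithmetic, here is the breakdown using only the counts quoted above (the variable names are mine):

```python
total_errors = 7525      # errors Alex reported
href_errors = 12         # invalid xlink:href values
with_tab_errors = 27     # style:with-tab attributes

# Everything that is not a soft-page-break report
other_errors = href_errors + with_tab_errors
soft_page_break_errors = total_errors - other_errors
```

The two smaller categories account for exactly the 39 errors left over once the soft-page-break reports are set aside.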

For fun I tried processing this OOXML document another way. Instead of the multi-hop OOXML-to-DOC-to-ODF conversion Alex did, why not go directly from OOXML to ODF in one step, using the convertor that Microsoft/CleverAge created? This should be much cleaner, since it doesn't have all the legacy code or messiness of the binary formats or legacy application code. It is just a mapping from one markup to another markup, written from scratch. Getting the output to be valid should be trivial.

So I downloaded the "OpenXML/ODF Translator Command Line Tools" from SourceForge. According to their web page, this tool targets ODF 1.0, so we'll be validating against the ODF 1.0 schemas.

This tool is very easy to use once you have the .NET prerequisites installed. The command line was:

odfconvertor /I "Office Open XML Part 4 - Markup Language Reference.docx"

The convertor then chugs along for a long, long, long time. I mean a long time. The conversion from OOXML to ODF eventually finished, after 11 hours, 10 minutes and 41 seconds! And this was on a Thinkpad T60p with a dual-core Intel 2.16 GHz processor and 2.0 GB of RAM.

I then ran jing, using the validation command lines from above. It reported 376 validation errors, which fell into several categories:

In any case, not a lot of errors, but a handful of errors repeated. But it is surprising to see that this single-purpose tool, written from scratch, had more validation errors in it than OpenOffice 2.4.0 does.

In the end we should put this in perspective. Can OpenOffice produce valid ODF documents? Yes, it can, and I have given an example. Can OpenOffice produce invalid documents? Yes, of course. For example when it writes out a .DOC binary file, it is not even well-formed XML. And we've seen one example, where via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation. But conformance for an application does not require that it is incapable of writing out an invalid document. Conformance requires that it is capable of writing out a valid document. And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.

It is interesting to recall the study that Dagfinn Parnas did a few years ago. He analyzed 2.5 million web pages. He found that only 0.7% of them were valid markup. Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems. Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards. In fact I promise you that the page you are reading now is not valid, and neither is Alex Brown's, nor SC34's, nor JTC1's, nor Ecma's, nor ISO's, nor the IEC's.

So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement. In any case, Alex Brown's dire pronouncements on ODF validity have been weighed in the balance and found wanting.


4 May 2008

Alex has responded on his blog with "ODF validation for cognoscenti". He deals purely with the ID/IDREF/IDREFS questions in XML. He does not justify his biased and faulty testing methodology, nor does he reiterate his bold claims that there are no valid ODF 1.0 documents in existence.

Since Alex's blog does not seem to be allowing me to comment, I'll put here what I would have put there. I'll be brief because I have other fish to fry today.

Alex, no one doubts that ID/IDREF/IDREFS constraints must be respected by valid ODF document instances. I never suggested otherwise. But what I do state is that this is not a concern of a Relax NG validator. You can read James Clark saying the same thing in his 2001 "Guidelines for using W3C XML Schema Datatypes with RELAX NG", which says in part:

The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].

Validation of ID/IDREF/IDREFS cross-reference semantics is not the job of Relax NG, and you are incorrect to suggest otherwise. Your logic is also deficient when you take my statement of that fact and derive the false statement that I believe that ID/IDREF semantics do not apply to ODF. One does not follow from the other.

You know, as much as anyone, that conformance is a complex topic. One does not ordinarily expect, except in trivial XML formats, that the complete set of conformance constraints will be expressed in the schema. Typically a multi-layered approach is used, with some syntax and structural constraints expressed in XML Schema or Relax NG, some business constraints in Schematron, and maybe even some deeper semantic constraints that are expressed only in the text of the standard and can only be tested by application logic.

For example, a document that defines a cryptographic algorithm might need to store a prime number. The schema might define this as an integer. The fact that the schema does not state or guarantee that it is a prime number is not the fault of the schema. And the inability of a Relax NG validator to test primality is not a defect in Relax NG. The primality test would simply need to be carried out at another level, with application logic. But the requirement for primality in document instances can still be a conformance requirement and it is still testable, albeit with some computational effort, in application logic.
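
Here is a minimal sketch of that second layer in Python. The element name "prime" and both function names are made up for illustration, and trial division is only practical for small numbers; a real cryptographic application would use a proper primality test.

```python
import xml.etree.ElementTree as ET


def is_prime(n):
    """Trial-division primality test, adequate only for small integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def stored_value_is_prime(xml_text, element_name="prime"):
    """Check a constraint the schema cannot express.

    The schema layer only guarantees the content is an integer; primality is
    tested here, in application logic. element_name is a hypothetical child
    element of the document root.
    """
    root = ET.fromstring(xml_text)
    return is_prime(int(root.findtext(element_name)))
```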

I believe that is the source of your confusion. The initial errors you saw when running jing with the Relax NG DTD Compatibility flag enabled were not errors in the ODF document instances. What you saw was jing reporting that it could not apply the Relax NG DTD Compatibility ID/IDREF/IDREFS constraint checks using the ODF 1.0 schema. That in no way means that the constraints defined in XML 1.0 are not required on ODF document instances. It simply indicates that you would need to verify these constraints using means other than Relax NG DTD Compatibility.

So I wonder, have you actually found ODF document instances, say written from OpenOffice 2.4.0, which have ID/IDREF/IDREFS usage which violates the constraints expressed in ODF 1.0?

Finally, in your professional judgment, do you maintain that this is an accurate statement: "For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF."


Wednesday, April 16, 2008

Suggesting ODF Enhancements

There is a good post by Mathias Bauer on Sun Hamburg's GullFOSS blog. He deals with the practical importance of OASIS's "Feedback License" that governs any public feedback OASIS receives from non-TC members.

The ODF TC receives ideas for new features from many places. Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.

Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or individual members of the public.

Contributions from OASIS TC members are already covered by the OASIS IPR Policy. The TC member who contributes written proposals to the TC is obliged from the time of contribution. And other TC members are obliged if they have been TC members for at least 60 days and remain a member 7 days after approval of any Committee Draft. You can see the participation status of TC members here.

For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our comment mailing list. But before you can post to the comment list you must first accept the terms of the Feedback License.

Is this extra step annoying? Yes, it is. But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF. It is part of the price we pay for open standards. Free does not mean free from vigilance.

One of my responsibilities on the ODF TC is to monitor and process the public comments we receive. Regrettably this is a duty which I've neglected for too long. So I spent some time this week getting caught up on the comments, entering them all into a tracking spreadsheet. We have a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.

The largest single source of comments is the Japanese JTC1/SC34 mirror committee, where they have been translating the ODF 1.0 standard into Japanese. As you know, you will get no closer reading of a text than when attempting translation, so we're glad to receive this scrutiny. I look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.

For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing. For comments that are in the nature of a new feature proposal, we will discuss on a TC call, and decide whether or not to include it in ODF 1.2.

A sample of some of the feature proposals from the comment list are:
If you have any other ideas for ODF enhancements, or thoughts on the above proposals, please don't post a response to this blog! Remember, you need to use the comment list for your feedback to be considered by the OASIS ODF TC.

Of course, general comments are always welcome on this blog.


Saturday, February 16, 2008

Fast Track versus PAS

Years ago I read an interesting article about the encyclopedia entry for the keyword "Longitude". According to the article, the entry merely said "See Latitude". With that short, two-word sentence the encyclopedia author conflated these two concepts as mere orthogonal dimensions, lumped together, each as boring as the other. This ignored the fact that latitude is boring, easy, trivial, known to the ancients and as easy to calculate as measuring the altitude of Polaris. But longitude, there lies an epic adventure, something fiendishly difficult to calculate accurately, something that propelled a great seafaring nation to a search for accurate timepieces that would work at sea, just in order to more accurately calculate longitude. Books have been written about longitude, lives lost, fortunes made. But latitude -- latitude is for children.

So when I hear people lump Fast Track and PAS process in JTC1 together, I roll my eyes and think... If only they knew how different they really are.

Let's give it a try, starting with PAS.

PAS stands for "Publicly Available Specification" and the PAS process in JTC1 allows an existing standard from outside of JTC1 to be submitted, reviewed and approved in an accelerated review cycle. An organization that wishes to make a PAS submission (typically a standards consortium) must first seek recognition as a PAS Submitter. This requires that they submit to JTC1 for approval a list of standards they wish to submit, as well as documentation that explains their organizational qualifications. The long list of organizational acceptance criteria is outlined in JTC1 Directives, Annex M:

M7.3 Organisation Acceptance Criteria

M7.3.1 Co-operative Stance (M)

There should be evidence of a co-operative attitude toward open dialogue, and a stated objective of pursuing standardisation in the JTC 1 arena. The JTC 1 community will reciprocate in similar ways, and in addition, will recognise the organisation's contribution to international standards.

It is JTC 1's intention to avoid any divergence between the JTC 1 revision of a transposed PAS and a version published by the originator. Therefore, JTC 1 invites the submitter to work closely with JTC 1 in revising or amending a transposed PAS.

There should be acceptable proposals covering the following categories and topics.

M.7.3.1.1 Commitment to Working Agreement(s)
  1. What working agreements have been provided, how comprehensive are they?
  2. How manageable are the proposed working agreements (e.g. understandable, simple, direct, devoid of legalistic language except where necessary)?
  3. What is the attitude toward creating and using working agreements?
M.7.3.1.2 Ongoing Maintenance
  1. What is the willingness and resource availability to conduct ongoing maintenance, interpretation, and 5 year revision cycles following JTC 1 approval (see also M6.1.5)?
  2. What level of willingness and resources are available to facilitate specification progression during the transposition process (e.g. technical clarification and normal document editing)?

M.7.3.1.3 Changes during transposition
  1. What are the expectations of the proposer toward technical and editorial changes to the specification during the transposition process?
  2. How flexible is the proposing organisation toward using only portions of the proposed specification or adding supplemental material to it?

M.7.3.1.4 Future Plans
  1. What are the intentions of the proposing organisation toward future additions, extensions, deletions or modifications to the specification? Under what conditions? When? Rationale?
  2. What willingness exists to work with JTC 1 on future versions in order to avoid divergence? Note that the answer to this question is particularly relevant in cases where doubts may exist about the openness of the submitter organisation.
  3. What is the scope of the organisation activities relative to specifications similar to but beyond that being proposed?

M7.3.2 Characteristics of the Organisation (M)

The PAS should have originated in a stable body that uses reasonable processes for achieving broad consensus among many parties. The PAS owner should demonstrate the openness and non-discrimination of the process which is used to establish consensus, and it should declare any ongoing commercial interest in the specification either as an organisation in its own right or by supporting organisations such as revenue from sales or royalties.

M.7.3.2.1 Process and Consensus:
  1. What processes and procedures are used to achieve consensus, by small groups and by the organisation in its entirety?
  2. How easy or difficult is it for interested parties, e.g. business entities, individuals, or government representatives to participate?
  3. What criteria are used to determine "voting" rights in the process of achieving consensus?

M.7.3.2.2 Credibility and Longevity:
  1. What is the extent of and support from (technical commitment) active members of the organisation?
  2. How well is the organisation recognised by the interested/affected industry?
  3. How long has the organisation been functional (beyond the initial establishment period) and what are the future expectations for continued existence?
  4. What sort of legal business entity is the organisation operating under?

M7.3.3 Intellectual Property Rights: (M)

The organisation is requested to make known its position on the items listed below. In particular, there shall be a written statement of willingness of the organisation and its members, if applicable, to comply with the ISO/IEC patent policy in reference to the PAS under consideration.

Note: Each JTC 1 National Body should investigate and report the legal implications of this section.

M.7.3.3.1 Patents:
  1. How willing are the organisation and its members to meet the ISO/IEC policy on these matters?
  2. What patent rights, covering any item of the proposal, is the PAS owner aware of?

M.7.3.3.2 Copyrights:
  1. What copyrights have been granted relevant to the subject specification(s)?
  2. What copyrights, including those on implementable code in the specification, is the PAS originator willing to grant?
  3. What conditions, if any, apply (e.g. copyright statements, electronic labels, logos)?
M.7.3.3.3 Distribution Rights:
  1. What distribution rights exist and what are the terms of use?
  2. What degree of flexibility exists relative to modifying distribution rights; before the transposition process is complete, after transposition completion?
  3. Is dual/multiple publication and/or distribution envisaged, and if so, by whom?

M.7.3.3.4 Trademark Rights:
  1. What trademarks apply to the subject specification?
  2. What are the conditions for use and are they to be transferred to ISO/IEC in part or in their entirety?

M.7.3.3.5 Original Contributions:
  1. What original contributions (outside the above IPR categories) (e.g. documents, plans, research papers, tests, proposals) need consideration in terms of ownership and recognition?
  2. What financial considerations are there?
  3. What legal considerations are there?

Once this documentation is provided, a three-month JTC1 ballot is held on the question of whether to approve the applicant as a Recognized PAS Submitter. If approved, this status lasts for 2 years, but may be renewed by reapplying with updated organizational documentation. Renewals must also be approved by a 3-month letter ballot.

Once an organization has Recognized PAS Submitter status, it may now propose a PAS submission. Such a submission must be within scope of the Submitter's original application, and must be accompanied by an Explanatory Report that speaks to JTC1's strategic interests in Interoperability, Cultural and Linguistic Adaptability, as well as the following document-related acceptance criteria:

M7.4 Document Related Criteria

M7.4.1 Quality

Within its scope the specification shall completely describe the functionality (in terms of interfaces, protocols, formats, etc) necessary for an implementation of the PAS. If it is based on a product, it shall include all the functionality necessary to achieve the stated level of compatibility or interoperability in a product independent manner.

M.7.4.1.1 Completeness (M):
  1. How well are all interfaces specified?
  2. How easily can implementation take place without need of additional descriptions?
  3. What proof exists for successful implementations (e.g. availability of test results for media standards)?

M.7.4.1.2 Clarity:
  1. What means are used to provide definitive descriptions beyond straight text?
  2. What tables, figures, and reference materials are used to remove ambiguity?
  3. What contextual material is provided to educate the reader?

M.7.4.1.3 Testability (M)

The extent, use and availability of conformance/interoperability tests or means of implementation verification (e.g. availability of reference material for magnetic media) shall be described, as well as the provisions the specification has for testability.

The specification shall have had sufficient review over an extended time period to characterise it as being stable.

M.7.4.1.4 Stability (M):
  1. How long has the specification existed, unchanged, since some form of verification (e.g. prototype testing, paper analysis, full interoperability tests) has been achieved?
  2. To what extent and for how long have products been implemented using the specification?
  3. What mechanisms are in place to track versions, fixes, and addenda?

M.7.4.1.5 Availability (M):
  1. Where is the specification available (e.g. one source, multinational locations, what types of distributors)?
  2. How long has the specification been available?
  3. Has the distribution been widespread or restricted? (describe situation)
  4. What are the costs associated with specification availability?

M7.4.2 Consensus (M)

The accompanying report shall describe the extent of (inter)national consensus that the document has already achieved.

M.7.4.2.1 Development Consensus:
  1. Describe the process by which the specification was developed.
  2. Describe the process by which the specification was approved.
  3. What "levels" of approval have been obtained?

M.7.4.2.2 Response to User Requirements:
  1. How and when were user requirements considered and utilised?
  2. To what extent have users demonstrated satisfaction?

M.7.4.2.3 Market Acceptance:
  1. How widespread is the market acceptance today? Anticipated?
  2. What evidence is there of market acceptance in the literature?

M.7.4.2.4 Credibility:
  1. What is the extent and use of conformance tests or means of implementation verification?
  2. What provisions does the specification have for testability?

M7.4.3 Alignment

The specification should be aligned with existing JTC 1 standards or ongoing work and thus complement existing standards, architectures and style guides. Any conflicts with existing standards, architectures and style guides should be made clear and justified.

M.7.4.3.1 Relationship to Existing Standards:

  1. What international standards are closely related to the specification and how?
  2. To what international standards is the proposed specification a natural extension?
  3. How is the specification related to emerging and ongoing JTC 1 projects?

M.7.4.3.2 Adaptability and Migration:

  1. What adaptations (migrations) of either the specification or international standards would improve the relationship between the specification and international standards?
  2. How much flexibility do the proponents of the specification have?
  3. What are the longer-range plans for new/evolving specifications?

M.7.4.3.3 Substitution and Replacement:
  1. What needs exist, if any, to replace an existing international standard? Rationale?
  2. What is the need and feasibility of using only a portion of the specification as an international standard?
  3. What portions, if any, of the specification do not belong in an international standard (e.g. too implementation specific)?

M.7.4.3.4 Document Format and Style
  1. What plans, if any, exist to conform to JTC 1 document styles?

The Explanatory Report also sets the maintenance regime for the submission, if approved.

The proposed standard, along with the Explanatory Report, is then distributed to JTC1 NB's for a 6-month ballot. The approval criteria are 2/3 approval of voting P-members, and no more than 25% disapproval in total. At the end of the ballot a Ballot Resolution Meeting may be held if needed.
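The two approval thresholds can be written out as a small check. This is only a sketch of the rule as summarized above, not the text of the Directives: the function name and parameters are made up for illustration, and it assumes that abstentions are left out of both counts and that "in total" means all votes cast, by P-members and other voting NB's alike.

```python
from fractions import Fraction


def ballot_passes(p_yes: int, p_no: int, all_yes: int, all_no: int) -> bool:
    """Apply the two ballot criteria described in the text.

    p_yes / p_no     -- approve / disapprove votes cast by P-members
    all_yes / all_no -- approve / disapprove votes cast by everyone,
                        P-members included
    Assumption: abstentions are excluded from both counts.
    """
    p_cast = p_yes + p_no
    all_cast = all_yes + all_no
    if p_cast == 0 or all_cast == 0:
        return False  # nothing to count, so nothing is approved

    # Criterion 1: at least 2/3 of voting P-members approve.
    p_approval = Fraction(p_yes, p_cast)
    # Criterion 2: no more than 25% of all votes cast disapprove.
    disapproval = Fraction(all_no, all_cast)

    return p_approval >= Fraction(2, 3) and disapproval <= Fraction(1, 4)


# Hypothetical tallies, not taken from any real ballot.
print(ballot_passes(p_yes=24, p_no=8, all_yes=30, all_no=9))
print(ballot_passes(p_yes=20, p_no=10, all_yes=25, all_no=10))
```

Exact fractions are used so that a ballot sitting precisely on a threshold is not decided by floating-point rounding. Note that both criteria must hold: a proposal can clear the P-member threshold and still fail on total disapproval.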

So, that is the PAS process, in brief. The PAS process is how ODF was approved back in 2006, with OASIS as the Recognized PAS Submitter.

The Fast Track process is almost the same from the time the ballot is issued. The six-month period is split into a 30-day "contradiction period" and a 5-month ballot. (That is an odd difference, with no clear reason.) But the voting criteria, the BRM process, etc., are all the same between the two. What is different (and there are critical differences) is everything that happens before the ballot.

Who can submit a Fast Track? Any JTC1 P-member, or any Class A Liaison can propose a Fast Track.

We all know about P-members. They are NB's, typically the highest standardization committee in any country. A P-member used to also mean that you had a broad interest in many or most JTC1 matters. But now it may mean merely that Microsoft asked you to join as a P-member.

Class A Liaisons are "Organisations which make an effective contribution to and participate actively in the work of JTC 1 or its SCs for most of the questions dealt with by the committee". Any organization can apply to be a Class A Liaison and be voted in via a letter ballot or at a meeting. There are no formal organization qualifications, no requirement to state an interest in eventually making Fast Tracks, or to answer any of the types of questions that PAS Submitters must answer.

Further, once approved as a Class A Liaison, the status lasts forever. There is no requirement to renew or reapply. In fact JTC1 Directives even lack a documented procedure for removing a Class A Liaison.

So what about the proposals for Fast Track submission? What is required of them? No Explanatory Report is required. No checklist of document-related criteria must be answered. JTC1 Directives say merely "The criteria for proposing an existing standard for the fast-track procedure is a matter for each proposer to decide." That's it. It is at the sole discretion of the Class A Liaison.

So you can see what great power Ecma has over JTC1 -- they can submit any standard they want for Fast Track, and no one in JTC1 can stop them, or even remove their right to submit more Fast Tracks.

This may explain why Ecma is able to command such high membership fees. A full voting membership in OASIS, which would allow a company to help produce an OASIS Standard for later submission to JTC1 under the arduous PAS process, costs $1,100 for a small company. Joining the US NB, to be able to lobby for a Fast Track submission from the US, will cost you $9,500. But joining Ecma as a voting member (what they call an "Ordinary Member") will cost you 70,000 Swiss Francs, or $64,000. That is what no-questions-asked Fast Track service is worth. I think that, from Microsoft's perspective, the extra $62,900 is money well spent. But what about from JTC1's perspective? They don't get this extra money. So what's their excuse for having these permissive Fast Track procedures that give Ecma so much control?

In any case, that is why I roll my eyes when people lump PAS and Fast Track together, and say that they are essentially the same process. They clearly aren't. PAS Submitters like OASIS are given intense scrutiny, and are required to document in great detail how their organization and their proposals meet JTC1 criteria. The scrutiny never ends, as a new Explanatory Report is required for every submission, and their status as Recognized PAS Submitter only lasts for a few years before requiring re-approval.

Fast Track submitters, as Class A Liaisons, on the other hand, are the monarchs of JTC1. They serve for life and are answerable to no one. They can submit a Fast Track on any subject they want, at any time. So a standards consortium like Ecma, with primary expertise in optical disk standards, but never having produced an XML standard before, can rubber stamp the world's largest XML standard and submit it for Fast Track processing to JTC1. And no one can do a thing about it.


Tuesday, February 12, 2008

Punct Contrapunct

The recent Burton Group report, What's Up, .DOC? by Guy Creese and Peter O'Kelly was made available free to the public for a stated purpose:

We’ve made the overview available for free (I must admit I'm not sure for how long), as we believe this topic warrants expanded industry debate before a February, 2008 ISO ballot on OOXML, and we want to help catalyze and advance the debate.

The degree of expanded debate achieved may be estimated by noting that Microsoft is sending this report to every JTC1 national body involved in the OOXML ballot, from Pakistan to Ecuador, and has invited Peter O'Kelly to speak on this paper both at the recent OOXML press event in Washington as well as this week's Office Developers Conference.

Much could be said of this report, but I'll limit myself to commenting on a single passage:

[S]everal vendors interviewed for this overview indicated that it's essentially impossible to get ODF proposals approved if they're not also supported in OpenOffice.org, and further noted that Sun closely controls OpenOffice.org (much as it also holds control over Java).

It should be noted that, before making this statement, the authors neither contacted OASIS nor the OASIS ODF TC in order to check their facts.

The ODF Alliance published a rebuttal of this report, and in particular took umbrage at that passage, saying:

This is demonstrably false, and the use of unnamed “vendors” as sources does not eliminate the need for doing basic fact checking on such claims. Rumors and innuendo do not objective analysis make.

First, on the control aspect, note that ODF 1.0, the standard, is owned and controlled by OASIS, a standards consortium of over 600 member organizations. Sun is just one company among many members. Indeed, for most of the development of ODF, Microsoft was on the Board of Directors of OASIS.

Second, OASIS is a corporation. It is legally bound to its Bylaws. There is no arbitrary control by member corporations.

The ODF TC is co-chaired by an IBM employee and a Sun employee, and is regulated by the OASIS TC Process document, which is publicly readable by all and has clear rules of procedure and appeal.

The ODF TC has three subcommittees. The Accessibility SC is co-chaired by IBM and Sun, while the Formula Subcommittee and the Metadata Subcommittee are each chaired by individual members of OASIS who are not affiliated with any large corporations.

Voting rights in the ODF TC, for accepting or rejecting features, is currently as follows:

  • Sun – 3 voting members
  • IBM – 4 voting members
  • Individuals – 3 voting members

This can easily be verified at the OASIS ODF TC website.

Is sharing the chair position on the TC and on 1 of 3 subcommittees considered “closely controlling”? Is having 30% of the votes considered “closely controlling”?

As for proposals being accepted into ODF, we note that all three major features for ODF 1.2, RDF metadata, OpenFormula, and enhanced accessibility, are new proposals which have not been yet implemented in OpenOffice. Moreover, the ODF TC is currently processing a set of features requested by the KOffice open source project. So the assertion that it is “essentially impossible” to get new features into ODF if they are not already supported by OpenOffice is not true. This error is unfortunate and needs correcting through rigorous fact checking, as do the others, in our opinion.

Oddly enough, this particular error occurs in several places. A search of the report for the word “control” shows it used six times, once in reference to “Chinese communists” and five times in reference to Sun Microsystems. Note, however, that no mention is ever made of the strong direct control Microsoft asserts over OOXML, its having sole chairmanship of the Ecma TC45, and its having secured a committee charter that prevents any changes to OOXML that are not compatible with Microsoft Office.

Again, we're puzzled by the inaccuracy on one hand and the lack of balance on the other.

Now, back to the Burton Group, where Guy Creese responds on the Burton Group blog:

We were not expecting to be told that Sun had significant sway over the standard, but several people told us that (spread across more than one ODF-oriented vendor), which is why we noted it in the report. As the ODF Alliance notes, IBM and Sun—two of Microsoft’s most powerful productivity application archrivals today (as well as partners to Microsoft in myriad other domains, e.g., Web services-related standards initiatives)—collectively control 70% of the votes in the ODF TC which determines if proposals will be accepted or rejected. This suggests there is ample opportunity for conflicts of interest.

Guy, excuse me, did you say "conflicts of interest"? Please explain. Or maybe when Peter O'Kelly comes back from speaking at Microsoft's Office Developers Conference he can explain it for us?

In any case, the factual errors in your report with respect to the control of ODF have been clearly demonstrated, but instead of simply admitting and correcting the error, you hide behind anonymous sources and further impugn OASIS by charging some sort of "conflict of interest".

To follow your logic further demonstrates the absurdity of it. If you believe that the fact that IBM and Sun "collectively control 70% of the votes in the ODF TC" lends weight to your argument, then what is shown by the equally true mathematical fact that IBM plus independent members also control 70% of the votes? Why is this equally true fact not mentioned? This is the nature of plurality, that there are many different combinations of votes that could make a majority position. Further, note that these groups in practice do not always vote as a bloc. We've had votes where the independent members split their vote, and we even had a vote where the IBM members did not all vote alike. So much for your simplistic control theory.
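The arithmetic is easy to check mechanically. The sketch below takes the vote counts quoted from the ODF Alliance rebuttal (Sun 3, IBM 4, Individuals 3) and lists every possible grouping with its share of the votes. The function names are made up for illustration, and a simple majority (more than 50%) is assumed as the threshold; the actual OASIS voting rules may require something different for particular kinds of ballots.

```python
from itertools import combinations

# Voting members per group, as listed in the ODF Alliance rebuttal.
VOTES = {"Sun": 3, "IBM": 4, "Individuals": 3}


def coalition_shares(votes):
    """Map every non-empty grouping of members to its share of all votes."""
    total = sum(votes.values())
    shares = {}
    for size in range(1, len(votes) + 1):
        for group in combinations(votes, size):
            shares[group] = sum(votes[name] for name in group) / total
    return shares


def majority_coalitions(votes):
    """Groupings holding more than half the votes (assumed threshold)."""
    return [g for g, share in coalition_shares(votes).items() if share > 0.5]


for group, share in coalition_shares(VOTES).items():
    print(" + ".join(group), f"{share:.0%}")
```

No single group reaches a majority on its own, while every pairing does, which is the point about plurality made above: "IBM plus Sun" is just one of several combinations that add up to more than half.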

I will not question whether your anonymous sources indeed misled you. For sake of argument, I will accept unquestioningly that you indeed had sources and that they said exactly what you claim they said. However, having sources does not excuse you, as an analyst, from doing basic fact checking. The rules of OASIS and the voting composition of the ODF TC are facts, not opinions, and the correct information was sitting there, on public web sites, for you to check. It is not your fault that you were misled by sources, but it is your fault that you did not verify their claims. To publish controversial statements based on anonymous sources without fact checking, this is not something that represents the Burton Group's finest work.

The Burton Group has denigrated the work and the members of the OASIS Open Document Format Technical Committee (of which I am Co-Chair) with published statements that have been shown to be false. The Burton Group owes us an apology and an immediate retraction.

Waiting until after February, after the DIS 29500 process concludes, to make corrections is unacceptable. Since your stated purpose in making this report public was to "advance the debate" in the current OOXML ISO process, withholding factual corrections until after that process concludes would imply that you and the Burton Group see no problems with knowingly persisting in influencing an ISO ballot with false information published under the Burton Group name. I don't believe that is the image that the Burton Group would want to project. So I urge that a correction is in order now.


Thursday, January 31, 2008

The Case for Harmonization

Depending on who you ask, document standard harmonization is either impossible or inevitable, anathema or nirvana. Let's dig a little into this question and see if the two sides are really that far apart.

First note that many JTC1 NB's raised the issue of harmonization in their DIS 29500 ballot comments last September. Some merely requested harmonization, such as Korea, South Africa, Belgium, Peru, Switzerland, or the Czech Republic, while others in addition outlined ways to achieve harmonization. For example, AFNOR, the French NB stated:

After 5 months of extensive discussions between stakeholders in the field of revisable document formats, AFNOR, in the aim to obtain a single standard for XML office document formats within 3 years, makes the following proposal:
  • Split the current ECMA 376 standard in 2 parts in order to differentiate the essential OOXML core functions necessary for easy implementation from those functionalities that are needed for the exchange of legacy office file formats;
  • Incorporate the technical comments below and those in the attached comment table submitted to the Fast Track;
  • Attribute the status of Technical Specification to both parts;
  • Establish a process of convergence between ODF (already standardized as ISO/IEC 26300) and the above mentioned OOXML core. ISO/IEC shall invite parties involved to commit themselves to initiate simultaneously the revisions of the existing ODF v1.0 and the OOXML core in order to obtain at the end of the revision process a standard as universal as possible.
(Note that a Technical Specification, in ISO process, is for proposals which lack sufficient support for approval as an International Standard, but for which publication is still desired. This may be appropriate for OOXML.)

New Zealand's proposal was similar:
Further, the NB's of Great Britain, New Zealand, and the United States requested that specific features be added to OOXML in order to improve interoperability with ISO ODF, in total 40 features such as the ability:
Notably, these were the same features that the Microsoft-sponsored translator project on SourceForge identified as needed to improve translation with ODF. These are the features that the project noted were lacking in OOXML.

Ecma rejected every single one of these requests. They did not argue that the requested features were unreasonable. They did not argue that the requested features were not needed. Their argument was that harmonization of the formats was not necessary because there exist tools that will translate between OOXML and ODF. In other words, they rejected these requests merely because they were pro-harmonization, regardless of the underlying merit or need of the feature. Ironically, Microsoft's conversion tools are restricted in their fidelity because of the lack of these very features.

On the question of harmonization, we are either moving toward it, or we are moving away. There is no time better than the present to harmonize. Waiting will only make matters worse, as we will then need to consider legacy OOXML documents as well as legacy binary and legacy ODF documents. The Ecma response does not move us toward harmonization, but starts down the road toward further divergence, a long and costly divergence.

Tim Bray made the critical observation back in 2005, “The world does not need two ways to say 'This paragraph is in 12-point Arial with 1.2em leading and ragged-right justification'.”

Microsoft likes to claim that harmonization is impossible, that slapping together the features of both standards would lead to an impenetrable mess. Of course it would, but only an idiot would suggest that as an approach to harmonization. So why do they always bring that up as their strawman?

A look at OpenOffice and Microsoft Office shows a huge degree of functional overlap. Harmonization starts from looking at this functional overlap – and there is a significant, perhaps 90%+ area where they do overlap – and expresses the functional overlap identically, using the same XML schema. In other words, harmonization identifies the commonalities at the functional level and finds a common representation for that commonality.

It would also be expected that the common functionality between ODF and OOXML would also include a common extensibility mechanism, a way for a vendor to express application-specific features that are outside of the harmonized standard.

The remaining 10% of the functionality would be the focus of the harmonization work, the area that requires the most attention. Some portion of that 10% will represent general-purpose features that we can imagine multiple applications supporting. We take those features and add them to ODF. The remaining portion of the 10%, which only serves one vendor's needs, such as flags for deprecated legacy formatting options, could be represented using the common extensibility mechanism.

Does this sound impossible? That's not what Microsoft says. Gray Knowlton, Group Product Manager for Microsoft Office, was candid to PC World a couple of weeks ago:

Also, if individual governments mandate the use of ODF instead of Open XML, Microsoft would adapt, Knowlton said. The company would then implement the missing functionality that ODF doesn't support. However, those extensions would be custom-designed and outside of the standard, which is counter to the idea of an open document standard, Knowlton said.

So we've agreed that this approach is technically feasible. We're also agreed that extending ODF outside of the standards process is not a good idea. So the obvious solution is to extend ODF within the standards process. So, let's do it! What are we waiting for?

There is no reason why, by a harmonization process, all of the functionality of Microsoft Office cannot be represented on a base of ISO 26300 OpenDocument Format. I personally, as Co-Chair of the OASIS ODF TC, stand ready and willing to sponsor such a harmonization effort in OASIS. So let's start harmonization now, and avoid further divergence.

My read of NB comments indicates that there is a sizable bloc, perhaps even a decisive bloc, of NB's who are in favor of harmonization. Let's push on this and articulate a roadmap, along the lines of the proposals by France and New Zealand, that accomplishes this.


Wednesday, November 21, 2007

PDF, The Waste Land, and Monica's Blue Dress

Adobe's PDF Architect, James King, has recently started an "Inside PDF" blog which is well worth subscribing to. I'd especially draw your attention to his post "Submission of PDF to ISO" which has a lot of useful information on the process they are going through in ISO, a process that is slightly different than that used by ODF or OOXML in JTC1. (Note in particular that ISO Fast Track is not exactly the same as JTC1 Fast Track.)

In a more recent post, Archiving Documents, James wonders aloud why anyone would use ODF or OOXML for archiving, compared to PDF or PDF/A, saying "After all, archiving means preserving things, and usually you want to preserve the total look of a document. PDF/A does that."

I recommend reading the Archiving Documents post in full, and then return here for an alternate point of view.

.
.
.

We say the word "archive" quite easily and cover a large number of activities by that name, and in doing so risk blurring a number of different activities into one over-generalization. Before you are told that format X or format Y is best for archiving it is fair to ask what I mean by "archiving" and ask who does the archiving, for what purpose and under what constraints.

In some cases what must be preserved, and for how long, is spelled out in detail for you, by statute, regulation or court order. Or, a company, in anticipation of such requests may require preservation as part of a corporate-wide records retention policy for certain categories of employees or certain categories of documents.

An example of the range of materials that may be included can be seen in this preservation order:

"Documents, data, and tangible things" is to be interpreted broadly to include writings; records; files; correspondence; reports; memoranda; calendars; diaries; minutes; electronic messages; voicemail; E-mail; telephone message records or logs; computer and network activity logs; hard drives; backup data; removable computer storage media such as tapes, disks, and cards; printouts; document image files; Web pages; databases; spreadsheets; software; books; ledgers; journals; orders; invoices; bills; vouchers; checks; statements; worksheets; summaries; compilations; computations; charts; diagrams; graphic presentations; drawings; films; charts; digital or chemical process photographs; video; phonographic tape; or digital recordings or transcripts thereof; drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata, is also included in this definition.
--Pueblo of Laguna v. U.S. // 60 Fed. Cl. 133 (Fed. Cir. 2004).

I would pay particular attention to the part at the end, "...drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata".

Similarly, consider government and academic archives, that are preserving documents for the long term. The archivist tries to anticipate what questions future researchers will have, and then tries to preserve the document in such a way that it can best answer those questions.

A PDF version of a document answers a single question, and answers it quite well: "What did this document look like when printed?" But this is not the only question that one might have of a document. Some other questions that may be asked include:

  1. What was the nature of collaboration that led to this document? How many people worked on it? Who contributed what?
  2. How did the document evolve from revision to revision?
  3. In the case of a spreadsheet, what was the underlying model and assumptions? In other words, what are the formulas behind the cells?
  4. In the case of a presentation, how did the document interact with embedded media such as audio, animation, video?
  5. How was technology used to create this document? In what way did the technology help or impede the author's expression? (Note that researchers in the future may be as interested in the technology behind the document as the contents of the document itself.)
The PDF answers one question -- what does the document look like -- but doesn't help with the other questions. But these other, richer questions will be the ones that may most interest historians.

Let's take an analogous case. T.S. Eliot's 1922 poem The Waste Land is a landmark of 20th century literature. Not only is it important from an artistic and critical perspective, but it is also important from a technology perspective -- it is perhaps the first major poem to have been composed at the typewriter. What was published was, like a PDF, what the author intended, what he wanted the world to see. That is all the world knew until around 1970, after the poet's death, when the rest of the story emerged in the form of typewritten draft versions of the poem, with handwritten comments by Ezra Pound.

This provided pages and pages of marked up text that showed the nature and degree of the collaboration between Eliot and Pound far more than had been previously known. This is what researchers want to read. The final publication is great, but the working copy tells us so much more about the process. History is so much more than asking "What?". It continues by asking "How?" and eventually asking "Why?" -- this is where the real insight occurs, going beyond the mere collection of facts and moving on to interpretation. PDF answers the "What?" question admirably. I'm glad we have PDF as a tool for this purpose. But we need to make sure that when archiving documents we allow future research to ask and receive answers to the other questions as well.

Flash forward to the technology of today. We're not all writing great poetry, but we are collaborating on authoring and reviewing and commenting on documents. But instead of doing it via handwritten notes, we're doing it via review & comment features of our word processors. Although the final resulting document may be easily exportable as a PDF document, that is really just a snapshot of what the document looks like today. It loses the record of the collaboration. I don't think that is what we want to archive, or at least not exclusively. If you archive PDF, then you've lost the collaborative record.

Another example, take a spreadsheet. You have cells with formulas and these formulas calculate results which are then displayed. When you make a PDF version of the spreadsheet you have a record of what it "looked like", but this isn't the same as "what it is". You cannot look at the formulas in the PDF. They don't exist. Future researchers may want to check your spreadsheet's assumptions, the underlying model. There may also be the question of whether your spreadsheet had errors, whether from a mis-copied formula, or from an underlying bug in the application. If you archive exclusively as PDF, no one will ever be able to answer these questions.

One more example, going back to 1998 and the Clinton/Lewinsky scandal. Kenneth Starr's report on the case was written in WordPerfect format, distributed to the House, which converted it to HTML form and released it on the web. But due to a glitch in the HTML translation process, footnotes that had been marked as deleted in the WordPerfect file reappeared in the HTML version. So we ended up with an official published Starr Report, as well as an unofficial HTML version which had additional footnotes.

Imagine you are an archivist responsible for the Starr Report. What do you do? Which version(s) do you preserve? Is your job to record the official version, as-published? Or is your job to preserve the record for future researchers? Depending on your job description, this might have a clear-cut answer. But if I were a future historian, I would sure hope that someone someplace had the foresight to archive the original WordPerfect version. It answers more questions than the published version does.

So, to sum it up: What you archive determines what questions you can later ask of a document. If you archive as PDF, you have a high-fidelity version of what the final document looked like. This can answer many, but not all, questions. But for the fullest flexibility in what information you can later extract from the document, you really have no choice but to archive the document in its original authoring format.

An intriguing idea is whether we can have it both ways. Suppose you are in an ODF editor and you have a "Save for archiving..." option that would save your ODF document as normal, but also generate a PDF version of it and store it in the zip archive along with ODF's XML streams. Then digitally sign the archive along with a time stamp to make it tamper-proof. You would need to define some additional access conventions, but you could end up with a single document that could be loaded in an ODF editor (in read-only mode) to allow examination of the details of spreadsheet formulas, etc., as well as loaded in a PDF reader to show exactly how it was formatted.
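To make the packaging half of that idea concrete, here is a rough sketch using Python's standard zipfile module. It is a thought experiment, not a description of any existing editor feature: the entry name Archive/rendering.pdf and both function names are invented, the demo package is a placeholder rather than a valid ODF document, and the sketch leaves out the manifest entry, the digital signature and the time stamp that a real design would need.

```python
import io
import zipfile

ODF_TEXT_MIMETYPE = "application/vnd.oasis.opendocument.text"
# Hypothetical location for the PDF rendering; ODF defines no such entry.
PDF_ENTRY = "Archive/rendering.pdf"


def build_demo_package() -> bytes:
    """Build a tiny stand-in for an ODF package (not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", ODF_TEXT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", "<placeholder/>", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def save_for_archiving(odf_bytes: bytes, pdf_bytes: bytes) -> bytes:
    """Copy an ODF package and add a PDF rendering alongside its streams.

    Not handled here: registering the new entry in META-INF/manifest.xml,
    and signing / time-stamping the result.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Keep the "mimetype" entry uncompressed, as ODF packages expect;
            # the original entry order is preserved by copying in sequence.
            if info.filename == "mimetype":
                compression = zipfile.ZIP_STORED
            else:
                compression = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, src.read(info.filename),
                         compress_type=compression)
        dst.writestr(PDF_ENTRY, pdf_bytes)
    return out.getvalue()


archived = save_for_archiving(build_demo_package(), b"%PDF-1.4 placeholder")
with zipfile.ZipFile(io.BytesIO(archived)) as z:
    print(z.namelist())
```

The "additional access conventions" mentioned above are exactly what this sketch skips: a PDF reader would not know to look inside the zip for the rendering, so some agreed naming or wrapper would still have to be defined.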


Sunday, November 18, 2007

Document Format FUD: A Guide for the Perplexed

I've decided to put together a list of misconceptions that I hear, generally on the topic of document formats. I'll try to update this list to keep it current, with the most recent entries at the top. Readers are invited to submit the FUD they observe as comments, and I'll include it where I can.

This inaugural edition is dedicated to the fallout from the recent supernova we know as the OpenDocument Foundation, that in one final act of self-immolation swelled from obscurity to overwhelming brilliance, but then slowly faded away, ever fainter and more erratic, little more than hot gas, the dimming embers no longer sustainable.


Q: Now that the originator and primary supporter of OpenDocument Format has ended its support for ODF, does this mean the end for the ODF standard? (18 Nov 2007)

A: This question is based on a mistaken premise, namely that the OpenDocument Foundation was the originator or steward of the ODF standard. This is an erroneous notion.

The ODF standard is owned by the OASIS standards consortium, with over 600 member organizations and individual members. The committee in OASIS that does the technical work of maintaining the ODF standard is called the OpenDocument TC. It has 15 organization members as well as 7 individual members. Until recently the OpenDocument Foundation was a member of the ODF TC, one voice among many.

The adoption of the ODF standard is promoted by several organizations, most prominently the ODF Alliance (with over 400 organizational members in 52 countries), the OpenDocument Fellowship (around 100 individual members) and the OpenDoc Society (a new group with a Northern European focus, with around 50 organizational members). To put this in perspective, the OpenDocument Foundation, before it changed its mission and dissolved, had only 3 members.


When you consider the range of ODF adoption, especially in Europe and Asia, the strong continuing work on ODF 1.2 in OASIS, and the strong corporate, government and organizational participation demonstrated in the global ODF User Workshop recently held in Berlin, we seem to be making a disproportionate amount of noise over the hysterics of the disintegrating 3-person OpenDocument Foundation.

A number of analysts/journalists/bloggers didn't check their facts and seem to have fallen into the trap, and ascribed a far greater importance to the actions of the Foundation. Curiously, these articles all quoted the same Microsoft Director of Corporate Standards. I hope this correlation does not prove to be a persistent contrary indicator for accuracy in future file format stories.

Luckily for us, David Berlind over at ZDNet has penetrated the confusion and gets it right:

...the future of the OpenDocument Foundation has nothing to do with the future of the OpenDocument Format. In other words, any indication by anybody that the OpenDocument Format has been vacated by its supporters is pure FUD.

11/27/2009 Update: Berlind did further research and interviews on this topic and followed up with a podcast and new blog post OpenDocument Format Community steadfast despite theatrics of now impotent ‘Foundation’ on this subject.

Q: The OpenDocument Foundation has a document, a "Universal Interoperability Framework", that on its title page says "Submitted to the OASIS Office Technical Committee by The OpenDocument Foundation October 16, 2007". What is the status of this proposal in the ODF TC? (18 Nov 2007)


A: No such document has been submitted to the OASIS TC, on this date or any other date. OASIS policy states that "Contributions, as defined in the OASIS IPR Policy, shall be made by sending to the TC's general email list either the contribution, or a notice that the contribution has been delivered to the TC’s document repository". A look at the ODF TC's list archive for October shows that there was no such contribution.

Q: The Foundation claims that the W3C's CDF format has better interoperability with MS Office than ODF has. Is this true? (18 Nov 2007)

A: The Foundation's claims have not been demonstrated, or even competently argued at a technical level that would allow expert evaluation. I cannot fully critique what is essentially vaporware. However, those who know CDF better than I do have commented on the mismatch between CDF and office documents, for example the recent interview with the W3C's Chris Lilley in Andy Updegrove's blog.

Q: So, does IBM then oppose CDF in favor of ODF? (18 Nov 2007)

A: No. IBM supports both the development of ODF and CDF and has a leadership role in both working groups. These are two good standards for two different things.

The W3C, over the years has produced a number of reusable, modular core standards for things like vector graphics (SVG), mathematical notation (MathML), forms (XForms), etc. To use a cooking analogy, these are like ingredients that can be combined to make a dish. ODF has taken a number of W3C standards and combined them to make a format for expressing conventional office documents, the familiar word processor, spreadsheet and presentation documents. ODF is an OASIS and ISO standard.

But just as eggs, butter and flour form the base of many recipes, the core W3C standards can be assembled in different ways for different purposes. This is a good thing.

CDF is not so much a final dish as an intermediate step, like a roux (flour + butter) when making a sauce. You don't use a roux directly, but build upon it, e.g., add milk to make a béchamel, add cheese for a cheese sauce, etc. CDF itself is not directly consumable. You need to add a WICD profile, something like WICD Mobile 1.0, before you have something a user agent can process.


Friday, October 12, 2007

ODF enters the Semantic Web

Metadata is "data about data". Meta from the Greek, μετά, meaning with or after. I suppose if you wanted to sound grand you could pronounce it hyper-correctly with the stress on the second syllable, met-ah'. I've heard some incorrectly pronounce it meet'-ah, perhaps a false analogy with βῆτα = beta. But you never hear anyone pronounce μέγα = mega as mee-guh, do you?

Metadata is not new. It has been around for centuries. In some cases metadata applies to the overall document, while in other cases it applies to only a portion of the content. Examples of the first case include titles of books, footnotes, ISBN numbers, LOC or Dewey Decimal categorizations, keywords, etc. The various forms of scribal marginalia, whether scholia or glosses in the margins of a manuscript, or personal annotations of the owner of a document, are historic examples of the second kind of metadata.

Marginal notes are frequently used today in business forms. A printed form represents, often imperfectly, a snapshot in time of an organization's view of its own process. Maybe the process was only approximated, maybe the form was imperfectly designed, maybe it quickly became outdated, but somehow reality seems to outgrow the strictures of a form's blanks and checkboxes. So what do you, as a user, do? You write notes in the margins or other places between form fields and hope that there is a human in the loop someplace to read your words.

In any case, of all documents, forms (originally called "formulary documents") have the most structured representation of data. Enter your social security number into the nine little boxes provided. Enter your date of birth here, Month first, then day, then two-digit year. Last name first, first name last. Everything is nice and simple, and provided your reality matches that which the form designer envisioned, your data will be easy to consume, whether by another person or, after data entry, by various online processes. Or maybe the form was entered online originally? Even better.

But what about all the other documents in the world, the ones that are not formally structured as forms? What sense can we make of them? Can you tell a social security number in a free-form document, or a date, or a zip code? Perhaps with pattern matching, you can find out some simple things. That is the essence of Microsoft's Smart Tags. (And we had much of this in Lotus Agenda a decade earlier.) But this only works for the most trivial cases. It only takes you so far.
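To make the limits of pattern matching concrete, here is a minimal Python sketch. It is not how Smart Tags or Lotus Agenda actually worked; the three patterns and the `find_candidates` helper are my own assumptions, chosen only to illustrate the idea.

```python
import re

# Hypothetical patterns for illustration only; real recognizers need far more care.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # e.g. 123-45-6789
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2}(?:\d{2})?\b"),  # e.g. 10/12/07
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),             # e.g. 02142
}


def find_candidates(text):
    """Return (kind, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


print(find_candidates("Born 10/12/07, SSN 123-45-6789, lives in 02142."))
# Any five-digit number looks like a zip code to a pattern matcher:
print(find_candidates("Order 12345 shipped"))
```

The second call shows the problem: an order number is reported as a zip code, because the pattern sees only the shape of the text and knows nothing about what the author meant.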

What if I wanted to mark up an academic paper, a work-in-progress, to indicate which quotations have been verified and which ones remain to be verified? Or what if I want to annotate statements in recorded testimony according to which statements contradict and which corroborate another witness's statements? This goes far beyond pattern matching. I need a way to encode my knowledge, my view of the subject, in the document.

We have data in a document -- "Words, words, words" as Hamlet tells Polonius. But for those who work with thoughts, the present constraint of encoding our knowledge as rudimentary linear strings of characters is severe. In general, text is multi-layered and hyper-linked in strange and marvelous ways. Your father's word processor and word processor format are inadequate to the task. The concept of a document as being a single store of data that lives in a single place, entire, self-contained and complete is nearing an end. A document is a stream, a thread in space and time, connected to other documents, containing other documents, contained in other documents, in multiple layers of meaning and in multiple dimensions. What we call a traditional document is really just a snapshot in time and space, a projection into print-ready output form of what documents will soon become.

The applications of metadata to business documents are legion. Wherever you have data, you also have the questions of:

  1. Who entered the data?
  2. Where did the data come from?
  3. Who verified the data?
  4. Who approved the data? Legal? HR? Business?
  5. Where is this data destined?
  6. How old is the data? When does it expire?
  7. How trustworthy is this data?
  8. Who must we cite as an authority for this data?
  9. Who owns this data?
  10. Who has permissions to see this data?
  11. Who can set policy for this data?
  12. Who else can edit this data?
  13. How does this data connect with my business? Is it a part number? The name of a customer or the name of an employee?
And so on.

OpenDocument Format (ODF) 1.2 will be taking a step into the world of structured metadata with an RDF/XML metadata framework. If that sounds Greek to you, then let's say that a metadata framework enables application developers to create applications that do the above things. A framework doesn't tell you how you must say "This image is provided under a Creative Commons Share-Alike license" but provides a framework for application developers to express concepts like "licensed-under" and "Creative Commons Share-Alike", as well as a formal structure for expressing subject-predicate-object relationships, where the subject can be any of around 50 ODF document elements, such as paragraphs, footnotes, images, tables, etc.
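For readers who think better in code, here is a minimal sketch of the subject-predicate-object idea in Python. It is not the ODF 1.2 syntax, which is RDF/XML. The element identifiers, the predicate names and the `statements_about` helper are all hypothetical, invented for illustration.

```python
from collections import namedtuple

# One statement: a subject (a document element), a predicate, and an object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers and vocabulary; a real application would define its own.
triples = [
    Triple("content.xml#image1", "licensed-under", "Creative Commons Share-Alike"),
    Triple("content.xml#para7", "verified-by", "Legal"),
    Triple("content.xml#para7", "entered-by", "jsmith"),
]


def statements_about(subject, store):
    """Collect every predicate/object pair recorded for one document element."""
    return {t.predicate: t.object for t in store if t.subject == subject}


print(statements_about("content.xml#para7", triples))
```

The point of the structure is that questions like "who verified this paragraph?" become simple lookups, without the framework having to know in advance what "verified-by" means.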


To read more, here are some places to start:

For general background on the "semantic web", a good intro is the 2001 Scientific American article "The Semantic Web" by Tim Berners-Lee, et al.

For a bit more on RDF, the wikipedia page is pretty good.

Svante Schubert at Sun, also on the ODF Metadata Subcommittee, has a recent blog post worth reading: "New Extensible Metadata Support With ODF 1.2".

Bruce D'Arcus, of the Metadata Subcommittee and co-lead of the OpenOffice.org Bibliographic Project also contributes his thoughts on the new ODF 1.2 metadata.

If you want to delve into the particulars of ODF 1.2's new metadata support, you can read the latest draft of the proposed changes to the specification [ODF] and the examples [ODF] document. Of course, any feedback on ODF drafts and published standards is welcome on the ODF TC's comment mailing list.

For a gentle introduction to metadata, ODF, where we are coming from and where we are going, I offer this interview [MP3] with Patrick Durusau, Chair of the ODF Metadata Subcommittee, which I recorded back in July.


Sunday, October 07, 2007

Cracks in the Foundation

You must admire their tenacity. Gary Edwards and the pseudonymous "Marbux". The mythology of Silicon Valley is filled with stories of two guys and a garage founding great enterprises. And here we have two guys, and through blogs, interviews, and constant attendance at conferences, they have become some of the most-heard voices on ODF. Maybe it is partly due to the power of the name? The "OpenDocument Foundation" sounds so official. Although it has no official role in the ODF standard, this name opens doors. The ODF Alliance, the ODF Fellowship, the OASIS ODF TC, the ODF Adoption TC (and many other groups without "ODF" in their name) have done far more to promote and improve ODF, yet the OpenDocument Foundation, Inc. seems to score the panel invites. Not bad for two guys without a garage.

However, in recent months the OpenDocument Foundation has found itself more and more isolated, outside of the mainstream debate. How far they have fallen can be seen in the fact that Microsoft has gone from ridiculing their conspiracy theories to using them to support their arguments. At the same time the Foundation's membership has dwindled to the point where only a small number remain. Former members have disassociated themselves from the Foundation as it turned increasingly to strident rhetoric. Whereas in the early days the Foundation had a large membership that participated fully in the OASIS TCs, now their "contributions" are mainly heckling and haranguing the other members. Finally, the Foundation has recently announced its intent to abandon constructive work within OASIS, to actively lobby against adoption of ODF 1.2 in ISO and to push for an alternative format, CDF, based on XHTML, CSS 3.0 and RDF. This is an odd stance for a non-profit whose charter was:

The OpenDocument Foundation, Inc. is a 501c(3) non profit chartered to work in the public interest to support, promote and develop the OASIS OpenDocument File Format affectionately known as "ODf".

So it is against this backdrop that I read with interest in Linux Today the latest correspondence from the Foundation. You can read it yourself, or take the following 8 points from me as a condensed summary of their main points:
  1. "The commercialization of interoperability remains a key driver in both big vendor deals and big vendor consortia FOSS is left on the outside looking in."

  2. "The conversion to XML [document formats] must be nondisruptive" meaning it fits into existing business processes which are increasingly dominated by Microsoft middleware. This implies a requirement for high-fidelity, loss-less round-trip conversions.

  3. The alternative is "rip and replace" and that is too costly and disruptive.

  4. Microsoft is moving toward a "grand convergence" of their services, desktop, device and servers, with OOXML at the core. "MS-OOXML is the primary transport, the document/data container of interop-integration preference."

  5. ODF was not designed as a response to these problems.

  6. Microsoft/Sun/Novell are working "to limit ODF interoperability and usefulness" because of some patent deals. (Sorry I can't summarize this one better -- I just don't understand it.)

  7. IBM/Oracle/Google are working to "limit ODF interop" because "they want a total ripout and replace of MS Office."

  8. The Open Document Foundation is in "the middle area of trying to perfect the conversion to XML".


Let me take these points one-by-one:


  1. The OpenDocument Foundation seems to try to clothe themselves in the mantle of the open source community and pontificate on how the big bad vendors treat interoperability. But are they speaking as a non-profit or as a vendor? Take their DaVinci plugin, for example. Where is the source code? Why isn't this open source? Are we to follow the Foundation's claim of 100% interoperability, based on blind faith, without seeing some proof in the form of working code? I've been working on document conversions and document file formats of one kind or another for almost 20 years. I've never seen 100% fidelity conversions of anything but trivial formats. Extraordinary claims require extraordinary evidence. But we have nothing here, just white papers.

  2. I would not claim a priori that all customers require lossless, 100% fidelity conversions. Remember, we do not see 100% fidelity even when upgrading from Office 2003 to Office 2007, but this appears to be adequate. What is required is that the total return from changing document formats exceeds any other profitable use of capital available to the enterprise. In other words, to a business this is an investment, and will be judged as an investment. Very few businesses will take a dogmatic, ideologically pure view of this. Ask yourself, would you accept 1% loss in fidelity if I gave you a billion dollars? Yes, of course you would. There are no purists in business who will remain in business. We're just haggling over what price/fidelity combination is needed to make a good investment.

    However, there is a notable exception to this rule, and that is where access to open document formats is mandated as a public right, not as a business investment. Think of the last 20 years or so of enabling public buildings with ramps for the disabled, bathrooms to accommodate wheelchairs, braille lettering in elevators. This was done by legislation and regulation, as a matter of public policy, to ensure that all of the public has access to public facilities. There was no requirement that an access ramp post a net profit. Similarly, today we see that some movements to ODF are based on open access principles.

  3. This is what we call the "fallacy of the excluded middle." You are either with us, or against us, etc. It is false to suggest that the only two approaches to interoperability are to either blindly follow the OpenDocument Foundation's mysterious DaVinci plugin, or to ignore interoperability altogether and advocate rip and replace. There are today two other ODF plugins available, one from Microsoft and one from Sun. This is real, running code, open source even in the case of the first plugin. So why should we be taking exclusive direction from the Foundation on how we achieve interoperability? Oh right, they are claiming that their program achieves 100% round-trip fidelity. Extraordinary claims...

  4. Gary is in the ballpark when he suspects that Microsoft is seeking some sort of "grand convergence" around protocols and formats. However, I disagree with his impression that OOXML sits at the center of this. In my opinion, OOXML is a rushed, transitional format, intended purely to disrupt ODF adoption. Just as the Office 2000, Office XP, and Office 2003 markup formats were abandoned by Microsoft, I predict that OOXML will soon be cast aside. The problem is that OOXML is such a poorly-engineered format that not even Microsoft wants to build upon this. If I had to divine the future of Microsoft's file formats, I'd look more in the XAML/XPS/Silverlight space. I believe that future MS Office document formats will look more like that than like OOXML.

  5. I find this observation amusing. ODF, which started its standards track late in 2002, was not designed to be 100% compatible with Office 2007. Mercy me, how did we manage to drop the ball on this one?! Remember, in 2002 there was no publicly available specification for Microsoft document formats. There was no Open Specification Promise or Covenant Not to Sue. So not only was 100% compatibility technically impossible, attempting it via reverse engineering was precarious from a legal standpoint. In my opinion, it still is, even in 2007.

    In any case I'm staunchly opposed to evolving any open standard purely for the benefit of a single vendor. Microsoft Internet Explorer is the dominant web browser. Should we then require that HTML only evolve in ways that improve interoperability with Internet Explorer? I don't think so. Why should document formats be different?

  6. This comment manages to avoid confronting a heap of contrary facts. Microsoft supports the open source ODF Translator project on SourceForge. Sun has made their own ODF Plugin 1.1 for MS Office available for download. And Novell, along with helping the Microsoft effort, has integrated that translator into their version of OpenOffice and has also started work on more powerful, next-generation support for OOXML. So these three companies are seeking to "limit ODF interoperability and usefulness"? If so, they sure have a clever way of disguising their intent. To the ordinary bystander, writing conversion and translation code to allow documents to be shared between OpenOffice and MS Office would be seen as a pro-interoperability statement. But thanks to the OpenDocument Foundation's in-depth sleuthing, we now know that the opposite is true. Not!

    Although I have serious doubts as to long-term technical feasibility of some of these translation endeavors, they do have the advantage of showing real, running code working with real, running applications. They may not claim 100% fidelity, but this is first-generation work and will undoubtedly improve. But they have an important advantage over the Foundation's DaVinci Plugin in that these other efforts demonstrably exist. Given a choice, I'll take an open source version of a partial fidelity convertor, with a reasonable architecture, over one that claims 100% fidelity, but that I can't see or touch.

  7. The claim is that IBM/Google/Oracle also want to "limit ODF interop" because (according to Gary) we want rip & replace. Strange, but just a few weeks ago I led an ODF Interoperability Camp in Barcelona, on behalf of the OASIS ODF Adoption TC, where we had a good selection of ODF vendors, open source projects and customers working to improve interoperability, including Sun, Novell, Google and IBM. The OpenDocument Foundation is a member of the OASIS ODF Adoption TC. So did they help in the organizing of the event? Did they participate? No, nothing, nada. Evidently it is easier to complain about interoperability than to do something about it.

    And again there is this fallacy of the excluded middle. You must either accept the magical DaVinci Plugin, or you are for rip & replace. There are no other alternatives considered. I'd remind the OpenDocument Foundation that interoperability was not invented yesterday, and that there are many technical approaches that can be applied to foster it. Open standards are one way, but there are others that can be applied as well, including conformance testing, test suites, plug-fests, profiles, shared code, reference implementations, etc. We should apply experience and engineering judgment to select the appropriate solution for the problem, and not fall into the trap of believing that there is only a single path to interoperability, and that this path just happens to be based on the Foundation's product.

  8. Although it sure would be nice to portray yourself as the little guy, watching out for the customer, while the big bad vendors tromp all over the flowers, the fact is that the big vendors are actively working on interoperability, with at least three major solutions available today, as well as a major initiative around interoperability in the ODF Adoption TC. In particular, IBM (with SmartSuite) and Sun (with StarOffice) have 15 or so years experience each in working on document interoperability with MS Office. This isn't rocket science, but neither is it easy. You can either stand on the sidelines and make pronouncements about how the world is out to prevent interoperability, or you can roll up your sleeves and help get the work done. I know which one I'll be doing. What about you?

If the Foundation's approach were technically feasible, they would just go out and do it. You don't let a breakthrough technical innovation wait on a standards committee to act. You just go out and do it and then standardize it later, once you've proven it works. If the Foundation really thinks that they can achieve 100% interoperability with MS Office with just 5 simple changes to ODF, then why the heck don't they just do it? Don't wait for the formality of the ODF TC's approval. They should go ahead, as if the standard already had their 5 fixes, and show the world how they have achieved 100% interoperability with MS Office. If they are right, they would all become multi-millionaires in a very short period of time.


Monday, September 24, 2007

OpenOffice.org Conference 2007

I'm back from Barcelona despite Delta's best efforts to trap me at JFK airport. No rain, no snow, no sleet, no security alert, no strike. Nothing. But somehow Delta managed to turn a scheduled 40 minute flight to Boston into a 3 hour delay to board plus another 2.5 hours sitting on the runway waiting to take off. So instead of arriving at 18:00, we didn't arrive in Boston until 23:30.

It is interesting to look at FlightStats.com to see how they rate this particular flight. It says that DL 480 has an on-time percentage of 30%, and is excessively late 52% of the time. The average delay for this flight is 79 minutes.

I just don't get it. It is one thing to be slow. But why can't you be slow and still be accurate in your estimates? If you are going to be 79 minutes late on average, then why don't you adjust your schedules accordingly?

In any case, the conference in Barcelona was great! This was my 2nd year attending OOoCon. Last year, in Lyon, I attended OOoCon as an outsider. I remember then being asked by several attendees why IBM was not contributing code to the community and thinking to myself how much it sucked that we were not doing so. What a difference a year makes! Now the discussion is not if IBM will contribute, but the logistics of exactly when and how we will make our contributions. I was proud to attend the Barcelona conference as a real OpenOffice.org member, and I can tell you that the beer tastes better when you are a member of the community.

I gave a presentation called "ODF Interoperability: The Price of Success" on Wednesday. The slides should be posted up here within a few days. A video of the presentation is here. Your best bet is to wait for the slides and follow along with my audio.

On Thursday I led a full-day workshop on ODF interoperability on behalf of the OASIS ODF Adoption TC. We had participants from a number of ODF vendors/projects: IBM, Sun, Google, Novell, SEPT-Solutions, Haansoft, OpenOffice.org and KOffice. We worked through a few exercises where we tested the exchange of documents that reflected a number of typical real-world business cases. Although they did not attend, we also did some tests with the Clever Age Word Add-in. This event was the first of hopefully several workshops where we will attempt to bring the vendors together in a focused effort to improve ODF interoperability.

There were many good conference sessions that I wanted to attend but missed. That is the downside of having a full day workshop. Of the sessions I did see, the highlights were:


For the ones I missed, I need to go back and watch the taped sessions and read the presentations.

Overall, it was great to see old friends, and meet so many more for the first time, including some with whom I have corresponded at length, but never before had met in person.

I didn't have much time to play the tourist, so I'll give you only two pictures. The first I've taken from the Ars Aperta website, a picture of Charles Schulz and me exchanging funny stories at the Mac Porting party:



And in the "Maybe My Youth Was Not Misspent" Department comes this picture of a decorative "column" outside the building where I gave my presentation on Wednesday. The building hosts the University of Barcelona's philology department. I immediately recognized the text as Homer and snapped this photo. The next day I was passing when two students were trying to read it. I stopped, and stood, with arms dramatically outstretched, and in my best Greek dactylic hexameter, recited from memory the Invocation to the Muse that begins the Iliad. So, thank you Professor Higbie, wherever you are, for making us memorize Homer. It actually came in use!


Thursday, August 02, 2007

An Invitation: ODF Interoperability Workshop

The OASIS ODF Adoption TC is organizing an ODF Camp to be held on September 20th in Barcelona, Spain. Facilities for this event are graciously provided by OpenOffice.org, which will be holding its annual conference concurrently.

The hope is that this will be the first of several such events to bring ODF vendors together to explore ways of greater technical coordination, especially in the area of interoperability. I've written about and presented on this topic before. Now is the time for action, and I'm extremely pleased that so many vendors will be attending.

On other occasions I've called interoperability "the price of success" because a standard implemented by only a single vendor and a single application need not worry about it. Only successful standards with many implementations need to rent a hall to bring the implementors together to review and perfect interoperability.

(It is like capital gains taxes. I grumble when I pay them, but take some solace in the fact that my investments were profitable. Those who make a losing investment don't pay capital gains taxes on it.)

The focus of this first interoperability event will be on the ODF word processor format. Follow-up events will look at spreadsheets and presentations.

Please have a look at the detailed agenda for the camp and consider joining us in Barcelona.


Sunday, July 29, 2007

My comments on the ETRM 4.0 draft

This was my response to the call for public comments on the Information Technology Division's (ITD) Enterprise Technical Reference Model (ETRM) 4.0 draft.




I’d like to write to you as a long-time Massachusetts resident and taxpayer. My employer (IBM) will likely submit their own comments, but I’d like to offer you my own personal views on the ETRM 4.0 draft.

I am proud of the Commonwealth’s tradition of openness in government, enshrined in our Public Records Law and Open Meeting Law. As James Madison wrote, “A popular government, without popular information, or the means of acquiring it, is but a prologue to a farce or a tragedy. A people who mean to be their own governors must arm themselves with the power which knowledge gives them.” So access to government documents, now and for posterity, is critical for public oversight and participation in government, as well as for preserving our heritage. Now that we’ve moved into the digital age, access to government documents requires that these documents be made available in a format that all Commonwealth residents can read. So the move toward open document formats, as called for in the ETRM, is laudable. A citizen must never be dependent on any single vendor for the software needed to read their government’s documents.

However, I am concerned at the proposed addition of Ecma Office Open XML (OOXML) to the list of acceptable document formats. As you may have heard, OOXML is currently undergoing review by ISO/IEC JTC1 for possible approval as an ISO standard. As part of this review, technical committees in standards bodies around the world are reviewing OOXML and appraising its suitability as an International Standard. As a participant in the US committee reviewing OOXML, INCITS V1, I had the opportunity to review the text of the OOXML specification and to discuss it with others. I am sorry to report that I found the OOXML specification to be full of errors and omissions. Of course, no technical document is perfect. But this one, in particular, is of far greater length (more than 6,000 pages) and of far lower quality than any I have seen before. If it has advanced this far in the ISO process it is because of vendor pressure, not because of technical merit.

What is the problem with a buggy standard? Interoperability suffers. That is the problem. There is no doubt that if everyone in the Commonwealth used Microsoft Office 2007 on Windows Vista, their interoperability would be good. But as soon as we admit choice in applications and operating systems, then interoperability will only occur when all sides follow a common standard. So the technical quality of a standard (accuracy, comprehensiveness, level of detail, consistency, etc.) is directly proportional to the level of interoperability achievable and the cost to achieve it.

The ISO ballot on OOXML will not end until September 2nd, after which a resolution process to fix defects in the text of the standard will take at least an additional 6-18 months. That is, of course, if OOXML gains ISO approval, something which is not certain at this point. So I would recommend a cautious approach, and wait for the ISO process to conclude, or conduct your own independent technical evaluation of the OOXML specification to confirm its technical quality before adding OOXML to your list. Ask other vendors: Is this something you can implement? Ask yourself: Will this truly give the Commonwealth the interoperability and choice that you desire? These are important questions to ask.

Finally, I’d note that the ETRM also calls out OpenDocument Format (ODF) as an acceptable format. ODF was approved by ISO last year. So why do we need OOXML? I personally think that the complexity of document exchange and translation in a multi-format world would take us back to the confusion and frustration of the early 1990’s when we all juggled WordStar, WordPerfect, Word and WordPro files, and could collaborate only poorly. Better to push for a single unified/harmonized standard document format for personal productivity applications, much as we have a single standard (HTML) for web pages.

I’ll leave you with a quote from Tim Berners-Lee, the inventor of the web, from an interview he gave with David Berlind from ZDNet when Berners-Lee was recently in Boston receiving a Lifetime Achievement Award from the Massachusetts Innovation & Technology Exchange.

Berners-Lee said:

It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it’s open and the fact that it is royalty-free.

So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.

If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn’t have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.

So I think it very important that as we move on to new spaces ... we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you built on top of the web platform.



I believe we want to ensure the same qualities in document formats. We want competition and choice among vendors, applications and services, but not among standards. If we compete on standards, then no one wins.


Monday, July 09, 2007

The Formula for Failure

It has been a boast for around 6 months now. Microsoft's OOXML fully defines spreadsheet formulas, and ODF doesn't. The Microsoft boosters have been parroting the party line for quite some time.

Miguel de Icaza gleefully noted back in January:

OOXML devotes 324 pages of the standard to document the formulas and functions.

The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks that represented Novell at the TC45 requested the information and it eventually made it into the standards. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).


And Microsoft's Jean Paoli was quoted in May in InfoWorld:

As far as those 6,000 pages of specs is concerned, there are 350 pages in the OpenXML spec alone -- half of the entire ODF spec -- just to describe spreadsheet capabilities, which ODF doesn't have, Paoli says. For example, ODF can't describe or calculate a formula in a spreadsheet.

"It may sound amazing. They are working on it now. But the current standard doesn't have it," Paoli tells me.


There are many other examples, if you care to seek them out. But what you will not find is an examination of what OOXML actually specifies for spreadsheet formulas, or confirmation that it was done sufficiently. Maybe the assumption is that this would be a trivial task, documenting Excel's behavior? What could possibly go wrong?

Let's find out.

First, let's take the trigonometric functions, SIN (Part 4, Section 3.17.7.287), COS (Part 4, Section 3.17.7.50) and TAN (Part 4, Section 3.17.7.313). Hard to mess these up, right? Well, what if you fail to state whether their arguments are angles expressed in radians or degrees? Whoops. Same problem for the return value of the inverse functions, ASIN (Part 4, Section 3.17.7.12), ACOS (Part 4, Section 3.17.7.4), ATAN (Part 4, Section 3.17.7.14), and ATAN2 (Part 4, Section 3.17.7.15). It is hard to have interoperable versions of these functions if the units are not specified. What kind of review in Ecma would miss something so simple?
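To make the ambiguity concrete, here is a minimal Python sketch of the two possible readings of SIN(30). The function names are mine, purely for illustration; nothing here comes from the OOXML text itself.

```python
import math


def sin_reading_radians(x: float) -> float:
    """SIN if the argument is taken to be in radians."""
    return math.sin(x)


def sin_reading_degrees(x: float) -> float:
    """SIN if the argument is taken to be in degrees."""
    return math.sin(math.radians(x))


if __name__ == "__main__":
    # Same cell contents, two very different answers.
    print("radians reading:", sin_reading_radians(30.0))  # roughly -0.988
    print("degrees reading:", sin_reading_degrees(30.0))  # roughly 0.5
```

Two implementations that each pick a different reading would both be "conformant" to a definition that names no unit, and would disagree even on the sign of the result.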

The AVEDEV function (Part 4, Section 3.17.7.17) should return the average deviation of a list of values. However, the formula given for this function is actually for calculating the number of combinations of n things taken k at a time. Nice formula, though. Jakob Bernoulli would be proud. But anyone using an OOXML spreadsheet application that follows this standard will be perplexed at the values returned by their AVEDEV function. Did these formulas get any expert review in Ecma?
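For comparison, a short Python sketch of the two quantities being confused. It assumes the usual textbook definition of average deviation (the mean of the absolute deviations from the arithmetic mean); the function names are illustrative.

```python
import math
from statistics import fmean


def avedev(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    data = [float(v) for v in values]
    mean = fmean(data)
    return fmean([abs(v - mean) for v in data])


def combinations(n: int, k: int) -> int:
    """Number of ways to choose k things from n: n! / (k! (n - k)!)."""
    return math.comb(n, k)


if __name__ == "__main__":
    print(avedev([2, 4, 4, 6]))   # mean is 4, deviations are 2, 0, 0, 2
    print(combinations(4, 2))     # an unrelated quantity
```

The first takes a list of values and measures spread; the second takes two integers and counts subsets. No choice of inputs makes one a reasonable substitute for the other.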

It is hard to have confidence in the CONFIDENCE function (Part 4, Section 3.17.7.47). It is said to return the confidence interval around a sample mean given an alpha value, a standard deviation and a sample size. The trouble is that this problem is under-defined. One must make an assumption, not stated here, as to the shape of the data distribution. Is it normally distributed data? Exponentially distributed? Weibull distribution? The standard does not define the meaning of this function sufficiently for one to implement it.

The CONVERT function (Part 4, Section 3.17.7.48) converts from one unit to another. Some conversions explicitly allowed include liquid measure conversions such as from liters to cups or tablespoons. But whose cup and whose tablespoon? Traditional liquid measures vary from country to country. In the US, a cup is 8oz, except for FDA labeling purposes when a cup is 240ml. But in Australia a cup is 250ml and in the UK it is 285ml. Similarly a tablespoon has various definitions. OOXML is silent on what assumptions an application should make. I guess I won't be using OOXML to store my recipes, and certainly not to calculate medical doses!
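The arithmetic is trivial, which is what makes the omission so easy to overlook. A small Python sketch, using the cup sizes quoted above plus the US customary cup of 8 US fluid ounces (about 236.59 ml); the table and function name are illustrative only.

```python
# Cup sizes in millilitres. The FDA, Australian and UK figures are the ones
# quoted in the text; the US customary figure is 8 US fluid ounces.
CUP_ML = {
    "US customary": 236.5882365,
    "US FDA labeling": 240.0,
    "Australia": 250.0,
    "UK (as quoted)": 285.0,
}


def liters_to_cups(liters: float, cup_ml: float) -> float:
    """Convert liters to cups for a given cup size in millilitres."""
    return liters * 1000.0 / cup_ml


if __name__ == "__main__":
    for name, ml in CUP_ML.items():
        print(f"1 liter = {liters_to_cups(1.0, ml):.3f} cups ({name})")
```

The spread between the smallest and largest cup here is about 20%, far more than any rounding error.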

Almost every one of the financial functions in OOXML depends on a "day count basis" flag, such as US (NASD) 30/360, Actual/Actual, Actual/360, Actual/365, European 30/360. These represent various conventions for how days and months are counted. The problem is that the OOXML standard does not define these conventions, nor does it point to an authority for their definition. There are subtle behaviors here, especially when dealing with leap years and Excel's deviant treatment of dates in the year 1900. So the lack of detailed definitions in this area makes it impossible for anyone to rely on identical financial calculations from different OOXML implementations. This, in a field where being off by a penny can cause problems. Almost 30 spreadsheet functions are broken in this way.
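To show how much rides on the convention, here is a hedged Python sketch of three day count bases. The European 30/360 rule used here (treat any day-of-month 31 as 30) is one commonly cited reading, stated as an assumption; it is not taken from the OOXML text, which is exactly the problem. The US (NASD) variant has additional end-of-February rules and is deliberately left out.

```python
from datetime import date


def actual_days(start: date, end: date) -> int:
    """Calendar days between two dates."""
    return (end - start).days


def days_30e_360(start: date, end: date) -> int:
    """Days under an assumed European 30/360 rule: day 31 counts as day 30."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)


def year_fraction(start: date, end: date, basis: str) -> float:
    """Fraction of a year between two dates under the named basis."""
    if basis == "actual/360":
        return actual_days(start, end) / 360.0
    if basis == "actual/365":
        return actual_days(start, end) / 365.0
    if basis == "30E/360":
        return days_30e_360(start, end) / 360.0
    raise ValueError(f"unknown basis: {basis}")


if __name__ == "__main__":
    start, end = date(2007, 1, 31), date(2007, 3, 1)
    for basis in ("actual/360", "actual/365", "30E/360"):
        print(basis, year_fraction(start, end, basis))
```

For 31 January to 1 March 2007 the actual count is 29 days, while this 30/360 reading gives 31. Accrued interest computed on the two bases differs accordingly.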

(What do you call a scientist whose calculations are off by 50%? A cosmologist. What do you call an accountant whose calculations are off by 1%? A crook.)

The NETWORKDAYS function (Part 4, Section 3.17.7.344) seems simple enough. It returns the number of workdays (non-weekend days) between two dates. Unless you live in the Middle East. The problem is that this function doesn't provide a facility for distinguishing the different weekend conventions. I may have a weekend on Saturday & Sunday, but a colleague in Tel-Aviv might have off Friday and Saturday, while in Cairo it might be Thursday and Friday. This function lacks the adaptability to deal with this important cultural difference. Saying that the definition of the weekend is implementation- or locale-dependent won't work either. I may be a French company in Paris dealing with contractors in Algeria. I need to have a French spreadsheet calculate schedules for workers at various locations and be able to exchange it with other offices using other OOXML applications and expect that they will get the same answer. Lacking cultural adaptability, OOXML fails approximately a billion people here.
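A minimal Python sketch of a workday count that takes the weekend as an explicit argument. The signature is hypothetical, and counting both endpoints is an assumption made for illustration; it is not the definition from any standard.

```python
from datetime import date, timedelta

# Python's date.weekday(): Monday is 0, Sunday is 6.
SAT_SUN = frozenset({5, 6})
FRI_SAT = frozenset({4, 5})
THU_FRI = frozenset({3, 4})


def networkdays(start: date, end: date, weekend=SAT_SUN) -> int:
    """Count the days from start to end (both included) that are not weekend days."""
    count = 0
    day = start
    while day <= end:
        if day.weekday() not in weekend:
            count += 1
        day += timedelta(days=1)
    return count


if __name__ == "__main__":
    # Monday 2 July 2007 through Friday 6 July 2007.
    start, end = date(2007, 7, 2), date(2007, 7, 6)
    print(networkdays(start, end, SAT_SUN))  # 5
    print(networkdays(start, end, FRI_SAT))  # 4
    print(networkdays(start, end, THU_FRI))  # 3
```

Because the weekend travels with the formula rather than with the application's locale, the same spreadsheet gives the same answer wherever it is opened.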

Another example. Several of the statistical functions in OOXML are defined incorrectly. Take, for example, the ZTEST function (Part 4, Section 3.17.7.352). The key error comes right after the formula, where it says, "where x is the sample mean." The problem is that x-bar is the sample mean, not x. Someone who implements according to the text will give their users the wrong answer. A similar error is repeated in 8 other statistical functions. Certainly this is a typographical error, but this error changes the answer. Remember, this is an approved Ecma Standard and a proposed ISO Standard, not a 4th grade school essay. Denmark and Massachusetts have already said they will adopt OOXML for official business. Spelling counts. Providing the right formula and the right description counts. Copy and paste errors should have been taken care of back during the Ecma review.

I've submitted these spreadsheet formula issues, and many others, to INCITS V1, for consideration in determining the US position on the OOXML ISO ballot, but we never got to them during our two-day meeting in DC a couple of weeks ago, and may not get to them at all. There are simply too many other issues to read through and discuss. But I thought it was important to bring up these formula issues in particular, since Microsoft seems especially proud of their work in this area, delusions of adequacy which on reflection must now seem unwarranted. I'm especially concerned with the financial functions, since they are outside my area of expertise and may have additional errors that I missed.

So what is ODF doing about formulas? We're continuing to work on them. Rather than rush, we're doing careful, methodical work. We're documenting the functions in great detail. Where we have the choice between the common naive formula for a function and one that is numerically stable, we're documenting the stable function. For the NETWORKDAYS function, we created an optional extra parameter, so a user can pass in a flag that tells what their weekend conventions are. We have a professor of statistics reviewing our statistics functions for completeness and accuracy. We're verifying our assumptions about financial functions by referring to core specifications from groups like the ISDA and the NASD. We're creating a huge number of test cases and checking them with Excel and other applications.
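The difference between a naive formula and a numerically stable one is easy to demonstrate. The Python sketch below uses sample variance as the example; the choice of function and of a two-pass algorithm are my illustration, not a quotation of what the ODF formula work specifies.

```python
import math


def sample_variance_naive(xs):
    """Textbook shortcut: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def sample_variance_two_pass(xs):
    """Stable form: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    # Small spread on top of a large offset; the true sample variance is 30.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("naive:   ", sample_variance_naive(data))
    print("two-pass:", sample_variance_two_pass(data))
```

With double-precision arithmetic the naive version subtracts two numbers near 4e18 that agree in almost all their digits, so the small difference that carries the answer is lost to rounding. The two-pass version works with the deviations directly and recovers 30.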

Under Sarbanes-Oxley, a CEO or CFO puts himself at personal risk if he signs off on financial numbers derived from processes and tools that he knows to give erroneous results. So we utterly reject a rushed process that has led to an Ecma Standard which incompletely and incorrectly defines spreadsheet functions. Some things are worth taking the time to do right.

As I've shown, in the rush to write a 6,000 page standard in less than a year, Ecma dropped the ball. OOXML's spreadsheet formula specification is worse than missing. It has incorrect formulas that, if implemented according to this standard, would raise important health, safety and environmental concerns, aside from the obvious financial risks of a spreadsheet that calculates incorrect results. This standard is seriously messed up. Shame on all those who praised and continue to praise the OOXML formula specification without actually reading it.


Sunday, June 24, 2007

A File Format Timeline






26 June Update

I suppose the downside of a blog post containing only a picture is that there is nothing for anyone to quote. So here are a few themes that struck me while putting this chart together:

  1. Microsoft once made file format information on the binary formats readily available, in fact encouraged programmers to use the binary formats. But then around 1999 they reversed course, and eliminated such documentation. At the time, working at Lotus, I had no idea what motivated this change. It was only years later, when Microsoft internal memos were released in cases like Comes v. Microsoft, that the full picture emerged. The file format was viewed by Microsoft as a strategic tool, used to support the overall Microsoft platform, not the user. The format was designed to preserve their vendor lock-in. The availability of the file format documentation to competitors was limited, as a matter of corporate policy.

    So this reminds us that just because something is documented and available today does not prevent Microsoft from changing their mind at a later point and removing the documentation, failing to update it with new releases, or making it available only under a more restrictive license. Since Ecma owns the OOXML specification, as well as the future maintenance of it, any belief in the long-term openness of this format depends on your trust of Microsoft's future behavior in this area.


  2. Like any durable goods monopoly (and few things are as durable as software) Microsoft's largest competitor is their own install base. Microsoft has made many attempts at moving beyond the binary formats in the past, with Office 2000, Office XP and Office 2003. But in each case it failed. These were all false starts and abandoned attempts. So we should look for signs that OOXML is actually Microsoft's real direction and not another false start or dead end.

    My guess is that OOXML is merely a transitional format, much like Windows ME was in the OS space, a temporary hybrid used to ease the transition from 16-bit to the 32-bit platform that would eventually come (Windows 2000). Microsoft doesn't want to support all of the quirks of their legacy formats forever. That just leads to bloated, fragile code, more expensive development and support costs. They would rather have clean, structured markup, like ODF. But the question is, how do you get there? The answer is straightforward: First, eliminate the competition. Second, move users in small steps, promising the comfort of continuity and safety. Third, once you have eliminated competition and have the users on the OOXML format that no one but Microsoft fully understands, then you may have your will of them. For example, introduce a new format that drops support for legacy formats and force everyone to upgrade. They are pretty much doing this already on the Mac by dropping support for VBA in the next version of the Mac Office.

    Even a cursory look at OOXML shows that it was not designed for long term use, even by Microsoft. So the question I have is, what is the real format that they are going to?


  3. Microsoft, after pretty much ignoring document standards for over a decade, suddenly got religion in late 2005 and rushed whatever they had on hand into Ecma. Remember, just months earlier they had recommended the Office 2003 Reference Schemas to Massachusetts for official use. I'm certainly glad Massachusetts did not fall for that by putting their resources on another dead format in the Microsoft format graveyard. OOXML was not designed to be a standard. It is just a proprietary specification that Microsoft has dumped, at the last minute, into ISO's lap, in an attempt to translate their market domination into a standards imprimatur in order to further cement their market domination. It is a win-win situation for them. Either they have an effective monopoly in office applications and an ISO standard, or they have an effective monopoly in office applications. Nice situation for them either way. Reminds me a lot of Henry VIII and Clement VII. Henry set himself up to win regardless of what the Pope's response was.


Monday, June 11, 2007

Hemidemisemiquavers

Some "short notes" to share with you:

From a GrokLaw news pick we hear that ZDNet's David Berlind recently interviewed Tim Berners-Lee in Boston, where Sir Tim received the Massachusetts Innovation and Technology Exchange's Lifetime Achievement Award. Watch the whole interview if you have 12 minutes, though I will transcribe one passage which highlights the importance of agreeing on a single open standard for a problem domain and fostering competition among the applications built upon that standard:

It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it's open and the fact that it is royalty-free.

So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.

If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn't have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.

So I think it very important that as we move on to new spaces ... we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you build on top of the web platform.

Well said. I tried to make a similar point, but with pictures, back in February.

I recently ordered some podcasting equipment. It should arrive tomorrow. I will be looking for people to interview soon. So hide while you can, don't answer the phone, and if it looks like I'm carrying a microphone, then run for the exit.

An interesting article in the American Surveyor, by Joel Leininger, on the importance of file format standards. Although it is a different application domain, the concerns are very similar (via OpenMalaysia).

Anyone know Romanian? Something gives me the impression that this guy from Microsoft Romania is not complimenting me. I wonder what subtle hint gives me that impression...

The OOXML ballot marches on in national standards committees around the world. September 2nd is the deadline, though many committees have earlier deadlines for developing their recommendations. In the US the committee looking at OOXML is called INCITS V1, and we have until July 13th. V1 has had a few meetings so far and we're just starting to get into the technical comments. Since we have a consensus process, all it takes is a small minority of members to bring everything to a halt, which is pretty much what is happening. For example, we spent 2 1/2 hours today and discussed only two comments. So we risk having a perfunctory technical review of OOXML. When I compare this to the BSI's excellent work developing detailed comments on a publicly-readable wiki, I think we in the US should be ashamed at the stonewalling going on in V1.

I'll be hosting a V1 face-to-face meeting in a couple weeks in Washington, DC. Hopefully we'll make some more substantial progress there. If you really want to follow our work closely, you can read through our mailing list archives which Sun's Jon Bosak was kind enough to set up for us.

Although no formal call for public comments has gone out, we've received a number of unsolicited pro-OOXML letters which you can read here. As you can see, they are pretty much identical form letters, all ending with the artless phrase, "Furthermore, Open XML in no way contradicts any other international document standard." Remind anyone of the Manchurian Candidate's, "Raymond Shaw is the kindest, bravest, warmest, most wonderful human being I've ever known in my life"?

In any case, if you want to provide input into this process, feel free to send in your thoughts as well. Having read many of these letters myself, I'd offer the following advice:
  1. Don't send in a form letter. It hurts your cause more than helps it, since it makes it look like you couldn't get real support if you tried.
  2. Use your real name and email address and postal address, so we know you are a real person and not a robot.
  3. Be polite. Remember you are trying to persuade.
  4. Give a succinct, reasoned opinion. Keep it to a page if you can.
  5. Ask for a specific action. Don't expect the reader to draw a conclusion. Draw it yourself.
Of course, since V1 is developing the US position on OOXML, comments from US companies and citizens are especially welcome. Also, if you have specific technical comments about OOXML, you can submit them through me and, if I agree with your points, I will raise them directly with the committee. (I do this as a personal favor to you, my readers, not as an official INCITS V1 solicitation.) Assume the committee is already familiar with the GrokLaw items. But OOXML is a big standard, and there are certainly dark corners where I have not ventured. So if you've found something new, certainly let me know.

Canada continues to solicit comments on OOXML. And the UK is soliciting comments as well, through June 30th. Again, be succinct, and give your name and address. Otherwise you risk having a committee member reject your comment outright since it cannot be ascertained whether you are actually a resident of that country.

A blog I'd like to recommend to my readers is Lodahl's blog. Leif Lodahl has been giving some great coverage of ODF happenings in Denmark, including analysis of the parliamentary debate on the question of whether Denmark should have one or two standards. Also a good catch of Microsoft dancing all over the place, trying to avoid giving a straight answer on why Word does not provide integrated ODF capabilities. If you can spare 45 minutes this is a great clip to listen to.


Tuesday, June 05, 2007

Documents for the Long Term

We all will die. Institutions come and go. Empires and nations crumble. But what is written down may have transcendent longevity. Whether it is a personal letter from a departed friend, the minutiae of administration or the recorded contemporary reports of great historical events, the durable written word has almost mythic status in our culture.

The permanence of the written word has fascinated mankind for millennia. The powerful knew the truth of this. To be sure that his deeds would outlive his contemporaries, the Emperor Augustus had his CV engraved in bronze in his "Res Gestae Divi Augusti" (Deeds accomplished of the Divine Augustus). The bronze did not survive, but the words have. Horace wrote in his Ode, "Exegi monumentum aere perennius" (I have erected a monument more lasting than brass). And his words have survived. Shakespeare in Sonnet #55 echoed this sentiment, "Not marble, nor the gilded monuments/ Of princes shall outlive this powerful rhyme". Shelley in his Ozymandias shows the irony of the surviving boastful inscription, "Look on my Works ye Mighty, and despair!" beside the "colossal wreck" of an ancient monument.

The saying is "ars longa, vita brevis": art is long, but life is short. But this is not entirely accurate. The performing arts such as dance or music have a very sketchy and imperfect history until the rather recent invention of written notations. So dance before around 1450 is a matter of speculation. No doubt the ancient Bacchae accompanied their ecstatic revels with an equally furious dance. But we know none of it. Thucydides has the Lacedaemonians march into battle to the accompaniment of flutes. What martial notes they played we do not know. We can only speculate, with Thomas Browne, "What song the Syrens sang". Some like Benjamin Bagby may give a glimpse at earlier performance practice. And scholars like Milman Parry find echoes of ancient practices in traditional story telling. But we cannot know for certain.

The structural arts of architecture, city design, aqueducts, and monuments, engravings, these have all fared better over time. Even scattered texts from antiquity have survived. Text can have longevity, but not unassisted. Left to the ravages of water, fire, insects and fungi, papyrus, vellum and paper will only survive a few hundred years. For a text to survive longer, someone must copy it. So, the works of Cicero, these we have in rather good shape today, in part because Augustine of Hippo praised his works. (Then as now, getting a good review from a recognized figure is the best marketing).

Which ancient texts were copied, and thus became part of the canon of western literature, was somewhat a matter of chance. Nine of the surviving plays of Euripides, existing in a single partial manuscript, are curiously in alphabetical order, but only containing plays beginning with the Greek letters eta through kappa, leading scholars to believe that this is merely volume 2 of a larger collection of plays that are lost. Euripides is believed to have written almost 100 plays. We have almost 20 of them today.

With digital documents, the issues are a little different. The transmission of digital data can be done without error. But digital media, the tapes, floppies and optical disks, these are susceptible to the ravages of time, light, heat, fungi and the gradual deterioration of the substrate. So, digital documents must be copied from one storage format to another every few years. And so modern digital data relies on the same haphazard selection mechanism as we see with ancient texts — survival depends on someone deciding that a document is worthy of copying and preserving.

That said, the survival of a document does not depend entirely on the whims of monks or archivists. There are certain engineering principles which are key to creating a document that lends itself to long term retention. Some of these are tasks for the individual authors:
  1. Keep a document intact. Better to preserve a document inclusive of annexes and appendices.
  2. Separate content, structure, layout and presentation.
  3. Findability: a good title, an abstract, keywords and other metadata will help ensure that your document can be found and retrieved via current and future search technologies.
  4. Use a fully-specified, open document format.

From another angle we can look at archiving from a systems view and follow a basic architectural principle. The key to durability, whether in documents, monuments, institutions, or whatever, all boils down to this: Do not depend on something less stable than yourself.

(I didn't invent that principle, but don't recall where I first heard it. Any idea who it was?)

If you depend on something less stable, which is to say more susceptible to change, than yourself, then when it changes, it forces you to change. Stability is when you change only when you want to change.

For example, a house is built on a foundation. A frame, plumbing and electrical, walls, wallpaper and furniture are layered on top. If replacing the wallpaper triggered a need for a new foundation, then we would say that the house was inherently unstable. But it is reasonable to expect that installing new plumbing will require opening a hole in a wall and later applying wallpaper. The expected rates of change of these various layers have led to a method of construction that enforces this dependency chain. If for some reason we needed to make very frequent changes to the plumbing, then we would place the pipes outside the interior walls, or behind removable wall panels for easy access.

We carefully manage dependency chains when programming as well. For example, imagine a module A (a database client) that depends on a module B (a database server) where you believe that module B is less stable (has a greater rate of change) than A. This is a problem, since changes to B trigger changes to A. So we define a new interface layer C (maybe SQL) that is more stable than A or B. By having A depend on C rather than B directly, we transform the unstable dependency A->B, into the stable relationship (A,B)->C, where C is a standard.
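The A, B and C of that paragraph can be sketched in a few lines of Python. All class and method names below are hypothetical; the tiny `fetch_names` interface merely stands in for something like SQL.

```python
from typing import Dict, List, Protocol


class NameQuery(Protocol):
    """C: the stable interface that both sides agree on."""

    def fetch_names(self) -> List[str]:
        ...


class ListBackedServer:
    """B, first version: keeps its rows in a list."""

    def __init__(self, rows: List[str]) -> None:
        self._rows = list(rows)

    def fetch_names(self) -> List[str]:
        return sorted(self._rows)


class DictBackedServer:
    """B, rewritten: the storage changed completely, the interface did not."""

    def __init__(self, rows: List[str]) -> None:
        self._rows: Dict[int, str] = dict(enumerate(rows))

    def fetch_names(self) -> List[str]:
        return sorted(self._rows.values())


class Client:
    """A: depends only on C, never on a concrete server class."""

    def __init__(self, backend: NameQuery) -> None:
        self._backend = backend

    def report(self) -> str:
        return ", ".join(self._backend.fetch_names())


if __name__ == "__main__":
    rows = ["Horace", "Cicero", "Augustine"]
    print(Client(ListBackedServer(rows)).report())
    print(Client(DictBackedServer(rows)).report())
```

The server's internals were replaced wholesale and `Client` did not change by a single line, because the only thing it ever depended on was the more stable interface.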

This same principle applies to document formats as well. Never depend on something less stable than yourself. For the first few decades of document formats, the era of binary formats in the 1980's and early 1990's, we did this all wrong, as the following diagram shows:

In those days the file format stood atop a large set of dependencies and changes at all layers would lead to changes in the file formats. This created a very inflexible stack of dependencies, where changes in the less stable lower layers can trigger incompatible changes to the document format. When we see that an Excel file on the Mac has a different internal date format than an Excel file created on Windows, we are seeing remnants of this kind of dependency chain.

Note also that these interfaces between the layers were not standards, but proprietary interfaces. For example, a Word 95 document might be seen as this:



The move to XML-based file formats changes this diagram but little. The format at the top is now XML but the dependency chains are the same. The relationship of the format to the technology stack has not changed:


If using a new document format requires you to buy a new application suite, update your hardware and buy a new operating system, then that should be a clear sign that something is wrong. "The tail wags the dog," as they say.

And note that a dependency is not the same as a layer. You can pretty things up all you want with the use of standards like XML, but still have adverse dependency chains. Taking a Microsoft Word binary format and translating it into XML, and putting it in a Technical Committee whose charter requires that it remain 100% compatible with Microsoft Word leaves you with a file format that depends on Microsoft Word, no matter how much XML Schema and Dublin Core you throw at it. The XML is just syntactic sugar. But the essence of the dependency chain remains: OOXML depends on Word and Windows, a single vendor's application stack. Instead of an application supporting a format, a format is supporting an application.

I should further note that a vendor, at great expense and effort, can forestall the bad effects of an unstable dependency chain, sometimes for many years. Instability, with effort, can be managed, as jugglers, unicyclists and stilt walkers remind us. Even though the Word binary format has many dependencies on the Windows platform, and on specific internals of Word and features and behaviors from earlier versions of Word, Microsoft has managed to preserve some level of compatibility with these older formats, even in current versions of Word. The support is far from perfect, and it certainly makes their file format and their applications more complicated and more expensive to work with. But that is the burden they face from bad engineering decisions back in the early 1990's. They and their customers live with that, and though they may not realize it, they all pay a price for it.

The alternate approach, the one that leads to better prospects for long term document access, is to have a stack, not of proprietary applications and interfaces, but of standards. ODF's long-term stability and readability comes from the fact that it is built upon, and depends upon other standards that are widely-used, widely-adopted and widely-deployed. ODF is designed so the format depends on things more stable than itself, with a solid foundation as seen here:


The suitability of a format for long term archiving depends as much on the formal structure of the technological dependencies as it does on specific details of the technologies involved. The greatest technologies in the world, if assembled in an unstable dependency arrangement, will lead to an unstable system. Look at the details, certainly, but also step back and look at the big picture. What technology changes can render your documents obsolete? And who controls those technologies? And what economic incentives do they have to trigger a cascade of changes every 5 years, to force upgrades? As consumers and procurers we all need to make a decision as to whether we want to ride on that roller-coaster again.

The question we face today is whether we want to carry forward the mistakes of the past and the extensive and expensive logic required to maintain this inherently unstable duct tape and baling wire Office format, or whether we move forward to an engineered format that takes into account the best practices in XML design, reuses existing international standards, and is built upon a framework of dependencies that ensures that the format is not hostage to a chain of technologies that can be manipulated by a single vendor for their sole commercial advantage.


Thursday, May 31, 2007

The Legend of the Rat Farmer

The Tale


A long time ago in a land far away there once was a prosperous town called Hamelin. Everything was perfect in Hamelin until the year the rats came. The rats ate up the grain, bit the townsfolk in the toes and scared the young children. Something had to be done! So the Bürgermeister and the Council met together and decided to bring in an outside consultant, Pied Piper Enterprises, LLC. That did not go well. The rats were back the very next year.

So in the Spring the Bürgermeister again assembled the Council and they talked and talked and talked. Should they bring in another consultant? Should they abandon the town and move someplace else? They finally decided on a market-based approach to solving the problem. They would offer a reward, a bounty, to citizens who captured, killed and turned in rats. Turn every person in Hamelin into an exterminator. The signs soon went up all over town: "A Silver Thaler for every 10 Rats."

The Bürgermeister tracked the results on a big chart on the wall of his office and the numbers looked very good. Each day more and more rats were being caught and killed. The citizens were busy at work. The rats would soon all be gone.

But then one day the Bürgermeister went home, and in the doorway of his house was his wife and she was visibly disturbed, "You shall get nothing for dinner tonight! The rats have eaten all of the grain!"

"How can this be?" exclaimed the Bürgermeister. "The metrics show that we're eliminating a record number of rats every day. Come with, and I will show you the chart."

"Chart, schmart. I'll show you some metrics," said the Bürgermeister's wife, who then took him by the ear and led him around the town center, and at each house they stopped and heard the same tale. The rats are still eating up the grain. They are still biting townsfolk in the toes. They are still scaring the young children.

Nothing at all had improved in the quality of life in Hamelin. All that had changed was that they now had a larger pile of dead rats, and a smaller pile of silver Thalers.

An inquest was held to account for the misuse of town funds. During this investigation it was found that a large percentage of the reward money had gone to one old man who lived by himself on the outskirts of town. The Bürgermeister and the Council went to visit the old man. "How did you manage to catch so many rats?" they asked, "You are old and slow".

"Simple," he said, "Let me show you". He led them back around his house to a field where stood an old barn. As he opened the barn doors, he revealed to the astonished Council hundreds of small wooden cages, each one holding 10 large rats.

"I don't care for rats much myself", said the old man. "But since you wanted them so much, I thought I could help out a little. After all, I could use the money, and rats are so easy to breed".

"Bu...bu...bu...but we didn't want more rats," stammered the Bürgermeister. "We wanted fewer".

"Nonsense", said the old man. "If you offer a reward for something, of course you want more of it, not less. This is just the free market in action."

The Commentary



We see here the results of failing to specify an appropriate metric. As is often the case, we tend to latch on to metrics that are easy to measure, such as counting dead rats, rather than harder-to-measure but more appropriate metrics that truly indicate the achievement of our goals. For example, a reasonable metric might have been a "resident satisfaction index" based on a weekly survey of Hamelin's citizens to see if their rat problems were decreasing. Or the Bürgermeister could have sent out a commission to count how many rats they find in the grain and track that number from week to week. The point is to have a metric that clearly and directly reflects the attainment of your goals.

So the lesson is that you should always watch out and ensure that the metrics being suggested truly reflect your ultimate concerns.

With that in mind, let's move forward to the present and what seems to me a similar confusion of metrics.

Jason Matusow, Microsoft's Director of Corporate Standards, has written a new blog post, which concludes:

The fact of the matter is that translation between formats has always been the path to interop (for document formats), and now with XML-based formats that path is even more appropriate than ever through translation.

China wants to create its own standardized XML format...translation will enable interop. Google Docs has its own format....translation will enable interop. OpenOffice has ODF..translation will enable interop (to MS Office, to Google Docs, to IBM Workspace). Adobe PDF is its own format...translation will enable interop.


Jason seems to be suggesting that increasing the number of different formats and translators leads to an increase in interoperability. This is akin to saying that increasing the number of umbrellas improves the weather. It just doesn't work that way.

We need to step back and find the proper metric. If, for the sake of argument, we define interoperability as the ability for different formats to work together, then obviously as we increase the number of formats and the number of translators, the sum total of interoperability (by that definition) in the world increases. In that case, let's make the old 1-2-3 format an ISO standard, the WordPerfect format an ISO standard, WordStar an ISO standard, XYWrite an ISO standard, Quattro Pro an ISO standard, Manuscript an ISO standard, Harvard Graphics an ISO standard, Freelance Graphics an ISO standard, etc. Just imagine how much interoperability we could have in the world if we simply could standardize more formats. Every application could have its own standard format, or maybe two or three.

But you may smell a rat in the above argument. Interoperability of formats is not the appropriate metric. A simple look at the lack of OOXML support in Microsoft's Mac Office shows that the introduction of OOXML has reduced interoperability, not increased it. Similarly, scientific journals like Science and Nature have already come out saying that they cannot accept the OOXML format. Translation among multiple formats only partially and imperfectly attempts to work around a breakdown in interoperability caused by having multiple formats. It is a band-aid approach and does not address the core issue.
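To put rough numbers on why piling up formats and translators is a poor proxy for interoperability, here is a minimal sketch. It assumes every format is bridged to every other by a direct one-way translator, with no common hub format; that assumption, and the function name, are illustrative only and do not come from Jason's post.

```python
def translators_needed(num_formats: int) -> int:
    """One-way translators required so that each format converts
    directly into every other format (assumed: no hub format)."""
    return num_formats * (num_formats - 1)


if __name__ == "__main__":
    # Each added format requires translators to and from all existing ones.
    for n in (1, 2, 4, 8):
        print(f"{n} formats -> {translators_needed(n)} one-way translators")
```

Under that assumption a single shared format needs no translators at all, while each additional format adds a growing number of converters to build, test and maintain.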

A more appropriate metric than counting piles of semi-functional translators is to look at things from the perspective of the user exchanging documents. The end user doesn't see or care about formats. They care about their documents and the people and processes that work with these documents. The question for them is: what is the cost to exchange their document with other users and business processes? In other words, what is the cost to interoperate? That is the metric that counts.

Several cost drivers come into play here:

  1. What are the choices and costs in application software necessary to author a document?
  2. What are the choices and costs in application software needed by the recipient of this document, in order for them to read it, or collaborate with me in editing this document?
  3. Will others see the document as I intended? Or will there be fidelity loss from conversions?
  4. Similarly, what are the performance, security, stability, legal and licensing implications of introducing any translation steps?
  5. How easy is it to program this document format? In other words, what is the cost of business process integration?

When looked at from this business perspective, we can get away from counting piles of dead rats and thus come to a quite different conclusion:

None of the cost-driver factors lead to reduced costs with multiple formats. They all have minimal costs when there is a single format in use. So if the metric for interoperability is the "cost to interoperate", then interoperability (and choice as well) is maximized when a single application-neutral and platform-neutral document format is natively supported by multiple applications at a range of price/function points. Introducing even a single additional format into your business will escalate costs, degrade fidelity of document exchange, and reduce interoperability.


Tuesday, May 22, 2007

Interoperability by Design

We've all heard the interoperability hype. Let's see what is actually there.

First, we start by looking at the many ways in which documents are integrated into the Windows/Office platform. Any fluent user of this platform will use many of these capabilities on a daily basis. These are basic features which have been around, in some cases, since Windows 3.0, maybe earlier.

Windows shell integration


  1. Double-click on a document on the Desktop or in a folder and it loads into the appropriate application. Double-click on a Word document and it loads in Word.
  2. Right-click in a folder and choose “New XXX” to create a new XXX document in the specified folder. So, "New...Microsoft Office Excel Worksheet" creates a new, blank Excel document.
  3. Right-click on a document, choose Properties and on the Summary tab you can view metadata for that document.
  4. Recently-edited documents appear in the “My Recent Documents” under the Start menu.
  5. Documents referred to in web pages, via URL links will render in an inline Office session in the browser.
  6. Documents are indexed by the Windows search engine.


Office integration


  1. Ability to File/Open, File/Save and File/New a document via the familiar menu options.
  2. Ability to set a file format as the default file format for the application.
  3. Ability to use the familiar keyboard shortcuts, Control-O and Control-S to open and save documents.
  4. Ability to forward a document to someone in an email and for them to be able to launch the document by clicking on it when received via email.
  5. Ability to password protect a document.
  6. Ability to post a document to a web folder or to a SharePoint server.
It must be noted that none of the above integration points are allowed by the ODF Add-in for Word, the much-touted translator for which Microsoft provides the, "Funding, Architectural & Technical Guidance and Project co-coordination".

Instead what we get is a new menu option added to the Word 2007 Office menu:




Note that this is parallel to, but not included in the Open menu where the formats that Word natively understands are accessed. Although the option presented here says, “Open ODF”, it should more properly be called “Import ODF”, for reasons which will be clear shortly.

After selecting an ODF document to open, the following progress bar is given while the conversion takes place:




This is followed by a warning dialog listing elements which may have been lost in conversion:




No option is given for disabling the above message from displaying. It should be noted that when converting from a legacy binary document to OOXML, Word gives a similar conversion warning dialog, but that version can be disabled by checking a "Do not ask me again" checkbox.

Once loaded, the user will find that their document is no longer an ODF document. It has been automatically converted to a read-only OOXML DOCX file as the title bar reveals:




So any future operations the user performs on the document, such as mailing, saving, posting to a web server, etc., will be in OOXML format. The only way to get back to an ODF format file is to manually and explicitly go back to the Office menu, go to the ODF submenu and choose to save it to ODF format. At that point you will be presented a default name based on the DOCX temp file name, not the original name. In this case, it suggested “sampler_tmp1.odt”.

The “Save as ODF...” dialog will default to the directory last used to save a file, not necessarily the same as where your document was loaded from. So to save you must first navigate to your original document, select it and choose “yes” when warned about overwriting an existing document, and then the document is converted back into ODF format.

If you do further work on the document in Word, in that same session, and then want to save again, you must avoid the natural tendency to do a Control-S or to save the document when prompted when exiting Word. These methods all will lead to a Save As dialog, suggesting an OOXML format, which will prompt you to rename the document since it is read-only. But it will not offer you the choice of saving to ODF format. The only way to ensure that you are saving to ODF format is to use the above steps, going back to the ODF menu, etc.

You cannot create a new ODF document from scratch in Word. If you try to create a new document and save it to ODF format, you will get an error message, telling you that you must first save the document. You must save the document before you can save it? Yes, you must first save it to a temp file in a natively-supported format like DOC before you can save it as ODF.

The difficulties are complicated when you have documents accessed by other means than the Word menus. Imagine that you receive an ODF document in an email which you want to edit and return to the sender. The following steps would be required:

  1. Manually detach the ODF document from the email and save it to your hard drive, since you will not be able to launch it directly into Word from your email client. Remember where you detached the document.
  2. Manually launch Word, since you will not be able to get Word to launch by clicking on the ODF document you just detached.
  3. From the ODF menu, choose to open the ODF document. Navigate to where you detached the emailed document and select it. Around 30 seconds later the document will be automatically converted to a read-only temporary OOXML document.
  4. Make your editing changes.
  5. Export the document back to ODF format using the ODF menu, either writing over the original file you extracted from the email, or to a new temporary file. Remember where you exported the ODF document to.
  6. Go back to your email application and attach the ODF document.

If this had been an OOXML document (or any other format that Microsoft really supports, like RTF) it would have been much simpler:

  1. Double click on the attachment in your email to automatically launch in Word
  2. Make your editing changes
  3. Use the Send/Email menu option in Word to send the email
As you can see the ODF support provided by the Add-in is very unfriendly.

Compare this to the OOXML support Microsoft added for older versions of Word via their Compatibility Pack. The OOXML support is tightly integrated with the UI, in a way users would find familiar and easy to use. But the ODF support is very shallowly integrated, amounting to little more than a menu item patched in.

One wonders if Microsoft's intent was really to annoy users? That would best explain the available evidence. It is simply not credible that anyone at Microsoft believes that they are listening to customers or providing interoperability with a feature that defies real-world use. What customers did they talk to that said that this Add-in was even remotely adequate?

Since Microsoft is the one providing the, "Funding, Architectural & Technical Guidance and Project co-coordination" one would think that they would contribute more in the area where they are uniquely qualified to assist, the full and native integration of the ODF support into Office.


Friday, May 18, 2007

The Funnel and the Wedge

The idea was prompted by a comment a reader submitted to a recent post, where he talked about one of the challenges of trying to bridge two different formats (in this case ODF and OOXML) via translation:

Both formats must evolve their new versions simultaneously in locksteps...

This one is the killer. Trying to have two formats permanently synchronized this way is a maintenance nightmare, especially when we discuss standards with multiple implementations maintained by different organizations.

This is an important point, and bears some reflection. In my mind I have images of the funnel and the wedge, physical means of convergence and divergence. Similar forces are at play with standards.

We see the Funnel in the evolution of HTML. Although the standard has existed for over a decade, implementation support for HTML and related standards was uneven until quite recently. Interoperability was poor. From the start vendors added incompatible extensions while not implementing key features. Developers had to write extensive workarounds and alternative representations to work on all browsers. And when they did not do that, their web sites might not work with all browsers. But with customer demand and prodding from groups like the Web Standards Group, the interoperable support of the HTML standard across implementations happened. There was convergence. What we have today, although not perfect, is clearly the result of a Funnel, concentrating industry effort around a single standard (or more like a family of standards).

This does not mean that vendors needed to sacrifice innovation, or deny their customers. It just meant that they accomplished their business objectives while also complying with standards. Along with adhering to various financial and securities regulations, labor law, health and safety, and other requirements, both internal and external, voluntary and mandatory, the browser vendors now complied with web standards. It is just another part of doing business.

Similar Funnels have occurred historically with network protocols, wireless telephony (in Europe at least), electrical grids, broadcast formats, etc. I've written elsewhere about what types of technologies tend to converge like this and why.

We see the Wedge when two standards compete in the same space and diverge into incompatible technologies. Microsoft is the master of the Wedge, with numerous examples over the years, usually proprietary, but more recently attempting to gain de jure recognition of them. But the mechanism is the same in either case: VML, JScript, MS Kerberos, J++, C++/CLI, XPS, and of course OOXML. Standardization just means that Microsoft has another tool for telling you that the Wedge is good for you. But it is a Wedge nevertheless.

The Wedge brings fragmentation, confusion and lack of interoperability, attacking the core reasons for having a standard in the first place. Once the primary value of an open standard is eliminated, we can all return to the security and comfort of our monopolist overlords. That is their main goal. Make no doubt about it, true interoperability and true choice are very scary propositions for Microsoft. It cuts at their very business model.

So consider the Funnel and the Wedge as applied to document formats. If we all use ODF today, is interoperability perfect? No. Do we know how to move forward to improve interoperability, and work together in multi-vendor consortia to perfect this? Yes, certainly. That is why and how such standards as TCP/IP, HTTP or HTML work today. Interoperability came via the Funnel, a convergence of effort and attention leading to increased interoperability and the user and industry benefits that flow from that interoperability.

But from the Wedge, what can we expect? If Microsoft is successful, here's what I see, my dismal predictions:
  1. Within 30 days after OOXML is approved by ISO we see the demise of Microsoft's half-hearted attempt to create ODF Add-ins for Office. We'll never see a functional Add-in from them for Excel or PowerPoint, and the Word one will remain unacceptably slow.
  2. Microsoft will continue to evolve OOXML behind closed doors. 99% of the work will be based on product and decisions in conference rooms in Redmond, which will be later rubber stamped by Ecma and ISO.
  3. OOXML and ODF will continue to evolve and diverge, in incompatible ways.
  4. Seeing their success ramming through 6,000 page Fast Track submissions in ISO, Microsoft will follow up with similar fast track submissions for XPS, XAML, Silverlight, Windows Media Photo, whatever they have. Since they have taken the trouble to set up the machinery to dominate JTC1, they will continue to force feed them with additional material.
  5. Every jurisdiction where ODF is currently allowed and mandated will also allow or mandate the use of OOXML. This in practice will be turned around to mandate the use of Windows and Office.
  6. Finally, once all opposition is rendered harmless, they can shut down OpenOffice.org and KOffice by patent lawsuits, but keep Novell's version around in order to keep anti-trust regulators away. After all, 97% market share is not the same as 100%.
Maybe I'm a bit pessimistic, but I see little reason for optimism.

So do we have an alternative to the Wedge? What would encourage the Funnel? The following would need to happen:

  1. ISO must reject OOXML.
  2. Customers, from private and public sectors, must make their voices heard, that they want true interoperability and choice and that this means a single document format.
  3. Microsoft must support the existing ODF completely and fully in Office. It won't happen overnight. But it won't happen at all unless they start.
  4. OASIS must work with Microsoft (and Microsoft with OASIS, of course) so that it is clearly explained how MS Office can fully represent their documents in ODF. This need not be a monolithic monster like OOXML, but should be a layered standard, with a basic core feature set and defined extensions and profiles that encompass wider and wider ranges of functionality. If Microsoft absolutely needs the "heebieJeebies" Art Border in Word in order to maintain 100% fidelity with legacy documents, then the ODF TC can show Microsoft how to encode this in ODF. The Funnel starts when Microsoft abandons their divergent effort in Ecma and joins the common effort around ODF, a single document format for personal productivity applications.
  5. The application vendors, Microsoft included, must work together on defining the organizational, standards and technical means necessary to measure, test and certify ODF compliance, so customers and procurement agencies are able to have assurances that they are getting the level of interoperability that they desire.

I think this is a natural progression. Accomplishing the first step stops the Wedge from progressing further, halting but not reversing the divergence. The other steps reverse the damage and turn us down a path of true interoperability, leading to true choice and innovation.

Finally note that the Wedge is typically driven by a single company. It is not a pull by public demand or from customers, though it may wear many disguises. It is a deliberate attempt by one party to cause division and divergence. But a Funnel, this won't happen at all unless there is strong demand, from customers, from government agencies, from national standards committees, etc. If this is to happen, your voice must be heard. All of us must work to bring all of us together in this effort. But it takes just one company, with a sufficiently large Wedge to pull us apart.


Thursday, May 10, 2007

So where are all the OOXML documents?

Google has a nice feature that allows you to search for documents that match a given file type. This is done by adding "filetype:NNN" to your query, where NNN corresponds to the file type. This feature has supported the ODF and OOXML document formats for at least two months, when I first noticed it. I've been tracking some numbers since then and now have enough data to make some observations.
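For readers who want to repeat the exercise, here is a minimal sketch of how such queries could be put together. The helper names are hypothetical, the list of extensions is simply the six types counted in this post, and the search URL pattern is an assumption about how a query maps onto Google's public search page; the result counts themselves still have to be read off the results page by hand.

```python
from urllib.parse import urlencode

# The six document types tracked in this post.
EXTENSIONS = ["odt", "ods", "odp", "docx", "xlsx", "pptx"]


def filetype_query(ext: str, terms: str = "") -> str:
    """Build a query string using the filetype: operator described above."""
    return f"{terms} filetype:{ext}".strip()


def search_url(ext: str, terms: str = "") -> str:
    """Hypothetical helper: wrap the query in a search URL (assumed pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": filetype_query(ext, terms)})


if __name__ == "__main__":
    for ext in EXTENSIONS:
        print(filetype_query(ext), "->", search_url(ext))
```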

At last count the totals were:

Format         Count
ODT           85,200
ODS           20,700
ODP           43,400
Total ODF    149,300
DOCX             471
XLSX              63
PPTX              69
Total OOXML      603


As you can see, there is some round-off happening in the upper range. Perhaps at the high end the counts are estimates based on sampling?

In any case, I am rather surprised by the low counts given for OOXML documents, especially considering that this format has been supported since the Office 2007 beta last summer. According to Brian Jones, there have been over 4 million downloads of the OOXML Compatibility Pack for older versions of Office, and that there is a new community of, "over 300 other companies and partners who care deeply about OpenXML". We're also told that Office 2007 sales are above expectations, "two times greater than the purchases of Office 2003" according to one research firm. Recently announced third-Quarter results for Microsoft showed "better than expected" results for Office 2007 sales, $200 million better, according to Microsoft CFO Chris Liddell.

So with all this evident love for Microsoft Office 2007, why is it that 6 months later there are only 63 OOXML spreadsheet documents on the web, something like 0.3% of the number of ODF spreadsheet documents? How can there be 300 companies supporting OOXML and only 69 OOXML presentations on the web? (This is starting to sound like when I say I support 30 minutes of aerobic exercise a day. I don't do it, but I sure support it!)
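The arithmetic behind those comparisons is easy to check. The sketch below uses only the counts from the table above; the variable and function names are mine.

```python
# Document counts as reported in the table above.
odf = {"ODT": 85_200, "ODS": 20_700, "ODP": 43_400}
ooxml = {"DOCX": 471, "XLSX": 63, "PPTX": 69}

odf_total = sum(odf.values())
ooxml_total = sum(ooxml.values())


def percent_of(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# OOXML spreadsheets as a share of ODF spreadsheets.
spreadsheet_share = percent_of(ooxml["XLSX"], odf["ODS"])

if __name__ == "__main__":
    print(f"Total ODF:   {odf_total:,}")
    print(f"Total OOXML: {ooxml_total:,}")
    print(f"XLSX as % of ODS: {spreadsheet_share:.1f}%")
```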

OK, I know the argument about "dark matter", that Google indexes only the tip of the iceberg, that there is a lot of data squirreled away on PC hard-drives, behind corporate fire walls, etc., stuff that Google will never see. But the same is equally true for ODF documents, right? I have tons of ODF documents on my laptop, but none of them are indexed by Google.

Of course ODF has been around for a year longer than OOXML. That's an important fact to acknowledge. We can put that in perspective by plotting the graph of ODF and OOXML document counts against the number of days since adoption of these two standards. So ODF counts are based on a start of 1 May 2005 and OOXML counts on a start of 7 December 2006, when OASIS and Ecma respectively approved them. You get this:



As you can see, ODF has a nice upward trend. OOXML is also trending upwards, though it is somewhat lost at this scale. If you do the analysis it comes out to around 300 new ODF documents per day versus 6 for OOXML. So, two years later, ODF adoption, in terms of documents per day, is 50 times greater than OOXML's, at a time which should be OOXML's high-growth period, considering all the great news that is coming out of Redmond.
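The per-day figures above come from the trend in the tracked data points, which are not reproduced here. As a rough cross-check, the sketch below computes a cruder number: the lifetime average since each standard's approval date. It assumes the latest totals were taken on the date of this post (10 May 2007), which is my assumption. The absolute rates come out lower than the trend figures, but the ratio between the two formats lands in the same neighborhood.

```python
from datetime import date

AS_OF = date(2007, 5, 10)           # assumption: counts taken on the post date
ODF_APPROVED = date(2005, 5, 1)     # OASIS approval date used above
OOXML_APPROVED = date(2006, 12, 7)  # Ecma approval date used above

ODF_TOTAL = 149_300
OOXML_TOTAL = 603

odf_days = (AS_OF - ODF_APPROVED).days
ooxml_days = (AS_OF - OOXML_APPROVED).days

# Lifetime averages, not the fitted trend described in the text.
odf_per_day = ODF_TOTAL / odf_days
ooxml_per_day = OOXML_TOTAL / ooxml_days

if __name__ == "__main__":
    print(f"ODF:   {odf_days} days, {odf_per_day:.1f} documents/day")
    print(f"OOXML: {ooxml_days} days, {ooxml_per_day:.1f} documents/day")
    print(f"Ratio: {odf_per_day / ooxml_per_day:.0f}x")
```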

So I'm somewhat at a loss to appreciate the significance of Novell or Corel adding OOXML support to their editors. With only 63 OOXML spreadsheets out there, wouldn't it be cheaper just to hire someone to retype the documents in the destination application? The average user is more likely to find a Buffalo Nickel in their lunch change than to find an OOXML document outside of captivity.


Wednesday, April 25, 2007

Math markup marked down

Sun's Erwin Tenhumberg fights some FUD about ODF and in passing provides a link that is worth a few more words. It appears that Science, the journal of the American Association for the Advancement of Science (AAAS), itself the largest scientific society in the world, has updated its authoring guidelines to include advice for Office 2007 users. The news is not good.

Because of changes Microsoft has made in its recent Word release that are incompatible with our internal workflow, which was built around previous versions of the software, Science cannot at present accept any files in the new .docx format produced through Microsoft Word 2007, either for initial submission or for revision. Users of this release of Word should convert these files to a format compatible with Word 2003 or Word for Macintosh 2004 (or, for initial submission, to a PDF file) before submitting to Science.

Well, so much for 100% compatibility, eh? That is what I've been talking about. Whether you move to OOXML or ODF you will be making a change that will break compatibility with your past document processing systems. You will need to change over the next couple of years and you will need to examine your choices carefully. But don't get suckered into thinking that the choice of OOXML is magically painless. The 100% compatibility claims don't hold water.

More bad news:

Users of Word 2007 should also be aware that equations created with the default equation editor included in Microsoft Word 2007 will be unacceptable in revision, even if the file is converted to a format compatible with earlier versions of Word; this is because conversion will render equations as graphics and prevent electronic printing of equations, and because the default equation editor packaged with Word 2007 -- for reasons that, quite frankly, utterly baffle us -- was not designed to be compatible with MathML. Regrettably, we will be forced to return any revised manuscript created with the Word 2007 default equation editor to authors for re-editing. To get around this, please use the MathType equation editor or the equation editor included in previous versions of Microsoft Word.

Uh oh. Not only can you not submit files in OOXML format, but you can't even use Office 2007 and save in the old binary formats. Down conversion or using the Compatibility Pack won't help. Microsoft's decision to push a new "Office Math Markup Language" rather than use the well-established MathML standard appears to be a serious flaw.

Nature appears to have the same problem:

We currently cannot accept files saved in Microsoft Office 2007 formats. Equations and special characters (for example, Greek letters) cannot be edited and are incompatible with Nature's own editing and typesetting programs.
Of course, when targeting final publication of a paper, a PDF file is fine. But when engaging in collaboration with another researcher, or an editor, you need to agree on a standard format in which you both can work.

Reuse of existing standards is important. When you reuse a standard, you are reusing more than a piece of paper. You are reusing the experience and effort that went into creating and reviewing that standard. You are reusing the experience gathered by those who have already implemented the standard. You are reusing the books and training materials already written for that standard. You are reusing the interfaces for other technologies that have already integrated with that standard or can produce or consume output that conforms to that standard.

Isaac Newton wrote, "If I have seen further it is by standing on the shoulders of giants". When you reuse standards you reuse the accumulated wisdom of an industry and assume the vision and powers of giants. But when you ignore all precedents and go forth on your own, well, let's just say the outcome is more variable in that case. You may be the next Einstein, or you may be the next fool.

If Science and Nature need to update their templates, then I'd suggest they take a look at ODF. Not only does it use MathML for equations, but it is an open standard, an ISO standard, a platform and application-neutral standard that has many implementations, including several good open source ones. If they need to update their processing, then they might want to make the smart choice now, the choice that increases their choices and flexibility going forward.


18 June 2007 Update

A response from Nature and one of their vendors, explaining the complexity of migrating their publishing ecosystem to a new file format. Quoting a letter to Microsoft from Bruce Rosenblum of Inera:

Had the conversion from DOCX to DOC provided a conversion from OMML to Equation Editor format, it would have provided the necessary backwards compatibility for publishers to upgrade one system at a time. But because this compatibility is not available, it's created the need for a "big bang" upgrade, or a delay until the ecosystem of inter-dependent systems is deliberately updated over time. In the environment of scholarly publishing, such substantive upgrades often take years, not months.


Monday, April 23, 2007

Sometimes I need to remind myself

Tim Anderson has an interesting article up on his ITWriting blog, “Microsoft’s Jean Paoli on the XML document debate”. Of course, I treat anything Jean Paoli says on XML with such attention as I usually reserve for listening to the isorhythmic motets of Philippe de Vitry. Like de Vitry, Paoli can be understood on several different levels: What is he saying? And what is he really saying? As a student of Empson's “Seven Types of Ambiguity”, I hope that I am up to the task.

There is, of course, the familiar canard, that IBM is the source of all of their problems:
It is clear though that Paoli is upset by what he sees as an international campaign against OOXML orchestrated by IBM, the sole naysayer in the ECMA voting. “There are IBM employees going to ISO, and saying a lot of technically incorrect things. When ODF went to ISO Microsoft did not interfere. IBM is betting on ODF, to have governments preferentially buying IBM software. It is OK to compete, but using this kind of argument around is it an open format or not … it’s widely known now, Office Open XML is an open format, even the EU says it is.”

A Google search on the words ecma ibm sole vote returns an embarrassingly large number of hits. Microsoft has certainly been having fun with this line. Let's take a little look at this question and see if we can better define this conspiracy that Paoli is alluding to.

I'm now going to rant a little. You may want to stand back.

Yes, IBM was the only voting member in Ecma who cast a vote against OOXML. But guess what, we're probably the only company who actually had someone perform the due diligence of reading the specification. The others voted on OOXML without reading the spec. So please give their “Yes” votes all the weight they deserve, but not more.

It seems to me that Ecma has become a standards factory, a place where you go for clean, efficient, no-guilt, fast-track service. Don't want to publish your public comments? Fuggetaboutit. Don't want to publish your meeting minutes? Fuggetaboutit. Worried about rushing through a 6,000 page specification in less than a year, with 20x less scrutiny than average? Fuggetaboutit. Want to have a unanimous vote, along with a souvenir photograph of your face when the vote occurs? Yes sir, we guarantee it.

However, for the privilege of this elite service, you must cough up the dough. You will not find Ecma's rate card on their website, but I'm told that voting membership will set you back $57,000. This is not exactly the club to join if you are a small (or medium) business, non-profit, public sector agency, or anything but one of the big boys. A list of the privileged twenty voting members of Ecma can be found here.

As you can imagine, one does not become a voting member of Ecma without a good reason. This is a business expense, not a charitable contribution. For $57K, one expects $57K of service. To justify that membership fee, you expect your technology to be blessed with an Ecma standards imprimatur without hassles. So the “unwritten rule” is that everyone votes in favor of everyone else's proposal. It is considered rude to vote against something that another elite member has paid so much for. So, IBM gets a lot of grief for casting a single "No" vote at a single Ecma General Assembly. We broke the club rules. I'm proud to work for such a company.

My question is this: How many “No” votes have been cast in Ecma in the past 5 years? When before did another Ecma member ever vote “No” on a standard? If no one can remember even a single previous “No” vote, or (sacre bleu!) a defeated standard, then that speaks volumes. In a healthy standards body, a single “No” vote should not be a newsworthy event, and should certainly not be something that Microsoft is still complaining about 6 months later.

To put this in perspective, the base category of OASIS voting memberships (Contributor) starts at $1,100. OASIS has something like 330 organizational members eligible to vote, including all categories of companies, government agencies, non-profits, etc.

I should also note, just coming from the annual OASIS Symposium held last week, that the OASIS Board of Directors is looking at changing the OASIS voting rules to make it more difficult for OASIS standards to be approved. Yup, we're raising the bar.

When I see this I need to try extra hard to remind myself that IBM is just interfering with Microsoft's good-faith attempt to humbly submit for our consideration their well-written, detailed, high-quality, interoperable open standard.

ISO/IEC JTC1/SC34 recently had its annual plenary. This is the same group of ISO National Body (NB) members who voted in favor of ODF last year, and over the next few months many of them will be recommending positions on Microsoft's OOXML to their national standards bodies. I was on the delegates list for attending this meeting, as a representative of the US NB, but had to cancel at the last minute because of a family emergency. When I saw the attendance list, I was surprised to see that Microsoft had sent five people, this to a meeting of only 37 people. They practically darkened the skies with their employees. And what about the conspiratorial army that is hounding them at every corner? Zero people from IBM. Zero as well for Google, Sun, RedHat, Adobe, Oracle and Novell.

When I read this I need to remind myself that I'm part of a vast global conspiracy to deny Microsoft a fair hearing within ISO. The fact that no one in this vast global conspiracy managed to show up at the meeting was simply a ploy to make Microsoft feel overconfident.

In the US NB, we have a committee called INCITS V1. It is the mirror committee to JTC1/SC34. I serve on it, the only member from IBM. Imagine my surprise, when at our last call, Microsoft shows up with 3 employees and a business partner as new members. Four people against little ol' me? Come on guys, that is just sad.

At times like this I need to remind myself that Microsoft is the underdog and IBM and its allies are ganging up on them. But our guys are invisible at meetings and although they cannot vote, they do have ninja powers and, in matters of external affairs, the delegated plenipotentiary prerogatives of Klingon Ambassadors. “choSuvchugh 'oy'lIj Daghur neH”.

Microsoft bloggers, fed and spreading like mushrooms, recently popped up and simultaneously announced a new pro-OOXML petition, self-published, self-hosted and self-reported by Microsoft. You couldn't find anyone to even pretend to support you? You had to host your own petition? This is like throwing a birthday party and having only your mother show up. Very sad. Where are your friends, Microsoft? How come we hear no one else speaking approvingly about OOXML? Where are the other companies lining up? Where are the endorsements? The testimonials? All we hear is that Microsoft thinks OOXML is great. But that is just Mom cheering on your performance. Don't you have any real support?

Btw, this is what a real petition looks like. It is hosted by a reputable party (the Prime Minister) and gives an open, public listing and tally of those who signed the petition.

At times like this I need to remind myself that the ODF supporters are the outsiders in this debate, using unconventional and covert tactics to fight a well-respected and well-loved mainstream technology generously provided by Microsoft.

I see that Microsoft likes to throw around names like the British Library and Library of Congress, as if the mere mention of their holy names brings sacramental blessings. But please show me a public statement where either of these bodies has endorsed, adopted, recommended adoption or recommended approval of OOXML. The mere mention in passing of well-known and popular institutions lends no credibility to your argument, and credible arguments are important, as is well-known to anyone familiar with Walt Disney World, the Louvre, NASA, the Boston Red Sox, or the Department of Really Important Stuff.

A Malaysian standards committee was moving forward to approve ODF as a national standard in Malaysia. This is called “transposing” an International Standard, and is commonly done when a relevant International Standard is approved. Microsoft has made every attempt possible to prevent this committee from making progress with their review of ODF, for almost a year now. This progress recently came to a halt, the committee's decisions nullified and the committee suspended.

When standards committees are disbanded when they get too close to approving ODF, then I must pinch myself and remind myself once again that IBM is the one orchestrating international campaigns against Microsoft, and not the other way around.

I've heard similar complaints from other NB's. Why bother reviewing OOXML? Why waste the effort reading it and suggesting improvements? Microsoft has ignored every suggestion given it so far by NB's. And if you vote no, Microsoft will just escalate and try to get some mid-level government bureaucrat to set aside the recommendation of your country's technical experts. Why waste the next 4 months reviewing a 6,000 page specification? It happened in Malaysia. It happened in the US. The INCITS Executive Board was about to send a contradiction submission against OOXML, saying that it possibly contradicted ODF. But before the committee could reconvene the next morning, enough members had received urgent phone calls to cause them to change their vote and abstain. We saw this in the Netherlands as well, where it was even reported in the papers that they would vote against OOXML. But that vote was changed at the last minute with the cryptic message to the JTC1 Secretariat: “The Netherlands Standardization institute votes ‘abstain’. Please change our vote accordingly and please confirm receipt of this vote to me...” What happened there is still unclear. In India it was even worse, when the committee that was supposed to get the ballot did not receive it. Evidently it was misplaced. The intervention of the leader of a major national political party was required to straighten it out. I also received a note saying that the committee was being told that the deadline for responding to the ballot was two weeks later than it really was, a delay that would have invalidated their vote if they had fallen into that trap.

When I see stuff like this happening, I need to remind myself, really, really hard, that IBM is the bad guy in this debate and that we're the one interfering with an orderly ISO process.

When an amendment to a Florida State Senate bill was offered that called for a “business case analysis” for the use of open standard document formats (no particular format was called out) Microsoft's lobbyists, the three Men in Black, Will McKinley of Dutko Poole McKinley, Jim Daughton, Jr. and Geoffrey Becker both of Metz, Hauser, Husband & Daughton, swarmed down and zapped it. As one legislative aide put it, “By the time those lobbyists were done talking, it sounded like ODF (Open Document Format, the free and open format used by OpenOffice.org and other free software) was proprietary and the Microsoft format was the open and free one”. Perhaps a document, left by the lobbyists, filled with lies about ODF, had something to do with it? We should be fortunate that Microsoft sent only three lobbyists to handle this, rather than all nine lobbyists who are registered in Florida alone to support Microsoft's legislative activities.

When expressing our technical opinion defines interference, and the outrages that Microsoft is getting away with become the norms of behavior, then we're all doomed to a future of technical subservience. We all need to remind ourselves of that.

Microsoft likes to complain, and they are evidently becoming quite adept at it. If decibels and dollars could win arguments then they would surely be the winners. But I think their protestations are mis-directed. Microsoft is like an out-of-condition middle-aged man (somewhat like myself) out for a rare jog. They can curse to the high heavens the pain they feel, but don't blame it on others. It is called competition. Deal with it. If it hurts so much it is because you are so out of practice. You should try having competition more often. It is good for you.


Wednesday, March 28, 2007

The ODF Validation Service

No, this has nothing to do with getting discounted parking if you use ODF, though that is an intriguing idea...

Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship's ODF Tools project by Alex Hudson.

It is in the spirit of the W3C's Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.

It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a Wordprocessor document, onTable(), onImage(), etc.

A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.


The benefit of the framework is the reduction in code required to get directly to the info in the ODF document you want, without having to master the ODF specification or writing a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.
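To make the idea a little more concrete, here is a minimal sketch in Python of the two outermost levels, the Zip package and the manifest. It is only an illustration of the listener pattern described above, not an existing library: the class names, the snake_case method names (on_file, on_directory, on_manifest_entry) and the scan_odf function are all assumptions made for this example. The only things taken from ODF itself are the location of the manifest (META-INF/manifest.xml) and its XML namespace and attribute names.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in an ODF package.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


class ZipListener:
    """Hypothetical package-level listener; override the events you need."""

    def on_file(self, name, size):
        pass

    def on_directory(self, name):
        pass


class ManifestListener:
    """Hypothetical manifest-level listener."""

    def on_manifest_entry(self, full_path, media_type):
        pass


def scan_odf(source, listeners):
    """Walk an ODF package once and dispatch events to subscribed listeners.

    `source` is a path or file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(source) as package:
        # Zip level: one event per archive member.
        for info in package.infolist():
            for listener in listeners:
                if isinstance(listener, ZipListener):
                    if info.is_dir():
                        listener.on_directory(info.filename)
                    else:
                        listener.on_file(info.filename, info.file_size)

        # Manifest level: one event per <manifest:file-entry>.
        if "META-INF/manifest.xml" in package.namelist():
            root = ET.fromstring(package.read("META-INF/manifest.xml"))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path")
                media = entry.get(f"{{{MANIFEST_NS}}}media-type")
                for listener in listeners:
                    if isinstance(listener, ManifestListener):
                        listener.on_manifest_entry(path, media)


class PictureCounter(ZipListener):
    """Example plugin: counts files stored under Pictures/."""

    def __init__(self):
        self.count = 0
        self.directories = []

    def on_file(self, name, size):
        if name.startswith("Pictures/"):
            self.count += 1

    def on_directory(self, name):
        self.directories.append(name)


class MediaTypeCollector(ManifestListener):
    """Example plugin: records the media type declared for each entry."""

    def __init__(self):
        self.media_types = {}

    def on_manifest_entry(self, full_path, media_type):
        self.media_types[full_path] = media_type
```

A plugin is then just a subclass passed to the scanner, for example scan_odf("report.odt", [PictureCounter(), MediaTypeCollector()]), where report.odt stands for any ODF file you have at hand. Content-level listeners such as onWorksheet() or onCell() would follow the same shape, with the scanner parsing content.xml and dispatching one event per element.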

Change Log

4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.


Tuesday, March 20, 2007

Cannibalism

An interesting post by Bob Sutor. What is OOXML's real competition, and how does that help ODF? The dynamics get interesting when you are hindered by your own install base. The main selling point of OOXML is its claimed 100% compatibility with the legacy binary formats. But if you are using Office 2000, and happy with it, what is the reason to move to OOXML? Why not remain using the binary formats? What justifies the migration?

The downside is clear. The minute you move to OOXML you have less choice in whom you can successfully exchange documents with. Users of Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite and ThinkFree Office, along with the numerous 3rd party applications that can read and write the binary formats, are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.

So why move to OOXML? From the CTO's perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn't the tendency be to remain where you are?

So the breakdown may happen like this:

I think that B & Z may be the dominating factors. N is large now because it includes the inertial effects of Microsoft's market dominance. Even companies that don't make an explicit choice will end up with that path by default. But even the most passive company will not fall into choice A without some thought.

It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.


Monday, March 19, 2007

ODF Freely Available

Another step forward for ODF. After gaining ISO approval in May, and Publication status in December, ISO/IEC 26300 is now counted among ISO's "Freely Available Standards". What is the significance of this? The text is identical to what it was in May, but you no longer need to pay 342 Swiss Francs to ISO to download an official copy. It is now free. Enjoy!
