
An Antic Disposition


Asking the right questions about Office 2010’s OOXML support

2009/11/17 By Rob 19 Comments

There is more OOXML controversy in the news, this time in Denmark. I don’t claim to understand all the nuances of the accusations, since I don’t read Danish, and Google Translate at times makes it sound like a discussion about loaves of rye bread or something, but the gist of it, as far as I can surmise from this account, is whether Office 2010 will “support the complete ISO-approved version of OOXML”.  Microsoft’s spokesperson says it will.  Mogens Kühn Pedersen, chair of the Danish Standards Committee, says it will not.

This is the kind of dispute where you can go around in circles for days and not reach agreement. The problem is that they are arguing over words, not facts, and they do not fully agree on the meaning of the words. Words like “support” and “complete” and “conform” are used in different ways, with different meanings and intents.

Let’s try to escape the equivocation and instead try to establish the underlying facts. I can’t promise that this will clarify the situation any. In fact, I suspect we’ll end up even more confused about what exactly Office 2010 actually supports. But replacing a false certainty with an honest uncertainty is progress of a kind. It gives us something we can build on.

First, we need to acknowledge that OOXML entered ISO as one standard, and was transformed, via the BRM and ISO ballot, formally into 4 standards: ISO/IEC 29500 Parts 1, 2, 3 and 4. Within these parts are several different conformance targets and conformance classes. In particular, these 4 standards encompass two different and incompatible schemas for many features: “Strict” and “Transitional”. What Microsoft submitted in the Fast Track is essentially the “Transitional” schema. What was created by the BRM was the “Strict” schema. This is where Microsoft made most of its “concessions” in order to turn “No” votes into “Yes” votes. So things like support for spreadsheet dates before the year 1900, the elimination of VML graphics, etc.: these are all in the “Strict” schema. All the legacy “DoItLikeWord95” garbage was in “Transitional” only. Several NBs voted to approve OOXML because of the assertion that “Transitional” would not be written in documents produced by future versions of MS Office. The promise was that it was…well…transitional, for moving legacy binary documents into XML. Few people want to support two different document standards (both ODF and OOXML) in the first place. But to require support for two different and incompatible versions of OOXML — that is simply intolerable.

In any case, because of these two conformance classes, anyone who claims that their product supports “OOXML” in an unqualified sense, without stating which conformance targets or conformance classes they are supporting, is not saying anything of substance. It is like trying to buy an electrical plug adapter by just saying “I need electricity”. Merely claiming “conformance to OOXML” means nothing. You need to state the conformance targets and classes that you support. Remember, the conformance language of OOXML is so loose that even the shell command “cat foo.docx > /dev/null” would qualify as a conformant application. I assume that Office 2010 supports at least that.
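
To make the distinction concrete, here is a minimal Python sketch that peeks inside a .docx package and reports which flavor its main document part declares. It relies on the namespace conventions I understand the two schemas to use (Transitional parts under schemas.openxmlformats.org, Strict parts under purl.oclc.org/ooxml), and the file name is just a placeholder:

    import zipfile
    import xml.etree.ElementTree as ET

    # Assumed namespace URIs: treat these as illustrative, not normative.
    TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"

    def detect_flavor(docx_path):
        """Report whether word/document.xml uses the Transitional or Strict namespace."""
        with zipfile.ZipFile(docx_path) as pkg:
            root = ET.fromstring(pkg.read("word/document.xml"))
        ns = root.tag.split("}")[0].lstrip("{")   # namespace URI of the root element
        if ns == STRICT_NS:
            return "Strict"
        if ns == TRANSITIONAL_NS:
            return "Transitional"
        return "Unknown namespace: " + ns

    print(detect_flavor("foo.docx"))   # 'foo.docx' is a hypothetical sample file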

Of course, the alleged assertion that Office 2010 supports OOXML “completely” is a bit more problematic. What exactly does this mean? Does it mean that Office 2010 supports all conformance classes and targets of all four parts of OOXML? Including being a Strict consumer? A Strict producer? That would be a good thing, IMHO, if it were true. But that is not what ISO/IEC JTC1 SC34/WG4 heard recently in Seattle, where they were told that Office would not write out Strict documents until Office 16. That would put it out to the middle of the next decade, assuming the typical 3-year Office release schedule.

So I’ll lay out my assertions (with the caveat that Office 2010 is not yet complete or shipped) as:

  • Office 2010 will conform to the Transitional consumer and producer classes defined in the OOXML standards. Any bugs that are found in the shipped version of Office 2010 will be “fixed” by retroactively changing the standards to match what Office actually does, as is currently being done by the Microsoft-packed SC34/WG4 committee with similar bugs found in Office 2007’s OOXML support.
  • Office 2010 will not have conforming support for OOXML Strict producer or consumer classes.
  • Office 2010 will write dozens of non-interoperable, proprietary extensions into its OOXML documents: extensions which are not defined by the OOXML standards, which have not been reviewed or standardized by any standards committee, and which will not be fully interoperable with other OOXML editors, or even with previous versions of MS Office.

So instead of arguing over the meaning of “support” and “complete” I suggest some alternate questions for Microsoft, to give them the opportunity to clarify exactly what kind of support for OOXML will be coming in Office 2010:

  1. Exactly what ISO/IEC 29500:2008 conformance classes and targets will Office 2010 conform to?
  2. Is this contingent on first changing the conformance requirements of the published ISO/IEC 29500:2008 standards to match what Office 2010 actually supports? Or is there a commitment to support the published standards as they were approved by JTC1 national bodies? In other words, is Microsoft committed to conform to the standards, or are we back to changing the standards to “conform” to Microsoft?
  3. Will Microsoft Office 2010 write out only markup that is fully described in the OOXML standards? Or will it write out proprietary markup extensions that are not fully defined in the standards? In other words, will Office 2010 be “strictly conformant” with the ISO/IEC 29500:2008 standards?

The problem you run into here is that there are really two different OOXML standards: the new and improved OOXML Strict conformance class, the one that was “sold” to ISO NBs, the one that garnered the approval votes, and then the old ugly one, the “haunted” specification, the Transitional conformance class, supported only by Microsoft Office. Anyone considering adopting OOXML should have perfect clarity as to which one they are adopting, especially since these are two very different standards, both formally and logically. Just as it is problematic to speak about OOXML support in a product without stating which conformance classes and targets are supported, it is equally a defect for any adoption policy to be loose about which version of OOXML is being proposed for adoption.

IMHO, if you must state a requirement for OOXML (along with ODF), at least specify it clearly, and state a requirement for “strict conformance” (meaning no extensions) of the Strict conformance classes of ISO/IEC 29500:2008. To do otherwise is to essentially specify a requirement for the use of Microsoft Office and Microsoft Office alone.

Filed Under: OOXML

ODF 1.2, Part 3 goes out for Public Review

2009/11/16 By Rob 4 Comments

A major milestone for ODF 1.2 was reached on Friday. Part 3 of ODF 1.2, which specifies document packaging (how a document’s XML, images and metadata are combined into a single file and are optionally encrypted or signed), went out for a 60-day public review period. This public review period will run through January 12th, 2010. A public review is a necessary OASIS procedure before a Committee Draft can be approved as a Committee Specification and then as an OASIS Standard.
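
Since an ODF package is just a ZIP archive with a few conventions (a mimetype entry, a META-INF/manifest.xml, and document streams such as content.xml), you can look inside one with nothing more than a scripting language’s standard library. A minimal Python sketch, with a hypothetical file name:

    import zipfile

    # 'report.odt' is a hypothetical ODF text document; any .odt/.ods/.odp will do.
    with zipfile.ZipFile("report.odt") as pkg:
        # The package mimetype identifies the document type.
        print("mimetype:", pkg.read("mimetype").decode("ascii"))
        # The manifest lists every stream in the package and its media type;
        # ODF 1.2 Part 3 is the specification that governs this structure.
        print("manifest:", pkg.read("META-INF/manifest.xml").decode("utf-8")[:200], "...")
        # The remaining entries hold the document XML, styles, metadata, images, etc.
        for name in pkg.namelist():
            print(" ", name)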

The official announcement of the review has more information, including links to download the public review draft and information on how to submit comments on the draft.

Compared to the packaging specification used in ODF 1.0 and ODF 1.1, the main differences are:

  1. We’ve split this material into its own specification, since these packaging conventions are more widely applicable, and in fact have been more widely used than just in ODF. For example, the International Digital Publishing Forum (IDPF), who standardize the increasingly important ePub digital book format, use ODF’s packaging as the base of their Open eBook Publication Structure Container Format (OCF) 1.0 specification.
  2. We’ve added digital signature support (chapter 4) based on the W3C’s XML Digital Signature Core, including the ability to use standardized extensions such as XAdES.
  3. We now have an RDF-based metadata framework with OWL ontology for the manifest file (chapter 5).
  4. We’ve added a more detailed conformance definition, including conformance targets for packages, producers and consumers, as well as a separate conformance class for extended packages.
  5. We’ve generally redrafted the specification to follow ISO style guidelines.

This specification is only 34 pages long, so if you’re at all interested please give it a look  between now and January 12th, and send along any comments via the office-comment list. Anything that improves the specification is welcome, from reports of typographical errors, to technical omissions or errors, to suggestions for future features. It is all good.

And if you want to follow along, you can track the incoming comments in several ways:

  • Subscribe to the office-comment list mentioned above.
  • View the archives of the office-comment list.
  • View the public review comments we’re tracking in JIRA. I have a python script that scrapes the office-comment list and enters them into JIRA. This will be more complete than the office-comment list because it will include additional comments from the ODF TC. (A rough sketch of the scraping half of such a script appears after this list.)
  • I have another python script that takes each newly entered issue from JIRA and sends it out via Twitter. So you can follow all new ODF issues by subscribing to @ODFJIRA. Depending on your Twitter reader, you might be able to mark some issues as “favorites” and return to them later to see how they have been resolved.  (While you’re at it, you might also follow me, @RCWeir)
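
For the curious, the scraping half of such a script can be quite small. The sketch below is only a rough illustration based on my assumptions (the archive URL and its HTML layout are guesses, and the JIRA submission step is left as a placeholder), not the actual script:

    import re
    import urllib.request

    # Assumption: the mailing-list archive for a given month lives at a URL like
    # this and lists messages as links with the subject as the link text.
    ARCHIVE_URL = "https://lists.oasis-open.org/archives/office-comment/200911/"

    html = urllib.request.urlopen(ARCHIVE_URL).read().decode("utf-8", "replace")

    # Pull out (href, subject) pairs for each message on the index page.
    messages = re.findall(r'<a href="(msg\d+\.html)">([^<]+)</a>', html)

    for href, subject in messages:
        comment_url = ARCHIVE_URL + href
        # Placeholder: the real script would create or update a JIRA issue here
        # (and a second script would tweet newly created issues via @ODFJIRA).
        print(subject, "->", comment_url)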

Also, keep your eye open for the announcement of a public review for ODF 1.2, Part 1 (ODF Schema) and Part 2 (OpenFormula), which will be ready for review soon.

Filed Under: ODF

The Final OOXML Update: Part III

2009/10/27 By Rob 9 Comments

This is Part III of a 5-part series on the state of OOXML today. Prior to starting this series, I had not posted about OOXML in over a year. Part I showed how Microsoft, despite its promises that control of OOXML would be handed over to an independent, international committee, has instead stuffed the committee that maintains OOXML (JTC1/SC34/WG4) with Microsoft employees. And in Part II I looked at how the final published text of OOXML failed to account for all BRM decisions, and described the steps that ISO was taking to remedy this obvious procedural flaw.

In this Part I’ll look at how Microsoft is using their dominance in SC34 to push through hundreds of changes and additions to OOXML, in a misuse of a procedure intended for correcting drafting errors, to make OOXML “conform” to Microsoft’s monopoly product.

Let’s start by taking a look at the OOXML defect log [PDF] that SC34/WG4 uses to track their large list of errors and omissions discovered in the published standard. This defect report currently amounts to over 800 pages, longer than the entire ODF 1.0 standard. But it is well worth downloading and browsing through.

Some of these changes will be made in Technical Corrigenda while others are proposed for Amendments. What is the difference? SC34/WG4 itself made the distinction clear, in a presentation (N 1187 for those with access) it made to the SC34 Plenary in Prague, where it outlined its practice for deciding which changes would be made in corrigenda versus amendments:

All of the following criteria should be met for the defect to be resolved by Corrigendum:

1) WG 4 agrees that the defect is an unintentional drafting error.
2) WG 4 agrees that the defect can be resolved without the theoretical possibility of breaking existing conformant implementations of the standard.
3) WG 4 agrees that the defect can be resolved without introducing any significant new feature.

Unless all the above criteria are met, the defect should be resolved by Amendment.

These are reasonable criteria and no objections were made when these guidelines were presented to SC34.

A key procedural point is that in ISO/IEC it is the JTC1 NBs who are the consensus body with the authority to create international standards. All ballots which create or substantially modify standards must be approved by JTC1. This includes DIS ballots, FDIS ballots, FDAM ballots and DTR ballots. So standards, technical reports and amendments are ultimately approved or disapproved by JTC1 NBs. Although subcommittees in JTC1, such as SC34, provide the technical expertise and do the authoring and review work, they are not the standardizing authority. The exception that proves the rule is with corrigenda, which are authored and approved entirely at the SC level. However, this small area of autonomy in defect correction comes with carefully delineated bounds. An SC can author, approve and publish corrigenda by itself, but only to make corrections.

So if we look at JTC1 Directives 15.4.2.2, we read (with my emphasis) “A technical corrigendum is issued to correct a technical defect…. Technical corrigenda are not issued for technical additions which shall follow the amendment procedure…”. And in 15.4.1 “technical addition” is defined as: “Alteration or addition to previously agreed technical provisions in an existing IS.”

So amendments, which require approval by JTC1, are used for altering or extending the provisions of a standard, while corrigenda are used to correct errors introduced in drafting or publication. This dichotomy is common in other standards organizations. For example, in OASIS, a technical committee is able to approve and publish “Approved Errata”, but these are restricted to changes that do not break conformance of existing implementations. Anything beyond that is considered a substantive change to the standard and requires review and approval by the OASIS membership.

Clear enough? In fact, in many cases WG4 appears to follow this important distinction. Some of the proposed changes are simple and benign. For example, some BRM issues were fixed, but in being fixed caused informative example markup in the standard to be incorrect. A quick fix of these items via corrigenda is most welcome.

However, in other cases (in fact most of the cases), the Microsoft-dominated WG4 appears to have overstepped the permissible bounds for corrigenda, and indeed gone far, far beyond what it stated it would be doing in corrigenda. Let’s look at a few examples.

(Sadly, the general public is not given access to the text of the draft corrigenda (the DCOR) but those on the inside can follow along by reading N 1252 in the SC34 document repository.)

Let’s start by looking at items 16, 17, 36, 52, 53 and 133 in DCOR for ISO/IEC 29500 Part 4. These make changes and additions to the WordProcessingML schema. Deletions are noted in red strikethroughs, and additions in blue:

This is not correcting a drafting error. This is not correcting a publishing error. This is a substantial addition to the schema as you can see above.

It is argued, in the defect log, that this change is needed because, without it, ISO/IEC 29500 cannot represent change tracking in mathematical equations. However, this is exactly the type of change that WG4’s guidelines and JTC1 Directives exclude from corrigenda and place into amendments. The schema of OOXML is certainly an “agreed technical provision of an existing IS”. So how can adding math change tracking support to the schema be anything other than an “addition to previously agreed technical provisions”? And how can anyone in WG4 believe that adding dozens of lines to the schema can be done “without the theoretical possibility of breaking existing conformant implementations of the standard”? What about, for example, applications that were programmed to use the published OOXML schema, such as any application that uses a validating parser, or a schema-directed editor, or a program that generates code stubs from the schema, or one that does XML-to-relational DB mapping? Not only is there a theoretical possibility of breaking such applications, there is a theoretical certainty.
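
To see why “no theoretical possibility of breaking” is untenable, consider any consumer that validates incoming documents against the published schema. Here is a minimal, self-contained sketch, using lxml and toy element names rather than the real WordProcessingML markup, of how a document that uses a newly added element fails validation against the schema as published:

    from lxml import etree  # third-party, but a common choice for schema validation

    # A toy stand-in for the published schema: it only allows a <t> child.
    published_schema = etree.XMLSchema(etree.XML("""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="run">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="t" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""))

    # A document that uses an element added later "by corrigendum"
    # (mathChangeTracking is a made-up name for illustration).
    doc = etree.XML("<run><t>hello</t><mathChangeTracking/></run>")

    print(published_schema.validate(doc))         # False: the old validator rejects it
    print(published_schema.error_log.last_error)  # explains the unexpected element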

(Ironically, it should be noted that Microsoft was very keen to beat up on ODF for not having change tracking for mathematical equations, all while hiding the fact that OOXML lacked complete support for this feature as well.)

Another example, #122 in the DCOR.

It changes a type in the chart schema from a byte to an int, and in doing so extends its allowed range considerably. How did anyone think that this was a change “without the theoretical possibility of breaking existing conformant implementations of the standard”? Isn’t there enough theoretical and practical expertise in WG4 to know that changes like this break compatibility?
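
A minimal sketch of why widening a type breaks consumers: an implementation that, trusting the published schema, stored the value in a single byte will fail (or silently truncate) when handed a value that the “corrected” int type now allows. The function name here is illustrative, not the actual chart element:

    import struct

    def store_period(value):
        """A consumer coded to the published schema: ST_Period fits in one unsigned byte."""
        return struct.pack("B", value)   # "B" = unsigned byte, range 0-255

    print(store_period(12))              # fine under the published byte-sized type
    try:
        store_period(4096)               # a value the widened int type now permits
    except struct.error as e:
        print("existing consumer breaks:", e)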

For this change the rationale in the defect log explains the logic of it:

The standard states that the ST_Period simple type uses the XML Schema ST_Period data type and supports a range 2–255.

These observations are incompatible with existing documents and should be updated to reflect such prior art.

And so on and so on. If you search through the defect log, you will see the phrase “existing documents” used dozens of times. That appears to be how many discussions in WG4 end. It shuts down debate like an appeal to “national security” or “executive privilege”, an argument that trumps all others. It doesn’t matter what WG4 previously told SC34, or what JTC1 Directives say; if ISO/IEC 29500 does not match what Microsoft Office actually writes out, then this is by definition a drafting error, and the standard will be “corrected” to conform with MS Office. Let that sink in for a little, until you realize how backwards this is.

I invite you to go back to the defect log [PDF] and search for “BRM”. You will find several oddities. For example, among these proposed changes are some that actually reverse BRM decisions. Yes, you heard me correctly. SC34/WG4, the Microsoft-dominated committee that maintains OOXML, is undoing various BRM decisions that enabled OOXML to be approved in the first place. Why? Well, of course, to make the standard conform more to Microsoft Office.

For example, take DR 09-0159 “General: Unintended incompatibilities between Transitional schema and Ecma-376” or DR 09-0275 “BRM: serial date representation” with this comment:

Although this text is in accord with the detailed amendments resolved at the BRM, it is against the spirit of the desired changes for many countries. We believe that due to time limitations at the BRM, this change was made without sufficient examination of the consequences, and was made in error by the BRM (in which error the UK played a part).

(Norbert Bollow, a member of the Swiss NB, has some good analysis of the return of the leap year bug in spreadsheets. And Jomar Silva with the Brazilian NB tracks some additional breaking changes on a wiki.)

Ah, so WG4 is now interpreting the “spirit of the BRM” through their shamanic communion with the ISO Weltgeist, and each time their oracle comes back with the same response: “Change OOXML so it ‘conforms’ to Microsoft Office 2007”. How convenient for Microsoft.

For most standards, multiple vendors work together to improve interoperability and to increase their conformance with the standard. But with OOXML a single vendor stuffs the committee and works to make the standard better conform to Microsoft’s monopoly product.

So although Microsoft Office does not conform to ISO/IEC 29500 today, I have no doubt that within a few months it will fully conform. But not a single line of code will have changed in the Office product. Office 2007 will be retroactively made to conform to ISO/IEC 29500. What will happen is the standard will be modified to match that single vendor’s products, by misapplication of an ISO procedure intended for fixing minor drafting errors.

So why go through all this trouble? I believe this is all about getting the OOXML standard “corrected” so Microsoft can push to get it officially adopted around the world. The only reason they’ve held back so far is because MS Office does not actually implement ISO/IEC 29500 today, so it would have been counterproductive for them to push for official adoption. However, once this oversight is remedied, by changing the standard to match their product, then watch out.

The side effect, perhaps unintended, is that the OOXML standard is thus clearly marked as unstable and unsuitable for adoption or implementation. With 800 pages of defects and more being found, and a Microsoft-dominated committee that changes the standard with no objective technical justification, the exact contents of the OOXML standard are tentative, uncertain and temporary. Four corrigenda documents and two amendment documents are currently being balloted, including many breaking changes. More corrigenda and amendments are on the way. There is no provision for a version attribute or any other indicator to declare which of the multiple incompatible versions of the standard a document conforms to. What competitor would risk implementing the standard, knowing that Microsoft dominates WG4, which has shown it is willing to change the standard at Microsoft’s whim? The risk is simply too large. A competitor would simply be putting their head in the lion’s mouth.

And at the same time that WG4 rushes to make OOXML conform to Office 2007, Microsoft is moving on with Office 2010, now in technical preview. Office 2010 will be extending OOXML in hundreds of places. Where is SC34 in this? Where is the new work proposal for OOXML 1.1? Where are the discussions? The drafts? None of this exists. If Microsoft wanted to, they could have submitted these changes to SC34 at the recent meeting in Seattle, but they preferred to reserve discussion of the Office 2010 changes for a private meeting in Redmond the day after the SC34 Plenary ended, a snub to SC34 and its fictional control of OOXML.

So Microsoft is now off extending OOXML, and this whole ISO escapade with OOXML seems for naught. (I hear that Microsoft is also backing off the submission of their XML Paper Specification (XPS) to ISO, saying that “an Ecma Standard is good enough”.) It appears that Microsoft got what they wanted from ISO and is moving on. Who said it would last more than a night? As my grandmother used to say, “Why buy the cow when you can get the milk for free?”

In any case, the future looks like something like this:

  • ISO/IEC 29500:2008’s future is uncertain. If the whole i4i patent thing goes against Microsoft, the standard will probably need to be withdrawn.
  • ISO/IEC 29500 with Corrigenda and Amendments will eventually line up with Office 2007 SP2 sometime in 2010/11.
  • But before that happens, Office 2010 will ship with hundreds of extensions that are not described in ISO/IEC 29500 but are documented in proprietary “implementation notes” on Microsoft’s web site.
  • “Office 15” will ship sometime around 2013. It will have further proprietary extensions to ISO/IEC 29500, also not standardized. Office 15 will still be supporting “transitional” OOXML, just like Office 2007 and Office 2010 did. “Transitional” OOXML is the variation that has all the deprecated crud, like VML and “DoItLikeWord95” in it.
  • “Office 16” will ship sometime around 2016. It will finally support the “strict” schema of ISO/IEC 29500, but with 3 generations of proprietary extensions layered on top of it. And that assumes ISO/IEC 29500 actually still exists. In 2015 — 5 years after its last amendment — it will be up for “periodic review” in ISO and may be withdrawn if it appears to have been abandoned by Microsoft.

The pattern is clear: OOXML will be extended by Microsoft much faster than it will be standardized and corrected by ISO. This will make the ISO version of OOXML, currently not supported by Microsoft, even more irrelevant in the future.

Filed Under: OOXML

The Final OOXML Update: Part II

2009/10/16 By Rob 12 Comments

In Part I of this OOXML update, my first post on the topic in over a year, I showed you how Microsoft maintains strong control over the OOXML standard. Despite their earlier promises that control of OOXML would be handed over to an independent, international committee, a look at attendance records reveals that the committee that maintains OOXML (JTC1/SC34/WG4) consists mainly of Microsoft employees, who outnumber any other company or organization on the committee 10-to-1.

In this, Part II of my OOXML update, I’ll tie up another loose end from the immediate aftermath of the DIS 29500 BRM.

Let’s start by casting our memories back to April, 2008. The BRM was over. NBs had reviewed thousands of pages and submitted thousands of defects. And Microsoft/Ecma made thousands of responses. And the BRM approved thousands of changes. Then, as a final step, NBs were asked if they wanted to change their vote from their original September 2007 vote, based on the changes made to the standard by the BRM.

Curiously, NBs were asked to make their final decision without actually seeing the text of the standard they were being asked to approve. ISO leadership denied requests from several NBs, a formal SC34 resolution requesting this text, as well as NB appeals, all of which asked for access to the “final DIS” text that would eventually be published. The ISO chief, in his response to the NB appeals, called the final text of OOXML “irrelevant” (prophetic words, indeed!) and would only permit NBs to have access to a list of over 1,000 resolutions from the BRM, many of which gave great editing discretion to the Microsoft consultant who would eventually produce the final text of the specification.

I discussed why the lack of a final DIS text was a problem back in May 2008:

We are currently approaching a two month period where NB’s can lodge an appeal against OOXML. Ordinarily, one of the grounds for appeal would be if the Project Editor did not faithfully carry out the editing instructions approved at the BRM. For example, if he failed to make approved changes, made changes that were not authorized, or introduced new errors when applying the approved changes. But with no final DIS text, the NB’s are unable to make any appeals on those grounds. By delaying the release of the final DIS text, JTC1 is preventing NB’s from exercising their rights.

Would you make thousands of changes to code and then not allow anyone to test it, and then release it internationally? Of course not. Doing so would amount to professional malpractice. But that is essentially what ISO did with OOXML.

Well, guess what happened? Indeed, the published text of OOXML failed to carry out all of the editing instructions made by the BRM. Several of the BRM resolutions were ignored altogether. Several others were applied inconsistently or erroneously. Although I am aware of no systematic review of all 1,000+ BRM decisions, some NBs have gone back and reviewed the published text against BRM decisions that should have addressed their own NB’s reported comments. They have found many “discrepancies” and these have now been reported as defect reports [PDF].

Whether the flaws in the published text are intentional or accidental, grave or minor, does not really matter here. Errare humanum est. The problem is that it was 100% predictable that human error would cause problems like this when dealing with text changes of this volume. The issue is not whether there would be errors introduced. The presence of many errors was guaranteed. The question was whether NBs are entitled to base their vote on all relevant information, including the final text of the standard, or whether relevant information, indeed the most relevant information — the text of the specification they are voting on — may be withheld from inspection. For ISO leadership to deny NBs the ability to review the very text they were voting on was irresponsible.

The good news is that ISO leadership has changed since this debacle, and JTC1 is currently revising the Fast Track procedures, in part to deal with the abuses of DIS 29500. One of the changes they are making is that the Fast Track procedure will require the final DIS text to be distributed to NBs before the final vote. This is progress, and it is good to see these changes made, though it is unfortunate that it required a failure before such obvious and prudent precautions were instituted. Leadership entails foreseeing and preventing problems, not simply reacting to them. In any case, the NBs that appealed to ISO on the basis of not being allowed to see the final text should feel vindicated now. The NBs of India, Brazil and South Africa were right.

In Part III of this Update, I’ll bring the story up to the present day, and in Part IV I’ll update the story through the year 2016.

Filed Under: OOXML

Protocols, Formats and the Limits of Disclosure

2009/10/12 By Rob 4 Comments

A few words today on an important distinction that deserves greater appreciation, since it lies at the heart of several current interoperability debates. What I relate here will be well-known to any engineer, though I think almost anyone can understand the gist of this.

First, let’s review the basics.

Formats define how information is encoded. For example, HTML is the standard format for describing web pages.

Protocols define how encoded information is transmitted from one endpoint to another. For example, HTTP is the standard protocol for downloading web pages from web servers to web browsers.

There are other such format/protocol pairs, such as MIME and SMTP for emails. When we talk about “web standards” we talk about formats (often described by W3C Recommendations) and protocols (often described in IETF RFCs).

An instance of data that conforms to a given format standard might be given any number of names: a web page, a document, an image, a video, etc., according to the underlying standard. An instance of a format is data: bits and bytes that you can save to your hard drive, burn to a CD, email, etc. Data in a format is persistent and has a static representation.

But what is an instance of a protocol? It is a transaction. It is ephemeral. You can’t easily save an instance of HTTP or SMTP on your hard drive, or email it to someone else. A protocol is a complex dance, a set of queries and responses, often a negotiation of capabilities that prefaces the data transmission.

There is a key distinction between formats and protocols when it comes to interoperability. The key is that a protocol typically involves the negotiation of communication details between two identifiable parties, each of whom can state their capabilities and preferences, as well as conform to the capabilities of the datalink itself. Software running on each endpoint of the transaction can adapt as part of this negotiation.

You may be familiar with this from the modem days, where this “handshaking” procedure was audibly manifest to you whenever you connected to a remote host. But although you don’t hear or see it, this negotiation still occurs with protocols today, behind the scenes.

For example, when you request a web page, your client negotiates all sorts of parameters with the web server, from packet size and timings (at the TCP/IP level) to authentication, language, character set and cache preferences (at the HTTP level). This negotiation of capabilities is essential for handling the diversity of different web servers and web clients in existence today.
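
You can observe a slice of this negotiation from any scripting language: the client advertises what it can handle in its request headers, and the server’s response headers record what was actually agreed. A minimal Python sketch (example.org is just a convenient public host):

    import urllib.request

    # The client states its capabilities and preferences in the request headers...
    req = urllib.request.Request(
        "http://example.org/",
        headers={
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "da, en;q=0.8",   # prefer Danish, fall back to English
            "Accept-Encoding": "identity",
        },
    )

    # ...and the server's response headers record the outcome of the negotiation.
    with urllib.request.urlopen(req) as resp:
        for header in ("Content-Type", "Content-Language", "Content-Encoding", "Vary"):
            print(header, "=", resp.headers.get(header))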

With a protocol, you have two technical endpoints communicating and negotiating the parameters of the data exchange. In other words, you have software on both ends of the communication able to execute logic to adapt to the needs of the other endpoint and the capabilities of the underlying datalink.

However, when it comes to formats, things are different.

Let’s use a word processor document as an example of a format instance. I author a document, and then I send it out: via email, as an attachment on my blog, burned onto a conference CD-ROM, posted to a document server, or whatever. I have no idea who the party on the receiving end will be, nor what software they will be using. They could be running Microsoft Office, but they could also be using OpenOffice, Google Docs, Lotus Symphony, WordPerfect, AbiWord, KOffice, etc. I, as the document author, have no ability to target my document to the quirks of the receiving party, since their identity and capabilities are unknown and in general unknowable.

Since a document is not executable logic, it cannot adapt to the quirks of various endpoints. A document is static. When it comes time to interpret the document, you don’t see two vendor endpoints adapting and negotiating. You see only one piece of software, the receiving party’s application, and it has to interpret a static data instance in a given format.

In other words, with document formats there is no dynamic negotiation, because at the time you write a document out, you have no idea what the reading application will be. And although the application that reads the document may know the identity of the writing application (via metadata stored in the document, for example), it has no ability to negotiate with the writing application, since that application is not present when the document is being loaded.

OK. Simple enough. However, a confused understanding of this distinction will lead to muddled reasoning about interoperability and how it is achieved.

Although it is not ideal, having Microsoft disclose the details of exactly how they implement various proprietary protocols, and even their quirky implementations of standard protocols, may enable 3rd parties to code to these details. If the disclosure is timely, complete and accurate, this information may be useful. I think of the SAMBA work, for example.

However, no amount of disclosure from Microsoft on how they interpret the ODF standard will help. We see that today, with Office 2007 SP2, where it strips out ODF spreadsheet formulas. Having official documentation of this fact from Microsoft, in the form of “Implementation Notes” does not help interoperability. Why? Because when I create an ODF document, I do not know who the reader will be. It may be a Microsoft Office user. But maybe it won’t. It very well could be read by many different users, using many different programs. I cannot adapt my document to the quirks of all the various ODF implementations.

When you deal with formats, interoperability is achieved by converging on a common interpretation of the format. Having well-documented, but divergent, interpretations does not improve interoperability. Disclosure of quirks is insufficient. Disclosure presumes a document exchange universe where the writing application knows that the reader will be Microsoft Office and only Microsoft Office, and therefore the writer can adapt to Microsoft’s quirks. That is a monopolist’s logic. Interoperability with competition only comes when all implementors converge in their interpretation of the format. When that happens we don’t need disclosures. We just follow the standard.

Filed Under: ODF Tagged With: File Formats, Interoperability, Protocols

