OOXML

Two Feet, No Feathers

2007/08/02 By Rob 20 Comments

We typically use words to communicate, to be understood. That is the common case, but not the only case. In some situations, words are used like metes and bounds to carefully circumscribe a concept by the use of language, in anticipation of another party attempting a breach. This is familiar in legislative and other legal contexts. Your concept is, “I want to lease my summer home and not get screwed,” and your attorney translates that into 20 pages of detailed conditions. You can be loose with your language, so long as your lawyer is not.

But even among professionals, the attack/defense of language continues. One party writes the tax code, and another party tries to find the loopholes. Iteration of this process leads to more complex tax codes and more complex tax shelters. The extreme verbosity (to a layperson) of legislation, patent claims or insurance policies results from centuries of cumulative knowledge which has taught the drafters of these instruments the importance of writing defensively. The language of your insurance policy is not there for your understanding. Its purpose is to be unassailable.

This “war of the words” has been going on for thousands of years. Plato, teaching in the Akademia grove, defined Man as “a biped, without feathers.” This was answered by the original smart-ass, Diogenes of Sinope, aka Diogenes the Cynic, who showed up shortly after with a plucked chicken, saying, “Here is Plato’s Man.” Plato’s definition was soon updated to include an additional restriction, “with broad, flat nails.” That is how the game is played.

In a similar way Microsoft has handed us all a plucked chicken in the form of OOXML, saying, “Here is your open standard.” We can, like Plato, all have a good laugh at what they gave us, but we should also make sure that we iterate on the definition of “open standard” to preserve the concept and the benefits that we intend. A plucked chicken does not magically become a man simply because it passes a loose definition. We do not need to accept it as such. It is still a plucked chicken.

(This reminds me of the story told of Abraham Lincoln, when asked, “How many legs does a dog have if you call the tail a leg?” Lincoln responded, “Four. Calling a tail a leg does not make it a leg.”)

With the recent announcement here in Massachusetts that the ETRM 4.0 reference architecture will include OOXML as an “open standard” we have another opportunity to look at the loopholes that current definitions allow, and ask ourselves whether these make sense.

The process for recommending a standard in ETRM 4.0 is defined by the following flowchart:

So, let’s go through the first four questions that presumably have already been asked and answered affirmatively in Massachusetts, to see if they conform to the facts as we know them.

Is the standard fully documented and publicly available? Can we really say that the standard is “fully documented” when the ISO review in the US and in other countries is turning up hundreds of problems that are pointing out that the standard is incomplete, inconsistent and even incorrect? We should not confuse length with information content. Just as a child can be overweight and malnourished at the same time, a standard can be 6,000 pages long and still not be “fully documented.” Of course, we could just say, “A standard fully documents the provisions that it documents” and leave it at that. But such a tautological interpretation benefits no one in Massachusetts. We should consider the concept of enablement as we do when prosecuting patent applications. If a standard does not define a feature such that a “person having ordinary skill in the art” (PHOSITA) can “make and use” the technology described by the standard without “undue experimentation” then we cannot say that it is “fully documented.” By this definition, OOXML has huge gaps.
Is the standard developed and maintained in a process that is open, transparent and collaborative? We’re talking about Ecma here. How can their process be called transparent when they do not publicly list the names of their members or attendance at their meetings, do not have public archives of their meeting minutes, their discussion list or document archive, do not make publicly available their own spreadsheet of known flaws in the OOXML specification nor of the public comments they received during their public review period? How is this, by any definition, considered “transparent”? We can also question whether the process was open. When the charter constrains the committee from making changes that would be adverse to a single vendor’s interests, it really doesn’t matter what the composition of the committee is. The committee’s hands are already tied and should not be considered “open.” If I were writing a definition of an open, transparent process, I’d be sure to patch those two loopholes.
Is the standard developed, approved and maintained by a Standards Body? Without further qualifying “Standards Body” this is a toothless statement. As should be apparent right now, not all SDOs are created equal. Some are the standards equivalent of diploma mills. Accreditation is the way we usually solve this kind of problem. Ecma’s Class A Liaison status with JTC1 is not an accreditation since their liaison status has no formal requirements other than expressing interest in the technical agenda of JTC1. In comparison, OASIS needed to satisfy a detailed list of organizational, process, IPR and quality criteria before their acceptance as a PAS Submitter to JTC1/SC34. Why bother having a requirement for a Standards Body unless you have language that ensures that it is not a puppet without quality control?
Is there existing or growing industry support around the use of the standard? Again, very vague. A look at Google hits for OOXML documents shows that there are very few actually in use. My numbers show that only 1 in 10,000 new office documents are in OOXML format. But I guess that is more than the 0 in 10,000 that existed last year. But is this really evidence for “growing industry support”? I’d change the language to require that there be several independent, substantially full implementations.

There are two additional questions which I won’t presume to answer since they rely more on integration with internal ITD processes.

We learn lessons and move on to the next battle. Just as GPLv2 required GPLv3 to patch perceived vulnerabilities, we’ll all have much work to do cleaning up after OOXML. Certainly JTC1 Directives around Fast Tracks will need to be gutted and rewritten. Also, the vague and contradictory ballot rules in JTC1, and the non-existent Ballot Resolution Meeting procedures will need to be addressed. I suggest that ITD take another look at their flowchart as well, and try to figure out how they can avoid getting another plucked chicken in the future.

My comments on the ETRM 4.0 draft

2007/07/29 By Rob 12 Comments

This was my response to the call for public comments on the Information Technology Division’s (ITD) Enterprise Technical Reference Model (ETRM) 4.0 draft.

I’d like to write to you as a long-time Massachusetts resident and taxpayer. My employer (IBM) will likely submit their own comments, but I’d like to offer you my own personal views on the ETRM 4.0 draft.

I am proud of the Commonwealth’s tradition of openness in government, enshrined in our Public Records Law and Open Meeting Law. As James Madison wrote, “A popular government, without popular information, or the means of acquiring it, is but a prologue to a farce or a tragedy. A people who mean to be their own governors must arm themselves with the power which knowledge gives them.” So access to government documents, now and for posterity, is critical for public oversight and participation in government, as well as for preserving our heritage. Now that we’ve moved into the digital age, access to government documents requires that these documents be made available in a format that all Commonwealth residents can read. So the move toward open document formats, as called for in the ETRM, is laudable. A citizen must never be dependent on any single vendor for the software needed to read their government’s documents.

However, I am concerned at the proposed addition of Ecma Office Open XML (OOXML) to the list of acceptable document formats. As you may have heard, OOXML is currently undergoing review by ISO/IEC JTC1 for possible approval as an ISO standard. As part of this review, technical committees in standards bodies around the world are reviewing OOXML and appraising its suitability as an International Standard. As a participant in the US committee reviewing OOXML, INCITS V1, I had the opportunity to review the text of the OOXML specification and to discuss it with others. I am sorry to report that I found the OOXML specification to be full of errors and omissions. Of course, no technical document is perfect. But this one, in particular, is of far greater length (more than 6,000 pages) and of far lower quality than any I have seen before. If it has advanced this far in the ISO process it is because of vendor pressure, not because of technical merit.

What is the problem with a buggy standard? Interoperability suffers. That is the problem. There is no doubt that if everyone in the Commonwealth used Microsoft Office 2007 on Windows Vista, their interoperability would be good. But as soon as we admit choice in applications and operating systems, then interoperability will only occur when all sides follow a common standard. So the technical quality of a standard (accuracy, comprehensiveness, level of detail, consistency, etc.) is directly proportional to the level of interoperability achievable and the cost to achieve it.

The ISO ballot on OOXML will not end until September 2nd, after which a resolution process to fix defects in the text of the standard will take at least an additional 6-18 months. That is, of course, if OOXML gains ISO approval, something which is not certain at this point. So I would recommend a cautious approach, and wait for the ISO process to conclude, or conduct your own independent technical evaluation of the OOXML specification to confirm its technical quality before adding OOXML to your list. Ask other vendors: Is this something you can implement? Ask yourself: Will this truly give the Commonwealth the interoperability and choice that you desire? These are important questions to ask.

Finally, I’d note that the ETRM also calls out OpenDocument Format (ODF) as an acceptable format. ODF was approved by ISO last year. So why do we need OOXML? I personally think that the complexity of document exchange and translation in a multi-format world would take us back to the confusion and frustration of the early 1990’s when we all juggled WordStar, WordPerfect, Word and WordPro files, and could collaborate only poorly. Better to push for a single unified/harmonized standard document format for personal productivity applications, much as we have a single standard (HTML) for web pages.

I’ll leave you with a quote from Tim Berners-Lee, the inventor of the web, from an interview he gave with David Berlind from ZDNet when Berners-Lee was recently in Boston receiving a Lifetime Achievement Award from the Massachusetts Innovation & Technology Exchange.

Berners-Lee said:

It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it’s open and the fact that it is royalty-free.

So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.

If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn’t have had the open platform, we would have had competition for these various different browser platforms, but we wouldn’t have had the web. We wouldn’t have had everything growing on top of it.

So I think it very important that as we move on to new spaces … we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you built on top of the web platform.

I believe we want to ensure the same qualities in document formats. We want competition and choice among vendors, applications and services, but not among standards. If we compete on standards, then no one wins.

Competition Optional

2007/07/29 By Rob 4 Comments

Regular readers will recall previous posts where I wrote about the abuse of language in the file format debate, including individual posts on the words choice, representation, compatibility and interoperability. Now it is time to slay another dragon, the word “optional.”

The OED defines optional as, “That is a matter of choice; depending on choice or preference; that may be done or left undone according to one’s will or pleasure.”

So volition plays a role. The matter that is optional is done or not done according to someone’s will. Someone has a choice. It is important to inquire who this person is. If someone says that “torture is optional,” we will be left with a rather imperfect knowledge of the circumstances unless we are also told whether the torture is at the option of the jailer, or of the prisoner.

More on the volition question a bit later.

In previous posts I have pointed out numerous “features” in OOXML which cannot be implemented by anyone else but Microsoft. These stem from a variety of causes, ranging from elements lacking definition (“lineWrapLikeWord6”), to features that are tied to Windows or Office (e.g., Windows Metafiles), to items that are “merely referenced” (OLE, digital ink), to items that, although featured prominently in Office marketing materials, are curiously not mentioned at all in the OOXML text (scripts, macros, DRM, SharePoint, etc.). When these issues are raised, the typical response from Microsoft has been along the lines of, “Don’t worry, these features are optional. You don’t need to implement them. They are there for implementations that know what they mean. If you don’t understand them, you can ignore them.”

Let’s drill into this argument a bit more.

In the case of a person using an OOXML-supporting word processor, there are at least five levels of “optional” to consider:

1. At the level of the XML, what constraints does the XML schema define? What elements and attributes are required, which are optional?
2. At the level of the OOXML conformance clause, what in the standard is optional and what is mandatory?
3. From the application vendor’s perspective, what features must be implemented in order to have a commercially viable product?
4. From the document author’s perspective, what features must a competitive word processor have when creating an OOXML document?
5. From the reader’s perspective, what level of features must a competitive word processor have to give a high fidelity presentation when reading an OOXML document?

It should be noted that 1 and 2 are closely related, and that 3, 4 and 5 are closely related. However, (1,2) and (3,4,5) have no necessary relationship with each other. It is entirely possible for a standard to define a schema and a conformance clause that describe a commercially inadequate set of features.

So what is required in OOXML, according to the standard? What must a word processor support if it wants to claim that it is conformant? The answer is given in OOXML’s conformance clause (Part I, Section 2.5 “Application Conformance”):

A conforming consumer shall not reject any conforming documents of the document type expected by that application.

That’s it. Conformance is defined in the negative: A conformant word processor must not give an error message when it is presented with a valid DOCX file. It does not need to do anything in particular with that document, but it can’t reject it. There are no mandated features, no required functionality. Under the provided conformance clause a conformant OOXML consumer can be as simple as the DOS command:

del *.docx

This is fully conformant since the delete command does not reject conforming documents.
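To make the point concrete, here is a minimal sketch in Python of what the quoted clause, read literally, demands of a consumer. The function names (trivial_consumer, rejects_any) and the convention that returning True means "not rejected" are my own illustrative assumptions; nothing like them appears in the OOXML text.

```python
from pathlib import Path
from typing import Callable, Iterable


def trivial_consumer(path: Path) -> bool:
    """Hypothetical consumer that accepts every document it is handed.

    It never parses, renders or validates anything. It also never raises
    and never reports an error, so it never rejects a conforming document.
    """
    return True  # assumption for this sketch: True means "not rejected"


def rejects_any(consumer: Callable[[Path], bool], documents: Iterable[Path]) -> bool:
    """Return True if the consumer rejects at least one of the documents."""
    return any(not consumer(doc) for doc in documents)


if __name__ == "__main__":
    # Hypothetical file names; the files do not need to exist,
    # because the consumer never opens them.
    docs = [Path("memo.docx"), Path("budget.docx")]
    print("rejects a conforming document:", rejects_any(trivial_consumer, docs))
```

A consumer like this satisfies the only requirement the clause states, while supporting no feature of the format at all.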

So I find it highly disingenuous for Microsoft to argue that portions of the spec are permissibly undefined, vague or even incorrect on the grounds that they are optional. Everything in OOXML is optional. This should be repeated until it sinks in. Everything in OOXML is optional. So the argument that flaws are allowable in optional features can be used to allow any and all errors. But in fact there is nothing in ISO Directives or practice that excuses optional features from being fully or correctly defined. A feature being optional is a statement about conformance, not a license for reduced specification quality.

Back to the earlier dictionary definition of “optional”, where we inquired about whose volition was being expressed. In the case of features which are optional based on the OOXML conformance clause, whose volition is being expressed? The end user? No. When the end user receives a document, they expect it to be rendered according to the author’s intent. The author of the document has no idea when they are using optional features of OOXML. And if the document was translated from a legacy binary document, it may contain, unknown to that user, VML and various compatibility flags. If this document is edited and then forwarded to another user, that 2nd user also has no idea what markup is within the DOCX file. But they do have the expectation that their document will open and render properly. Despite Microsoft’s assurance that this is not a problem, since these features are optional, our poor little users may beg to differ.

What about the application vendor, is their volition expressed when OOXML says a feature is optional? Not at all. If a vendor wishes to create a commercially viable, competitive word processor, they must support the features that their customers’ documents contain, regardless of where these documents originated. Whether Microsoft calls these features “optional” or even “out of scope”, the fact remains that a vendor who does not implement them will be at a competitive disadvantage, and a “standard” that refuses to document fully these features will ensure and perpetuate this disadvantage.

From the application vendor’s perspective, it is an abuse of language and logic to call something “optional” if it is not fully documented. If it is not documented, then a vendor cannot implement it even if they wanted to. There is nothing optional about it, since there is no way to opt-in. Furthermore, if a vendor did manage to provide their own understanding of such a feature, via reverse engineering, or via other documentation not included in the standard, they would not be covered by Microsoft’s patent covenant, since that applies only to items that are “described in detail” in the standard.

When the OOXML standard says something is optional, it is speaking from the perspective of the author of the standard, Microsoft. Although it may fit well with their business strategy for would-be competitors to lack the technical information necessary to create competitive products, I do not find that this is a proper object for a would-be ISO standard.

ISO defines a standard as:

[A] document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context. (ISO/IEC Guide 2:2004, definition 3.2)

A key phrase is, “for common and repeated use.” This means that a standard must define things that can be practiced in a repeatable manner by others. For a standard to include items that cannot be practiced by others, purely due to lack of a coherent definition, is antithetical to the purpose of a standard.

The next time that Microsoft asserts that these flaws are acceptable, because the flawed feature is “optional” we should ask them whether they believe also that competition is optional, since that is the net effect. To claim that the OOXML standard is the key to backwards compatibility with legacy documents while at the same time failing to document the very features that enable this legacy support — this is a sham. To justify this by arguing that such features are “optional” is neither good logic, good engineering, nor good standards development.

Stranger than Fiction

2007/07/20 By Rob 13 Comments

This is taken from slide 21 of Ecma’s “Standards @ Internet Speed” presentation (22 June 2007 version) which you can download here. It appears to be a sales pitch.

I’ve joked about the Ecma process before, but I never thought I’d see it written out officially like this. Standards are made available “on time”? Minimize the “risk” of changes? I thought the whole purpose of technical review was to find the problems and fix them? As always, the man who pays the piper calls the tune.

(Also, you gotta love the “wink and nod” approach to patents on slide 16, where they can pretend a problem doesn’t exist if one member disputes that a patent is essential to implement the standard.)

The Cookbook

2007/07/19 By Rob 18 Comments

The Thimbleberry Inn’s executive chef, Guillaume Portes, was sensational. He took the modest kitchen of this even more modest inn, and using local produce and game, and with a flair for the dramatic, created a menu that drew local, regional and even national attention, in ever widening spirals of epicurean and gastronomic success.

Now, fresh back from a guest appearance on a cable cooking show, Guillaume received a call from a publisher, asking if he would be interested in writing a cookbook: “Everyone loves your food. You’re a genius. If you write a cookbook we could sell millions.”

Guillaume at first was skeptical, “Me, write a cookbook? But I know nothing of writing!”

“Don’t worry,” said the publisher. “I’ll set you up with our best editor, Frank Morris. Frank knows writing. He does all of our celebrity books. It will be a wonderful collaboration.”

A few months later and the cookbook was almost ready to publish. The review copies went out in expectation of rave flap blurbs. But what came back…well…it wasn’t quite what they had expected:

Complaints that the Inn’s most popular dishes were not included. “Why did you leave out your most popular dish, the Pecan Stuffed Pheasant?”
Reports that some recipes were missing steps in their instructions, or that the specified ingredient amounts were vague, missing or incorrect. “One spoon of salt? It would help if you were more specific. Tablespoon or teaspoon?”
Some recipes had ingredients listed that did not seem to be used in the recipe. “What do I use the scallions for? They are listed as ingredients, but nothing explains how or when they are used.”
Observations that some instruction steps were vague or relied on unusual sources of information. “How are we to interpret a recipe step that says ‘Cook it like Aunt Mable used to cook it’?”
Recipes were missing sauces or said simply, “Add your own sauce.” But the Thimbleberry Inn was famous for their sauces. How could anyone replicate their dishes if the recipe for their signature thimbleberry sauce was omitted?
A huge number of typographical errors, broken references, inconsistencies, etc., that showed that the preparation of the cookbook was hasty and lacked sufficient review.

The publisher was confused. The editor was aghast. Guillaume was furious. How could the reviewers do this to us? Back stabbers! How dare they! These dishes are the finest in the country. Everyone who comes to the Inn loves and praises them. Look at the restaurant reviews! Look at my prize medals! Surely the reviewers must be in league with my competitor, the evil Gooseberry Inn, and are merely trying to prevent my book from selling!

The greatest dose of abuse was reserved for those reviewers who reported the greatest number of problems.

As the imprecations grew more passionate, and the volume and temperature rose, a timid voice arose from the back of the room, saying, “Uh, but are they right?”

The room grew silent as Fred Osgood, an old-timer, once editor-in-chief, but now on the verge of retirement, spoke his mind:

I’ve been in this business quite a while, as you all know. I’ve edited cookbooks before, plenty of them. None for a client as big as the Thimbleberry Inn, but the ones I did edit received good reviews and were modest successes in their day.

Frank, I don’t think you’ve ever edited a cookbook in your life, have you? I didn’t think so. The thing you need to know is that cookbooks require careful technical review as well as standard editing. Just because a chef is talented, or a restaurant is popular does not guarantee that the recipes and the cookbook are good. A recipe is not a dish. There is a lot of time and hard work required to turn Guillaume’s natural genius in the kitchen into something that readers of our cookbook can replicate in their own kitchens.

Back when I edited cookbooks, I made sure that we set up a test kitchen and verified every step of every recipe. We cooked, revised, and cooked again until we could say that every recipe worked as written.

I have no doubt that Guillaume can cook every one of these dishes from memory and get it right every time. I don’t think any of us can question that. But that is not the important question. The cookbook is not for Guillaume’s benefit. The question we need to ask is whether our readers can cook these dishes using these recipes. In other words, are the recipes relevant, complete and accurate? This is what makes writing a cookbook different than cooking, and this is where we have failed our readers. The reputation of the Thimbleberry Inn as well as this publishing house depends on us doing this right. We need to send this cookbook back for full and proper technical review.

The room was silent for a minute, as the others gazed at their feet. Then a smirk came over Guillaume’s face and he struck his fist loudly on the table, stood up and said:

But I need this cookbook now! We must do a better job at finding good reviewers. Let’s throw out the reviews we have so far. Let me give you the names of some of my friends, partners and former colleagues. Don’t even bother sending them a review copy. They don’t need to read it. They’ve all eaten at the Inn. They know how good my food is. Just have them fax over their reviews. Let’s get moving, gentlemen! The 2nd edition of the Gooseberry Inn’s cookbook is due out any month now. We can’t be left behind!

Old Fred silently collected his things and left via the back door, muttering.