Tuesday, July 31, 2007

One Year and One Hundred Posts Later...

I'm not one for excessive self-reflection. Like the Heresiarchs of Uqbar, I think that mirrors are abominable. However, since I have simultaneously reached my 100th blog post as well as my one year anniversary with An Antic Disposition, I feel that an inward glance is both appropriate and timely.

Only 100 posts in a year? I remain in awe of other bloggers who manage to put out an order of magnitude more material, sometimes several posts in one day. Writing is not easy for me. Although it may not always show, I agonize over every word. I aim for clarity, euphony, a smart rhythm and a bit of wit.

Clarity is difficult, since my readers come from a wide range of technical backgrounds, so some posts are high-level, simplified descriptions, while others dive into the bowels of the beast. But of course, clarity is no excuse for not being understood. As Gertrude Stein wrote:

Clarity is of no importance because nobody listens and nobody knows what you mean no matter what you mean, nor how clearly you mean what you mean. But if you have vitality enough of knowing enough of what you mean, somebody and sometime and sometimes a great many will have to realize that you know what you mean and so they will agree that you mean what you know, what you know you mean, which is as near as anybody can come to understanding anyone.

Since I started blogging on document format issues last July, here are the basic stats on the blog, once I subtract out other parts of this website, like my weather observations and family tree pages:


The traffic has been steadily increasing over the last 12 months, so I'm actually averaging closer to 3,000 visits/day today.

I've been Slashdot'ed a few times and featured on GrokLaw more times than I will ever be able to thank. Such days can drive traffic up to 25,000 visitors.

Technorati shows 787 links to the blog, which is pretty good. Those links give me a Google PageRank of 7, which has some humorous implications, as we'll discuss later.

Most popular posts by hits:

  1. How to Hire Guillaume Portes (71,152 hits, 3 January 2007) The intent here was to create a fictional name, which roughly translated "Bill Gates" into French. However I later found out this is the real-world name of a game programmer in the UK. I hope he took this with good humor (or even humour). The post dealt with how overspecification can hurt a standard. This post tipped people off to the weird compatibility flags in OOXML that tie it to undefined legacy behaviors in Word 95, etc. The line, "This is not a specification; this is a DNA sequence" was a spontaneous insight I came up with in response to a question from the audience at the KDE aKademy 2006 conference in Dublin the previous October.
  2. OOXML Fails to Gain Approval in US (48,802 hits, 15 July 2007) This was a report on the INCITS V1 OOXML vote. It became widely quoted, very quickly. I think this was partially because it was written in a straightforward, factual style of reportage, without overt color or opinion. My working title was "US Technical Committee Fails to Approve OOXML," but that caused the title to wrap to two lines, which I try to avoid.
  3. The Formula for Failure (37,648 hits, 9 July 2007) OOXML's spreadsheet formula specification is full of mathematical errors. How was this not detected earlier by Ecma? What does this say about the sufficiency of the Ecma review process?
  4. A Leap Back (20,270 hits, 12 October 2006) A look at the history of the Gregorian Calendar, and how OOXML gets it wrong. Microsoft says it was done for "legacy reasons," which is another way of saying it is a bug that they don't want to fix.
  5. Math Markup Marked Down (21,358 hits, 25 April 2007) This post told how Nature and Science journals were rejecting submissions in OOXML format.
  6. The Chernobyl Design Pattern (21,079 hits, 26 October 2006) This one was never widely quoted, but continues to receive sustained traffic from StumbleUpon.
  7. A game of Zendo (9,344 hits, 18 July 2006) This post lacks focus, seemingly trying to discuss Zendo, backwards compatibility as well as Word art borders. The technical points are sound, but I think the post lacks cohesion.
  8. The OOXML Compatibility Pack (8,067 hits, 6 September 2006) This was an early post on the topic, but the later Interoperability by Design post covered it better, I think.
  9. File Format Timeline (9,920 hits, 24 June 2007) I first posted it as just a PNG graphic, with no HTML text. I received no links. It is hard to quote something that has no text. So I added some text and received links and a lot more traffic. A good lesson to remember: A picture is worth a thousand words, but if you don't have any text, no one can quote you.
  10. More Matter with Less Art (8,730 hits, 31 January 2007) This is a long, rambling response to critics of How to Hire Guillaume Portes. I'm reminded that the old saying "It is impossible to make something foolproof because fools are so ingenious" applies to arguments as well.

My personal favorite posts, in no particular order:

  1. How to Write a Standard (If you must) A look at how Microsoft and Ecma are making a travesty of standards development. I originally wrote this post as a straightforward analysis, but it was ponderous. Then I rewrote it in the form of an antipattern, but it still lacked crispness. Then I had the key insight -- if I simply state their argument explicitly, it works as a satire.
  2. How Standards Bring Consumers Choice This was written for a general audience who knew nothing about OOXML or document formats. I had a lot of fun reading up on the various electrical standards.
  3. A Tale of Two Formats One of the problems that I perceive is that we are not dreaming big enough when it comes to the future of office applications. Many seem satisfied with simply being a mini-Office or following after Microsoft's technologies at a delay of a few years. But I think we need a more radical re-imagining of what office productivity applications are all about. What we have today is determined by the dead hand of a monopolist leading us in conventional circles, unable to innovate because of the grip of their own installed base. Are we ready for some real innovation? Or are we happy with 15 more years of paying for upgrades and only getting dancing paperclips?
  4. File Format Timeline I first posted it as just a PNG graphic, with no HTML text. I received no links. It is hard to quote something that has no text. So I added some text and received links and a lot more traffic. A good lesson to remember: A picture is worth a thousand words, but if you don't have any text, no one can quote you.
  5. The Legend of the Rat Farmer Another parable, this time to refute the specious argument that more standards improve interoperability.
  6. Pruning Raspberries Zero comments, zero links. Sometimes I write for an audience of one, and that is fine.
  7. The Cookbook Another parable. Why parables? For over 2000 years (e.g., Christ, Socrates and Confucius) storytelling has been an important rhetorical device. The point is not that a story is the easiest way to explain something. On the contrary, it is much harder. But a story is one of the best ways to explain something if you want it to be remembered. Another good technique is to express the argument in song lyrics with a catchy tune, but I promise you I will not go down that road.
  8. The Case for a Single Document Format (in 4 parts, unfinished) This one is stretching the bounds of what I can do in a blog, due to length. I still need to finish part 4, and in the end I might just redo this as a paper rather than these too-long blog posts. But the material gives a good multi-disciplinary look at the question of standards and tries to answer the question, "Why do some technologies have a single standard, while others thrive with multiple standards?" We must acknowledge that both occur, but we must also acknowledge that it is important to know whether this is random, or whether a single standard regime is the natural and indeed the desired outcome under some conditions.
  9. Essential and Accidental in Standards Yes, it rambles, and takes a long time to make a simple point, but I think it is an interesting trip. A simpler version of the same basic argument (the theme of a sweet spot for technology) has been covered more succinctly (and perhaps more convincingly) by Tim Bray.
  10. The Parable of the Solipsistic Standard Another story, but I think this one went over almost everyone's head. Solipsism is the ultimate philosophical reduction of the Not Invented Here (NIH) Syndrome. Mixing epistemology with linguistics and standards and satire is asking for trouble. I think I got what I deserved here. But it was fun and some readers enjoyed it.

Top countries based on number of visits:

  1. USA
  2. Germany
  3. United Kingdom
  4. Netherlands
  5. Australia
  6. Canada
  7. Denmark
  8. China
  9. France
  10. Spain
  11. Italy
  12. Slovakia
  13. Austria
  14. Poland
  15. Belgium

Most active states based on number of visits:

  1. California
  2. Nevada
  3. Washington DC
  4. Colorado
  5. Pennsylvania
  6. New York
  7. Washington
  8. New Jersey
  9. Virginia
  10. Ohio

Most Active Cities based on number of visits:

  1. Beijing, China
  2. Mountain View, California
  3. Carson City, Nevada
  4. Washington DC
  5. Denver, Colorado
  6. Kuala Lumpur, Malaysia
  7. London, UK
  8. Gliwice, Poland
  9. Chester, Pennsylvania
  10. Malchow, Germany
  11. NYC, New York
  12. Dublin, Ireland
  13. Bellevue, Washington
  14. Auckland, New Zealand
  15. West Sacramento, California
So my question is: who is in Gliwice, Poland? I didn't know I had so many readers from there. Ditto from Carson City, Nevada.

Top search phrases that lead people to this web site:
  1. rob weir
  2. traduttore
  3. jingle bells batman smells
  4. antic disposition
  5. cum
  6. cannibalism
  7. jingle bells batman smells lyrics
  8. ooxml
  9. rob weir blog
  10. jingle bells santa smells
Around 30% of the traffic is directed from search engines. I have observed the danger of having a high PageRank web site. Whenever I use an odd word in a post, this blog automatically becomes one of the top hits for people querying on that term. So a post from last July called Cum mortuis in lingua mortua generates many search referrals from those who are merely looking, I presume, for more information regarding the Latin preposition "cum" meaning "with." I hope they found what they were looking for.

Similarly, an old blog post talking about transmission of culture among children mentioned the "Jingle Bells/Batman Smells" parody. This gets many hits, especially in December. Although I have no particular expertise in Latin prepositions or Christmas carol parodies, I am an instant "authority" on these subjects (according to Google at least) because of this blog's ranking.

Browsers:
  1. 38% Firefox
  2. 7% I.E. 6.x
  3. 6% I.E. 7.x
  4. 2% Opera
  5. 1% Safari
  6. 1% Konqueror
So some good strength being shown by Firefox.


OS's:

  1. 35% Windows XP
  2. 30% Other
  3. 19% Linux
  4. 5% Mac OS
  5. 4% Vista
  6. 3% Windows 2000
  7. 2% Windows NT
  8. 1% Sun OS

Aggregation feeds:
Thanks, everyone, for reading!

-Rob


Sunday, July 29, 2007

My comments on the ETRM 4.0 draft

This was my response to the call for public comments on the Information Technology Division's (ITD) Enterprise Technical Reference Model (ETRM) 4.0 draft.




I’d like to write to you as a long-time Massachusetts resident and taxpayer. My employer (IBM) will likely submit their own comments, but I’d like to offer you my own personal views on the ETRM 4.0 draft.

I am proud of the Commonwealth’s tradition of openness in government, enshrined in our Public Records Law and Open Meeting Law. As James Madison wrote, “A popular government, without popular information, or the means of acquiring it, is but a prologue to a farce or a tragedy. A people who mean to be their own governors must arm themselves with the power which knowledge gives them.” So access to government documents, now and for posterity, is critical for public oversight and participation in government, as well as for preserving our heritage. Now that we’ve moved into the digital age, access to government documents requires that these documents be made available in a format that all Commonwealth residents can read. So the move toward open documents formats, as called for in the ETRM, is laudable. A citizen must never be dependent on any single vendor for the software needed to read their government’s documents.

However, I am concerned at the proposed addition of Ecma Office Open XML (OOXML) to the list of acceptable document formats. As you may have heard, OOXML is currently undergoing review by ISO/IEC JTC1 for possible approval as an ISO standard. As part of this review, technical committees in standards bodies around the world are reviewing OOXML and appraising its suitability as an International Standard. As a participant in the US committee reviewing OOXML, INCITS V1, I had the opportunity to review the text of the OOXML specification and to discuss it with others. I am sorry to report that I found the OOXML specification to be full of errors and omissions. Of course, no technical document is perfect. But this one, in particular, is of far greater length (more than 6,000 pages) and of far lower quality than any I have seen before. If it has advanced this far in the ISO process it is because of vendor pressure, not because of technical merit.

What is the problem with a buggy standard? Interoperability suffers. That is the problem. There is no doubt that if everyone in the Commonwealth used Microsoft Office 2007 on Windows Vista, their interoperability would be good. But as soon as we admit choice in applications and operating systems, interoperability will only occur when all sides follow a common standard. So the technical quality of a standard (accuracy, comprehensiveness, level of detail, consistency, etc.) directly determines the level of interoperability achievable and the cost to achieve it.

The ISO ballot on OOXML will not end until September 2nd, after which a resolution process to fix defects in the text of the standard will take at least an additional 6-18 months. That is, of course, if OOXML gains ISO approval, something which is not certain at this point. So I would recommend a cautious approach, and wait for the ISO process to conclude, or conduct your own independent technical evaluation of the OOXML specification to confirm its technical quality before adding OOXML to your list. Ask other vendors: Is this something you can implement? Ask yourself: Will this truly give the Commonwealth the interoperability and choice that you desire? These are important questions to ask.

Finally, I’d note that the ETRM also calls out OpenDocument Format (ODF) as an acceptable format. ODF was approved by ISO last year. So why do we need OOXML? I personally think that the complexity of document exchange and translation in a multi-format world would take us back to the confusion and frustration of the early 1990’s when we all juggled WordStar, WordPerfect, Word and WordPro files, and could collaborate only poorly. Better to push for a single unified/harmonized standard document format for personal productivity applications, much as we have a single standard (HTML) for web pages.

I’ll leave you with a quote from Tim Berners-Lee, the inventor of the web, from an interview he gave with David Berlind from ZDNet when Berners-Lee was recently in Boston receiving a Lifetime Achievement Award from the Massachusetts Innovation & Technology Exchange.

Berners-Lee said:

It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it’s open and the fact that it is royalty-free.

So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.

If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn’t have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.

So I think it very important that as we move on to new spaces ... we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you built on top of the web platform.



I believe we want to ensure the same qualities in document formats. We want competition and choice among vendors, applications and services, but not among standards. If we compete on standards, then no one wins.


Competition Optional

Regular readers will recall previous posts where I wrote about the abuse of language in the file format debate, including individual posts on the words choice, representation, compatibility and interoperability. Now it is time to slay another dragon, the word "optional."

The OED defines optional as, "That is a matter of choice; depending on choice or preference; that may be done or left undone according to one's will or pleasure."

So volition plays a role. The matter that is optional is done or not done according to someone's will. Someone has a choice. It is important to inquire who this person is. If someone says that "torture is optional," we will be left with a rather imperfect knowledge of the circumstances unless we are also told whether the torture is at the option of the jailer, or of the prisoner.

More on the volition question a bit later.

In previous posts I have pointed out numerous "features" in OOXML which cannot be implemented by anyone but Microsoft. These range from elements lacking definition ("lineWrapLikeWord6") to features that are tied to Windows or Office (e.g., Windows Metafiles) to items that are "merely referenced" (OLE, digital ink) to items that, although featured prominently in Office marketing materials, are curiously not mentioned at all in the OOXML text (scripts, macros, DRM, SharePoint, etc.). When these issues are raised, the typical response from Microsoft has been along the lines of, "Don't worry, these features are optional. You don't need to implement them. They are there for implementations that know what they mean. If you don't understand them, you can ignore them."

Let's drill into this argument a bit more.

In the case of a person using an OOXML-supporting word processor, there are at least five levels of "optional" to consider:
  1. At the level of the XML, what constraints does the XML schema define? Which elements and attributes are required, and which are optional?
  2. At the level of the OOXML conformance clause, what in the standard is optional and what is mandatory?
  3. From the application vendor's perspective, what features must be implemented in order to have a commercially viable product?
  4. From the document author's perspective, what features must a competitive word processor have when creating an OOXML document?
  5. From the reader's perspective, what level of features must a competitive word processor have to give a high fidelity presentation when reading an OOXML document?
It should be noted that 1 and 2 are closely related, and that 3, 4 and 5 are closely related. However, (1,2) and (3,4,5) have no necessary relationship with each other. It is entirely possible for a standard to define a schema and a conformance clause that describe a commercially inadequate set of features.

So what is required in OOXML, according to the standard? What must a word processor support if it wants to claim that it is conformant? The answer is given in OOXML's conformance clause (Part I, Section 2.5 "Application Conformance"):

A conforming consumer shall not reject any conforming documents of the document type expected by that application.

That's it. Conformance is defined in the negative: A conformant word processor must not give an error message when it is presented with a valid DOCX file. It does not need to do anything in particular with that document, but it can't reject it. There are no mandated features, no required functionality. Under the provided conformance clause a conformant OOXML consumer can be as simple as the DOS command:

del *.docx

This is fully conformant since the delete command does not reject conforming documents.
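To make the point concrete, here is a minimal sketch in Python of what the quoted clause asks for when read literally. Everything in it is my own illustration and not part of the OOXML text: the idea of modeling a consumer as a function that returns True (accepted) or False (rejected), the function names, and the placeholder byte strings standing in for conforming documents are all assumptions.

```python
from typing import Callable, Iterable

# Hypothetical model: a consumer takes the bytes of a document and returns
# True if it accepted the document, False if it rejected it.
Consumer = Callable[[bytes], bool]


def do_nothing_consumer(document: bytes) -> bool:
    """Accepts every document and does nothing at all with its content."""
    return True


def picky_consumer(document: bytes) -> bool:
    """Rejects any document longer than 10 bytes (an arbitrary limit)."""
    return len(document) <= 10


def satisfies_clause(consumer: Consumer, conforming_docs: Iterable[bytes]) -> bool:
    """The clause read literally: no conforming document is rejected."""
    return all(consumer(doc) for doc in conforming_docs)


# Placeholder stand-ins for conforming documents (not real DOCX files).
sample_docs = [b"short", b"a much longer conforming document"]

print(satisfies_clause(do_nothing_consumer, sample_docs))  # True
print(satisfies_clause(picky_consumer, sample_docs))       # False
```

The check never looks at what a consumer does with a document, only at whether it rejects one. Under this reading a consumer that renders nothing passes, while one that does real work but turns away a single conforming document fails.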

So I find it highly disingenuous for Microsoft to argue that portions of the spec are permissibly undefined, vague or even incorrect on the grounds that they are optional. Everything in OOXML is optional. This should be repeated until it sinks in. Everything in OOXML is optional. So the argument that flaws are allowable in optional features can be used to allow any and all errors. But in fact there is nothing in ISO Directives or practice that excuses optional features from being fully or correctly defined. A feature being optional is a statement about conformance, not a license for reduced specification quality.

Back to the earlier dictionary definition of "optional", where we inquired about whose volition was being expressed. In the case of features which are optional based on the OOXML conformance clause, whose volition is being expressed? The end user? No. When the end user receives a document, they expect it to be rendered according to the author's intent. The author of the document has no idea when they are using optional features of OOXML. And if the document was translated from a legacy binary document, it may contain, unknown to that user, VML and various compatibility flags. If this document is edited and then forwarded to another user, that 2nd user also has no idea what markup is within the DOCX file. But they do have the expectation that their document will open and render properly. Despite Microsoft's assurance that this is not a problem, since these features are optional, our poor little users may beg to differ.

What about the application vendor, is their volition expressed when OOXML says a feature is optional? Not at all. If a vendor wishes to create a commercially viable, competitive word processor, they must support the features that their customers' documents contain, regardless of where these documents originated. Whether Microsoft calls these features "optional" or even "out of scope", the fact remains that a vendor who does not implement them will be at a competitive disadvantage, and a "standard" that refuses to document fully these features will ensure and perpetuate this disadvantage.

From the application vendor's perspective, it is an abuse of language and logic to call something "optional" if it is not fully documented. If it is not documented, then a vendor cannot implement it even if they wanted to. There is nothing optional about it, since there is no way to opt-in. Furthermore, if a vendor did manage to provide their own understanding of such a feature, via reverse engineering, or via other documentation not included in the standard, they would not be covered by Microsoft's patent covenant, since that applies only to items that are "described in detail" in the standard.

When the OOXML standard says something is optional, it is speaking from the perspective of the author of the standard, Microsoft. Although it may fit well with their business strategy for would-be competitors to lack the technical information necessary to create competitive products, I do not find that this is a proper object for a would-be ISO standard.

ISO defines a standard as:

[A] document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context. (ISO/IEC Guide 2:2004, definition 3.2)

A key phrase is, "for common and repeated use." This means that a standard must define things that can be practiced in a repeatable manner by others. For a standard to include items that cannot be practiced by others, purely due to lack of a coherent definition, is antithetical to the purpose of a standard.

The next time that Microsoft asserts that these flaws are acceptable, because the flawed feature is "optional" we should ask them whether they believe also that competition is optional, since that is the net effect. To claim that the OOXML standard is the key to backwards compatibility with legacy documents while at the same time failing to document the very features that enable this legacy support — this is a sham. To justify this by arguing that such features are "optional" is neither good logic, good engineering, nor good standards development.


Friday, July 20, 2007

Stranger than Fiction



This is taken from slide 21 of Ecma's "Standards @ Internet Speed" presentation (22 June 2007 version) which you can download here. It appears to be a sales pitch.

I've joked about the Ecma process before, but I never thought I'd see it written out officially like this. Standards are made available "on time"? Minimize the "risk" of changes? I thought the whole purpose of technical review was to find the problems and fix them? As always, the man who pays the piper calls the tune.

(Also, you gotta love the "wink and nod" approach to patents on slide 16, where they can pretend a problem doesn't exist if one member disputes that a patent is essential to implement the standard.)


Thursday, July 19, 2007

The Cookbook

The Thimbleberry Inn's executive chef, Guillaume Portes, was sensational. He took the modest kitchen of this even more modest inn, and using local produce and game, and with a flair for the dramatic, created a menu that drew local, regional and even national attention, in ever widening spirals of epicurean and gastronomic success.

Now, fresh back from a guest appearance on a cable cooking show, Guillaume received a call from a publisher, asking if he would be interested in writing a cookbook: "Everyone loves your food. You're a genius. If you write a cookbook we could sell millions."

Guillaume at first was skeptical, "Me, write a cookbook? But I know nothing of writing!"

"Don't worry," said the publisher. "I'll set you up with our best editor, Frank Morris. Frank knows writing. He does all of our celebrity books. It will be a wonderful collaboration."

A few months later and the cookbook was almost ready to publish. The review copies went out in expectation of rave flap blurbs. But what came back...well...it wasn't quite what they had expected:
  1. Complaints that the Inn's most popular dishes were not included. "Why did you leave out your most popular dish, the Pecan Stuffed Pheasant?"
  2. Reports that some recipes were missing steps in their instructions, or that the specified ingredient amounts were vague, missing or incorrect. "One spoon of salt? It would help if you were more specific. Tablespoon or teaspoon?"
  3. Some recipes had ingredients listed that did not seem to be used in the recipe. "What do I use the scallions for? They are listed as ingredients, but nothing explains how or when they are used."
  4. Observations that some instruction steps were vague or relied on unusual sources of information. "How are we to interpret a recipe step that says 'Cook it like Aunt Mable used to cook it'?"
  5. Recipes were missing sauces or said simply, "Add your own sauce." But the Thimbleberry Inn was famous for their sauces. How could anyone replicate their dishes if the recipe for their signature thimbleberry sauce was omitted?
  6. A huge number of typographical errors, broken references, inconsistencies, etc., that showed that the preparation of the cookbook was hasty and lacked sufficient review.
The publisher was confused. The editor was aghast. Guillaume was furious. How could the reviewers do this to us? Back stabbers! How dare they! These dishes are the finest in the country. Everyone who comes to the Inn loves and praises them. Look at the restaurant reviews! Look at my prize medals! Surely the reviewers must be in league with my competitor, the evil Gooseberry Inn, and are merely trying to prevent my book from selling!

The greatest dose of abuse was reserved for those reviewers who reported the greatest number of problems.

As the imprecations grew more passionate, and the volume and temperature rose, a timid voice arose from the back of the room, saying, "Uh, but are they right?"

The room grew silent as Fred Osgood, an old-timer, once editor-in-chief, but now on the verge of retirement, spoke his mind:

I've been in this business quite a while, as you all know. I've edited cookbooks before, plenty of them. None for a client as big as the Thimbleberry Inn, but the ones I did edit received good reviews and were modest successes in their day.

Frank, I don't think you've ever edited a cookbook in your life, have you? I didn't think so. The thing you need to know is that cookbooks require careful technical review as well as standard editing. Just because a chef is talented, or a restaurant is popular does not guarantee that the recipes and the cookbook are good. A recipe is not a dish. There is a lot of time and hard work required to turn Guillaume's natural genius in the kitchen into something that readers of our cookbook can replicate in their own kitchens.

Back when I edited cookbooks, I made sure that we set up a test kitchen and verified every step of every recipe. We cooked, revised, and cooked again until we could say that every recipe worked as written.

I have no doubt that Guillaume can cook every one of these dishes from memory and get it right every time. I don't think any of us can question that. But that is not the important question. The cookbook is not for Guillaume's benefit. The question we need to ask is whether our readers can cook these dishes using these recipes. In other words, are the recipes relevant, complete and accurate? This is what makes writing a cookbook different than cooking, and this is where we have failed our readers. The reputation of the Thimbleberry Inn as well as this publishing house depends on us doing this right. We need to send this cookbook back for full and proper technical review.


The room was silent for a minute, as the others gazed at their feet. Then a smirk came over Guillaume's face and he struck his fist loudly on the table, stood up and said:

But I need this cookbook now! We must do a better job at finding good reviewers. Let's throw out the reviews we have so far. Let me give you the names of some of my friends, partners and former colleagues. Don't even bother sending them a review copy. They don't need to read it. They've all eaten at the Inn. They know how good my food is. Just have them fax over their reviews. Let's get moving, gentlemen! The 2nd edition of the Gooseberry Inn's cookbook is due out any month now. We can't be left behind!

Old Fred silently collected his things and left via the back door, muttering.


Sunday, July 15, 2007

OOXML Fails to Gain Approval in US

On Friday, July 13th, INCITS V1 met via teleconference for 3 hours but failed to reach the 2/3 consensus necessary to recommend an "Approval, with comments" position on Microsoft's "Office Open XML" (OOXML) document specification.

V1 is a Technical Committee of INCITS, an industry forum accredited by ANSI for recommending the US position on ISO/IEC JTC1 ballots. On April 2nd the INCITS Executive Board asked V1 "to coordinate and develop the U.S. recommended position" on OOXML and to return this recommendation by July 17th. After several meetings, including a two-day face-to-face meeting in Washington, DC in late June, and the recording of over 300 member-submitted comments, V1 voted last Friday.

The initial motion of "Approval, with comments" fell two votes short of the 2/3 needed to pass. Further motions of "Disapproval, with comments" and "Abstention, with comments" also failed. ("Disapproval, with comments" is also sometimes called "Conditional Approval" since it signals that the committee would change its vote to Approval if the concerns raised in the comments were addressed in a revised version of the submission.) The result is that V1 will report out a large list of technical comments for consideration by INCITS, but will not report a consensus position on this controversial ISO "Fast Track" submission.

An important factor in the V1 vote was the large number of members who joined very late in the process. At the start of the year, V1 had only 7 voting members. But by Friday's meeting V1 had 26 voting members. There was a clear pattern in the voting: the long-time V1 members voted for the "Disapproval, with comments" position as well as "Abstention, with comments", while the newer members voted overwhelmingly for "Approval, with comments" and against "Abstention, with comments." This is not surprising since the new members were largely Microsoft business partners.

The following chart makes this trend clear. As you can see, at the start of the year, V1's membership consisted of seven organizations, six of which voted "Disapproval, with comments" on Friday, and one (Microsoft) which voted "Approval, with comments". The membership spurt came at the very end, in the last month, when 16 new members joined V1. Of these 16 new members, 14 voted "Approval, with comments" on Friday.


Note that this is not the final step in developing the US position on OOXML. The next step will be for the INCITS Executive Board to review the comments V1 has generated, and then to determine the US position via a 30-day letter ballot. That, followed by a possible 10-day reconsideration ballot, will take us to the September 2nd deadline for this JTC1 ballot. It is typical practice for INCITS to follow the recommendations of its technical committees. But since the committee of technical experts in V1 was not able to develop a consensus recommendation, it is not clear how the INCITS Executive Board will now make their decision.


Updates

7/16/2007

So it is perfectly clear to all, the above represents my views, observations and opinions. It does not represent an official report of V1's position, nor does it necessarily reflect the views of INCITS staff and officers or other V1 members. Of course, I was on the call, I voted, and I have the meeting minutes in front of me. This isn't exactly rocket science. But I'm adding this disclaimer so there is no room for confusion on this point.

7/17/2007

It has been brought to my attention that the "Approval, with comments" ballot failed by two votes, not by one. This was a mathematical error on my part. I've corrected the post accordingly.


Monday, July 09, 2007

The Formula for Failure

It has been a boast for around 6 months now: Microsoft's OOXML fully defines spreadsheet formulas, and ODF doesn't. The Microsoft boosters have been parroting the party line for quite some time.

Miguel de Icaza gleefully noted back in January:

OOXML devotes 324 pages of the standard to document the formulas and functions.

The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks that represented Novell at the TC45 requested the information and it eventually made it into the standards. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).


And Microsoft's Jean Paoli, quoted in May in InfoWorld:

As far as those 6,000 pages of specs is concerned, there are 350 pages in the OpenXML spec alone -- half of the entire ODF spec -- just to describe spreadsheet capabilities, which ODF doesn't have, Paoli says. For example, ODF can't describe or calculate a formula in a spreadsheet.

"It may sound amazing. They are working on it now. But the current standard doesn't have it," Paoli tells me.


There are many other examples, if you care to seek them out. But what you will not find is an examination of what OOXML actually specifies for spreadsheet formulas, or confirmation that it was done sufficiently. Maybe the assumption is that this would be a trivial task, documenting Excel's behavior? What could possibly go wrong?

Let's find out.

First, let's take the trigonometric functions, SIN (Part 4, Section 3.17.7.287), COS (Part 4, Section 3.17.7.50) and TAN (Part 4, Section 3.17.7.313). Hard to mess these up, right? Well, what if you fail to state whether their arguments are angles expressed in radians or degrees? Whoops. Same problem for the return value of the inverse functions, ASIN (Part 4, Section 3.17.7.12), ACOS (Part 4, Section 3.17.7.4), ATAN (Part 4, Section 3.17.7.14), and ATAN2 (Part 4, Section 3.17.7.15). It is hard to have interoperable versions of these functions if the units are not specified. What kind of review in Ecma would miss something so simple?
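To see how much rides on the unit, here is a minimal Python sketch that evaluates the sine of 30 under each reading. The function names are made up for illustration; Python's own math.sin takes radians, and the degree variant is just one way an implementer might read an unspecified definition.

```python
import math

def sin_radians(x):
    """SIN read as taking radians (this is what math.sin does)."""
    return math.sin(x)

def sin_degrees(x):
    """SIN read as taking degrees (a hypothetical alternative reading)."""
    return math.sin(math.radians(x))

print(sin_radians(30))  # roughly -0.988
print(sin_degrees(30))  # roughly 0.5
```

Same input, same function name, and the two readings do not even agree on the sign.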

The AVEDEV function (Part 4, Section 3.17.7.17) should return the average deviation of a list of values. However, the formula given for this function is actually for calculating the number of combinations of n things taken k at a time. Nice formula, though. Jakob Bernoulli would be proud. But anyone using an OOXML spreadsheet application that follows this standard will be perplexed at the values returned by their AVEDEV function. Did these formulas get any expert review in Ecma?
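For comparison, here is a small Python sketch of the two calculations side by side. It assumes the usual textbook definitions: average deviation as the mean of the absolute deviations from the mean, and combinations as n! / (k! (n-k)!). The function names and sample data are illustrative only.

```python
import math

def avedev(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def combinations(n, k):
    """Number of ways to choose k things from n: n! / (k! (n-k)!)."""
    return math.comb(n, k)

data = [2, 4, 4, 6]
print(avedev(data))        # 1.0
print(combinations(4, 2))  # 6
```

The two do not even take the same kind of input: one wants a list of values, the other wants two integers.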

It is hard to have confidence in the CONFIDENCE function (Part 4, Section 3.17.7.47). It is said to return the confidence interval around a sample mean given an alpha value, a standard deviation and a sample size. The trouble is that this problem is under-defined. One must make an assumption, not stated here, as to the shape of the data distribution. Is it normally distributed data? Exponentially distributed? Weibull distribution? The standard does not define the meaning of this function sufficiently for one to implement it.
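As an illustration of the missing assumption, here is a sketch of what the calculation looks like once you pick one. It assumes normally distributed data with a known standard deviation, which is a choice made here and not something the text above attributes to the standard. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_normal(alpha, std_dev, size):
    """Half-width of a (1 - alpha) confidence interval around a sample mean.

    ASSUMPTION: the data are normally distributed and std_dev is the
    known population standard deviation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * std_dev / math.sqrt(size)

print(confidence_normal(0.05, 2.5, 50))  # roughly 0.693
```

Change the distributional assumption and the critical value changes with it, so two implementations can both follow the text and still disagree.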

The CONVERT function (Part 4, Section 3.17.7.48) converts from one unit to another. Some conversions explicitly allowed include liquid measure conversions such as from liters to cups or tablespoons. But whose cup and whose tablespoon? Traditional liquid measures vary from country to country. In the US, a cup is 8oz, except for FDA labeling purposes when a cup is 240ml. But in Australia a cup is 250ml and in the UK it is 285ml. Similarly a tablespoon has various definitions. OOXML is silent on what assumptions an application should make. I guess I won't be using OOXML to store my recipes, and certainly not to calculate medical doses!
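A quick sketch of the spread, using the cup sizes quoted above. The US customary figure assumes 8 US fluid ounces at about 29.57 ml each; the table and function names are invented for this example.

```python
# Millilitres per cup under each convention mentioned above.
CUP_ML = {
    "US customary": 8 * 29.5735,  # assumes 8 US fl oz of ~29.57 ml each
    "US FDA labeling": 240.0,
    "Australia": 250.0,
    "UK": 285.0,
}

def liters_to_cups(liters, cup_ml):
    """Convert liters to cups for a given cup size in millilitres."""
    return liters * 1000.0 / cup_ml

for name, ml in CUP_ML.items():
    print(f"{name}: {liters_to_cups(1.0, ml):.2f} cups per liter")
```

One liter comes out anywhere from about 3.5 to about 4.2 cups depending on which cup the application quietly picks.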

Almost every one of the financial functions in OOXML depends on a "day count basis" flag, such as US (NASD) 30/360, Actual/Actual, Actual/360, Actual/365, European 30/360. These represent various conventions for how days and months are counted. The problem is that the OOXML standard does not define these conventions, nor does it point to an authority for their definition. There are subtle behaviors here, especially when dealing with leap years and Excel's deviant treatment of dates in the year 1900. So the lack of detailed definitions in this area makes it impossible for anyone to rely on identical financial calculations from different OOXML implementations. This, in a field where being off by a penny can cause problems. Almost 30 spreadsheet functions are broken in this way.
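To make the stakes concrete, here is a rough Python sketch of how the year fraction between two dates shifts with the basis. The 30/360 rule below is a deliberately simplified version (it ignores the end-of-February adjustments that real conventions specify), and all names are illustrative, not taken from any standard.

```python
from datetime import date

def days_30_360(start, end):
    """Simplified 30/360 day count: every month is treated as 30 days.

    ASSUMPTION: end-of-February rules from real conventions are omitted.
    """
    d1 = min(start.day, 30)
    d2 = 30 if (end.day == 31 and d1 == 30) else end.day
    return 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)

def year_fraction(start, end, basis):
    """Fraction of a year between two dates under a named basis."""
    actual_days = (end - start).days
    if basis == "30/360":
        return days_30_360(start, end) / 360
    if basis == "Actual/360":
        return actual_days / 360
    if basis == "Actual/365":
        return actual_days / 365
    raise ValueError(f"unknown basis: {basis}")

start, end = date(2007, 1, 31), date(2007, 3, 31)
for basis in ("30/360", "Actual/360", "Actual/365"):
    print(basis, round(year_fraction(start, end, basis), 5))
```

For these two dates the simplified 30/360 rule counts 60 days while the calendar counts 59, so interest accrued on the same bond differs before any leap-year subtleties come into play.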

(What do you call a scientist whose calculations are off by 50%? A cosmologist. What do you call an accountant whose calculations are off by 1%? A crook.)

The NETWORKDAYS function (Part 4, Section 3.17.7.344) seems simple enough: it returns the number of workdays (non-weekend days) between two dates. Unless you live in the Middle East. The problem is that this function doesn't provide a facility for distinguishing the different weekend conventions. I may have a weekend on Saturday & Sunday, but a colleague in Tel Aviv might have Friday and Saturday off, while in Cairo it might be Thursday and Friday. This function lacks the adaptability to deal with this important cultural difference. Saying that the definition of the weekend is implementation- or locale-dependent won't work either. I may be a French company in Paris dealing with contractors in Algeria. I need to have a French spreadsheet calculate schedules for workers at various locations and be able to exchange it with other offices using other OOXML applications and expect that they will get the same answer. Lacking cultural adaptability, OOXML fails approximately a billion people here.
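Here is a minimal sketch of a workday count that takes the weekend as an explicit argument. The signature is hypothetical and is not the definition from any specification; it only shows that the same date range gives different answers under the three weekend conventions mentioned above.

```python
from datetime import date, timedelta

def networkdays(start, end, weekend=(5, 6)):
    """Count the days from start to end, inclusive, that are not weekend days.

    weekend holds date.weekday() numbers: Monday=0 ... Sunday=6.
    """
    count = 0
    day = start
    while day <= end:
        if day.weekday() not in weekend:
            count += 1
        day += timedelta(days=1)
    return count

start, end = date(2007, 7, 2), date(2007, 7, 27)  # a Monday to a Friday
print(networkdays(start, end, weekend=(5, 6)))  # Saturday/Sunday: 20
print(networkdays(start, end, weekend=(4, 5)))  # Friday/Saturday: 19
print(networkdays(start, end, weekend=(3, 4)))  # Thursday/Friday: 18
```

Because the weekend travels with the formula instead of with the machine's locale, every application that opens the spreadsheet computes the same schedule.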

Another example. Several of the statistical functions in OOXML are defined incorrectly. Take, for example, the ZTEST function (Part 4, Section 3.17.7.352). The key error comes right after the formula, where it says, "where x is the sample mean." The problem is that x-bar is the sample mean, not x. Someone who implements according to the text will give their users the wrong answer. A similar error is repeated in 8 other statistical functions. Certainly this is a typographical error, but this error changes the answer. Remember, this is an approved Ecma Standard and a proposed ISO Standard, not a 4th grade school essay. Denmark and Massachusetts have already said they will adopt OOXML for official business. Spelling counts. Providing the right formula and the right description counts. Copy and paste errors should have been taken care of back during the Ecma review.
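For reference, here is a sketch of a one-tailed z-test as it is usually taught, with the sample mean (x-bar) in the numerator. This is the standard textbook form and not a transcription of the OOXML text; the parameter names mu0 and sigma are chosen here for illustration.

```python
import math
from statistics import NormalDist, mean

def ztest(sample, mu0, sigma):
    """One-tailed p-value P(Z > z), where z is built from the sample MEAN.

    mu0 is the hypothesized population mean and sigma the known
    population standard deviation (both names are illustrative).
    """
    x_bar = mean(sample)
    z = (x_bar - mu0) / (sigma / math.sqrt(len(sample)))
    return 1 - NormalDist().cdf(z)

print(ztest([4, 5, 6, 7, 8], mu0=6, sigma=2))  # 0.5
```

Swap x-bar for any other quantity in that numerator and z, and therefore the p-value, changes.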

I've submitted these spreadsheet formula issues, and many others, to INCITS V1, for consideration in determining the US position on the OOXML ISO ballot, but we never got to them during our two-day meeting in DC a couple of weeks ago, and may not get to them at all. There are simply too many other issues to read through and discuss. But I thought it was important to bring up these formula issues in particular, since Microsoft seems especially proud of their work in this area, delusions of adequacy which on reflection must now seem unwarranted. I'm especially concerned with the financial functions, since they are outside my area of expertise and may have additional errors that I missed.

So what is ODF doing about formulas? We're continuing to work on them. Rather than rush, we're doing careful, methodical work. We're documenting the functions in great detail. Where we have the choice between the common naive formula for a function and one that is numerically stable, we're documenting the stable function. For the NETWORKDAYS function, we created an optional extra parameter, so a user can pass in a flag that tells what their weekend conventions are. We have a professor of statistics reviewing our statistics functions for completeness and accuracy. We're verifying our assumptions about financial functions by referring to core specifications from groups like the ISDA and the NASD. We're creating a huge number of test cases and checking them with Excel and other applications.
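To illustrate the naive-versus-stable distinction, here is a sketch using sample variance. That function is an example chosen here, not one the paragraph above names, and the two implementations are textbook forms, not text from the ODF formula work.

```python
def variance_naive(xs):
    """One-pass textbook form: (sum of squares - (sum)^2 / n) / (n - 1).

    Subtracts two huge, nearly equal numbers, so rounding error can swamp
    the result when the values are large relative to their spread.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def variance_two_pass(xs):
    """Two-pass form: subtract the mean first, then square the deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(variance_two_pass(data))  # 30.0
print(variance_naive(data))     # not guaranteed to be 30.0 in floating point
```

Both formulas are algebraically identical, which is exactly why a specification has to say which one it means.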

Under Sarbanes-Oxley, a CEO or CFO puts himself at personal risk if he signs off on financial numbers derived from processes and tools that he knows to give erroneous results. So we utterly reject a rushed process that has led to an Ecma Standard which incompletely and incorrectly defines spreadsheet functions. Some things are worth taking the time to do right.

As I've shown, in the rush to write a 6,000 page standard in less than a year, Ecma dropped the ball. OOXML's spreadsheet formula specification is worse than missing. It has incorrect formulas that, if implemented according to this standard, would raise important health, safety and environmental concerns, aside from the obvious financial risks of a spreadsheet that calculates incorrect results. This standard is seriously messed up. Shame on all those who praised and continue to praise the OOXML formula specification without actually reading it.
