Sunday, August 17, 2008

Giving the Finger to the DIS 29500 Appellants

The news broke on Friday, with the ISO press release, and additional coverage and analysis by Andy Updegrove and on Groklaw. But it would be remiss if I did not share a few details on how, true to form, the end of this DIS 29500 process was botched.

First, let's put this in perspective. We're talking about members of an organization, in this case four members of ISO/IEC JTC1, raising an appeal under the rules of that organization, alleging that the organization failed to follow its own rules. Almost every organization has a provision for dispute resolution, including the rights of members to appeal the decisions of elected officers or staff. This is a basic part of governance.

It is a worthwhile exercise to see how this "right to appeal" is handled by other SDO's. Let's take a few examples from other organizations that deal with tech standards.

First, let's look at OASIS, a consortium that creates XML standards, like ODF. Any three OASIS members may lodge an appeal if they believe that OASIS procedures have been violated. Resolution is first attempted via correspondence, but if that fails to satisfy the appellants, they then may request a in-person hearing at the next OASIS Board of Directors meeting, where they can present their complaint. This request cannot be denied. It is a right of the members.

INCITS, the US NB in JTC1 has a different approach to appeals, detailed in section 5.8 of their RD-2 [pdf] Procedures guide. Appeals in INCITS are based on the following principles:

  • Appeals shall be addressed promptly and a decision made expeditiously.
  • The right of the involved parties to present their cases shall not be denied.
  • These procedures shall provide for participation by all parties concerned without imposing an undue burden on them.
  • Consideration of appeals shall be fair and unbiased and shall fully address the concerns expressed.
  • Records of appeals shall be kept and made available upon request. The INCITS Secretariat may levy a nominal charge to cover the cost of reproduction, handling and distribution for requests received from other than the involved parties.

Any INCITS member may lodge an appeal, and if an informal attempt at resolution with the INCITS Secretariat fails, an appeals panel is formed to hear the appeal. The impartiality and balance of the appeals panel is explicitly considered:

The appeals panel shall consist of three individuals who have not been directly involved in the matter in dispute. At least two members shall be acceptable to the appellant and at least two shall be acceptable to the INCITS Secretariat.

From large consortia, to NB's, let's poke around further and look at a industry group, AIIM, with a standards program focused on enterprise content management (ECM) technologies. Section 7.0 of their Policies and Procedures [pdf] manual defines their appeals process.

Persons who have directly and materially affected interests and who believe they have been or will be adversely affected by any procedural action or inaction by AIIM as a standards developer with regard to the development of a proposed American National Standard or the revision, reaffirmation, or withdrawal of an existing American National Standard, have the right to appeal.

The appeal is heard by three member panel, selected as in INCITS to be impartial and balanced:

The appeals panel shall consist of three members selected from the AIIM membership in addition to the Chairperson. The Chairperson of the panel shall be the Standards Board Chairperson, and shall not have a vote in the decision of the panel. The voting members of the panel shall not have been directly involved in the matter in dispute, and not be currently involved in the development of the standard(s) in question, and shall not represent or be an employee of an interest that can be made directly or materially affected by any decision made by or to be made in the dispute. The voting members of the appeals panel shall be agreed to by both the appellant and the respondent.

Perhaps readers can post other summaries of SDO appeals procedures, to give a broader sense of what the common features are. From what I can tell, the best practices are:

Members have a right to appeal decisions of the organization, and to have their appeal heard and considered, in person, by a panel chosen to be impartial and balanced. Although the appellants are not guaranteed that their views will prevail, the rules do no allow the organization to repress the appeal and not let it be heard.

So with that as background, it is interesting so observe how ISO/IEC JTC1's antiquated cold war era rules in effect serve to stifle criticism, repress dissent, and prevent even a hearing on the merits of an appeal. As I'll show, even with this strong organizational bias against appeals, the current DIS 29500 were only dismissed with assistance from a poorly written ballot question, NB confusion resulting in contradictory votes, and an unwillingness of committee chairs to attempt to reach consensus. Organizational failures, in the end, are usually leadership failures.

For an appeal in JTC1 to be heard, two different committees, ISO/TMB and IEC/SMB must first agree to allow the appeal to be heard. The reader should note the increased difficulty of getting two different committees to agree on the same decision, and consider the following mathematical diversion.

---------

Suppose you are pushing for an proposal that has, on average, 50% support within a given organization. It is put to a vote in a subcommittee drawn randomly from that population. What is the probability that the proposal will pass a vote in that subcommittee?

50%. I think most of us have an intuitive sense of that.

But what if there are two subcommittees drawn randomly from that organization, and the proposal must win a vote in each one of the subcommittees, what is the chance it will pass?

Is it still 50%? No. I hope most of us have that intuitive sense that the need to pass two committees is harder than passing a single committee. In fact, your chances of approval could be as low as 25%, depending on whether the two committees make independent decisions, or whether there are factors that cause their votes to be partially correlated.

But the general rule is: the more stages of approval required, the less your chances of success

---------

In the particular case of the DIS 29500 appeals, the IEC/SMB requires a 2/3 super majority to approve a ballot. ISO/TMB presumably requires only simple majority. (Like most of JTC1 Directives, this is not explicitly defined).

The astute reader will note that the odds are against the appellants even getting their appeal considered by a panel. In fact, these committee odds match what is required to impeach a U.S. President (50% in House) and remove him from office (2/3 in the Senate).

Given these odds, how did the appellants fair? In ISO/TMB the first irony comes with the title of the ballot:





Remember, a core matter of the appeals is the mistreatment of the contradiction phase of the DIS process, and one of the core matters of the contradiction arguments raised was that the official name of the DIS, "Office Open XML" bore a close and confusing resemblance to the submitters main competitor, Open Office, and that this would lead to confusion in the name of the standard. Well here we are, and in denying this appeal ISO/TMB commits that same error, giving the incorrect name of the standard!

Looking at the actual results, for each of the 4 appeals, ISO/TMB tied on two of them, 6-6, and voted not to pursue two others by 7-5 and 8-3 votes. So, it was very close. In fact I am a bit surprised they simply dropped further consideration of the appeals when they had tied votes like that.

A basic rule that applies to voting is JTC1 Directives, 9.1.3:

The Chairman has no vote and questions on which the vote is equally divided shall be subject to further discussion.

In this case, the Chair (Denmark) did not vote. That is correct. But why did the procedure end with two appeals showing a 6-6 "equally divided" vote? According to the rules this should be leading to further discussion. An equally divided vote is as far from consensus as one can get. Is this how they want to leave it, just hanging like this?

In IEC/SMB, the voting results are even more bizarre. I don't know quite what to make of them. So just a few quick observations.

First, a motion should be carefully worded to it is clear what will happen if the motion passes, and what will happen if the motion fails. A Chair should insist on this, and indeed that is one of their primary duties as Chair, to ensure that questions put to their committee are clear. However, in the case of the DIS 29500 appeals, the ballot questions, as dictated by the Secretaries General, were muddled. I remarked on this in a previous blog post, and other readers observed this as well. Whether done by malice or incompetence, the ballot questions were destined to cause confusion.

The reported results indeed were muddled, as you can see here:




Of the 15 SMB members only two (China and the Netherlands) followed the explicit instructions and voted either the questions in Part A or in Part B (but not both). Both China and the Netherlands voted in one part, and abstained in the other part.

Most members voted both sections, but while expressing a consistent intent, e.g., vote No for not processing Brazil's appeal further, but vote Yes for processing Brazil's appeal further.

However, it appears that three other NB's voted inconsistent, contradictory instructions. In fact one NB (Canada) gave exactly the same votes on section B as in section A, essentially canceling out their vote on every single question. This was from an NB whose written comments stated they they strongly supported hearing the appeals further. Similarly, the votes from Korea are partially contradictory.

We can attempt to reconstruct what a less-confusing ballot would have yielded. For example, take the questions in part A, whether "not to process the appeal any further", where the recorded results were 8-4-3 for the Brazilian appeal, yielding a 2/3 super majority (ignoring the 3 abstentions). But note then that two of the three abstaining NB's (China and the Netherlands) in fact voted in the affirmative for the question on whether to process the Brazil's appeal in part B. It looks like the only reason why they abstained in Part A is that they actually followed the ballot instructions and cast a vote in Section A or Section B. If we apply their clear intent consistently to the Part A question, then the results become 8-6-1 and the motion to "not process the appeal any further" would have failed for lack of 2/3 majority.

I cannot make sense of Korea's votes. Although they seem to have supported two appeals, while not supporting two other appeals, their inconsistent votes make it impossible to tell which ones they supported and which ones they did not.

Needless to say, a ballot that yields results where it is impossible to tell what the voters wanted is a hallmark of a seriously flawed, useless ballot. The SMB results are tainted by a poorly written ballot question, given to them by the Secretaries General, which has clearly caused confusion among the SMB voters, and which had a material effect on the results. My analysis of IEC/SMB shows that, like ISO/TMB's vote, the results are nearly equally divided, and IEC/SMB should hang their head in shame if they persist in denying a hearing to these four appeals because of ambiguous results from a poorly written, botched ballot.

This is why ballot results should be released publicly and subject to scrutiny. I do not believe we can trust ISO/IEC to perform quality control on their own processes. The rot is too deep.

Labels:

Tuesday, July 22, 2008

Sed quis custodiet ipsos custodes?



We are coming down to the last week for JTC1 to decide on whether to hear the four NB appeals concerning various claimed errors in the processing of DIS 29500 (OOXML), or whether summarily to dismiss these appeals without hearing them. The decision lies with two committees, the Technical Management Board (TMB) in ISO and the Standards Management Board (SMB) in the IEC.

Back on July 4th, the Secretaries General of ISO and the IEC referred the four NB appeals, with their comments, to the TMB/SMB. Groklaw has the text of these comments, in PDF format, as well as HTML transcription.

The comments of the Secretary General are accompanied by a ballot, asking the question:

ACTION

The members of the Technical Management Board are invited to indicate, by replying yes, no or
abstention on EITHER a) OR b) for each of the four appeals (see item 14 in annex A):

a) not to process the appeal any further:
Item 1 ABNT
Item 2 BIS
Item 3 FONDONORMA
Item 4 SABS

OR

b) to process one or more of the appeals, which would require setting up of a conciliation panel

Item 5 ABNT
Item 6 BIS
Item 7 FONDONORMA
Item 8 SABS

by no later than 4 August 2008.

This is quite a strange animal to see. Why are we having a ballot at all, and only a 30-day one? This is questionable from several perspectives.

First, why are the Secretaries General the ones calling for a ballot? The Directives do not call for them to do so. In fact the Secretaries General are not even called upon to make a recommendation. They are only asked for comments. The Directives say:


11.3.3 The Secretaries-General shall, following whatever consultations they deem appropriate, refer the appeal together with their comments to the TMB/SMB within one month after receipt of the appeal.

11.3.4 The TMB/SMB shall decide whether an appeal shall be further processed or not. If the decision is in favour of proceeding, the Chairmen of the TMB/SMB shall form a conciliation panel (see 9.2).

But deciding is not the same as voting. One of the cardinal principles of JTC1 is to discuss and seek consensus, not rush to a vote. Indeed, this is one of the matters under appeal, the rush to voting at the OOXML BRM. JTC1 Directives, section 1.2 says (my emphasis):

These Directives are inspired by the principle that the objective in the development of International Standards should be the achievement of consensus between those concerned rather than a decision based on counting votes.

But here we are, with a vote pushed on the TMB/SMB.

The sense of the vote is wrong as well. The Directives call for a decision on "whether an appeal shall be further processed or not." Note the wording. It did not call for a decision on "whether to accept the recommendation of the Secretaries General". But somehow, we skip discussion, skip over consensus and get a ballot question which asks the opposite question first "not to process the appeal any further". In an environment where many parties automatically vote Yes to the ballot question, changing the sense of the question in this way is prejudicial to the appellants.

So it is clear from the start that the powers that be do not want to give these four NB's the opportunity to make their case or be heard. In any case, let's take a deeper look at some of the subjects under appeal and see if we can detect what it is exactly that cannot bear the scrutiny of a duly processed appeal.

First up is the alleged mishandling of the contradiction period last year. The Secretaries General dismiss this complaint, saying that it was a matter of judgment:

The Directives give the JTC 1 Secretariat and ITTF latitude to use judgement as to whether a meeting should be organized to address alleged contradictions. Considering that other issues could potentially be identified during the DIS ballot, the JTC 1 secretariat and ITTF concluded that it was preferable to initiate the ballot and to allow all issues to be addressed by the BRM. The NBs were fully informed of all the claimed contradictions and Ecma's responses to them.

This argument doesn't hold water. Although the JTC1 Secretariat and ITTF are allowed judgment, this is not an absolute license which cannot be questioned. The Secretariat and ITTF also have defined duties, and their actions or inactions with respect to these duties can be questioned and are subject to appeal. Specifically, an NB may appeal the issue of an inaction of JTC1, according to JTC1 Directives, 11.3. So for the Secretaries General to suggest that this inaction cannot be appealed because it is a matter of judgment is nonsense. Judgment and duty are the proper matters for an appeal.

So what is the duty in this case? As stated in JTC1 Directives, 13.4:

If a contradiction is alleged, the JTC 1 Secretariat and ITTF shall make a best effort to resolve the matter in no more than a three month period, consulting with the proposer of the fast-track document, the NB(s) raising the claim of contradiction and others, as they deem necessary. A meeting of these parties, open to all NBs, may be convened by the JTC 1 Secretariat, if required.

If the resolution requires a change to the document submitted for fast-track processing, the initial document submitted will be considered withdrawn. The proposer may submit a revised document, to be processed as a new proposal.

If the resolution results in no change to the document or if a resolution cannot be reached, the five month fast-track ballot commences immediately after such a determination is made.


The Directives call for the JTC1 Secretariat to make a best effort to resolve the matter (JTC1 Directives, 13.4). The JTC1 Secretariat is not given latitude to do nothing, or allowed discretion to immediately defer this question to the ballot period, without making a best effort to resolve the matter.

When a new 6,000 page DIS is submitted to JTC1 only one month after the publication of another standard (ODF) in the exact same space (XML document formats for office applications) and 19 NB's submit contradiction statements, and the JTC1 Secretariat's "best effort" is to hold no consultations with the NB's claiming contradictions, to hold no meeting, to make no attempt to resolve the question, then I believe that any NB would has a legitimate grounds for appeal on the inaction of JTC1 with regards to contradictions. There is no evidence that a "best effort" was made here to resolve the contradictions. Doing nothing is clearly incompatible with the required “best effort”.

It should be noted that JTC1 has had challenges in the past getting ITTF to carry out their responsibilities with respect to contradictions, which lead to this resolution adopted unamimously at the 2000 JTC1 Plenary:

Resolution 27 - Consistency of JTC 1 Products

JTC 1 stresses the strong need for consistency of its products (ISs and TRs) irrespective of the route through which they were developed. Any inconsistency will confuse users of JTC 1 standards and, hence, jeopardize JTC 1's reputation. Therefore, referring to clauses 13.2 (Fast Track) and 18.4.3.2 (PAS) of its Directives, JTC 1 reminds ITTF of its obligation to ascertain that a proposed DIS contains no evident contradiction with other ISO/IEC standards. JTC 1 offers any help to ITTF in such undertaking. However, should an inconsistency be detected at any point in the ratification process, JTC 1 together with ITTF will take immediate action to cure the problem.

Perhaps it is time to give ITTF another reminder of their obligations in this regard?

Further, the determination claimed to have been made by the JTC1 Secretariat and ITTF was not communicated to JTC1 NB's. Instead, the JTC1 Secretariat merely forwarded Ecma's responses to the contradiction submissions along with a notification that the DIS ballot should then commence. No statement was made as to whether the ballot was commencing because the contradictions had in fact been resolved, or because a resolution could not be made, which are the only two outcomes allowed by the Directives in 13.4. Not to notify NB's of the actual state of the resolution of the contradictions submissions is incompatible with the JTC1 Secretariat's duty to make a best effort to resolve the matter.

This failure by JTC1 materially effected the ensuing ballot, since Microsoft was then able to take advantage of this procedural nonperformance and repeatedly represent to NB's that the contradictions had been rejected as invalid and could not be considered in the DIS ballot. In fact, this led to several NB's issuing explicit, but erroneous instructions to their members that the contradictions had been resolved and thus could not be raised again as a criterion for determining their national position, e.g., in Australia.

Further, although the Secretaries General claim that “the JTC 1 secretariat and ITTF concluded that it was preferable to initiate the ballot and to allow all issues to be addressed by the BRM” the documented fact is that the BRM Convenor explicitly disallowed any discussion of contradictions at the BRM.

Another subject of appeal was the irregular voting procedures used at the DIS 29500 BRM in February. This is the P-member versus O-member question. The Secretaries General dismiss this appeal in this way:

2e. Correct but inapplicable. The BRM was neither a meeting of JTC 1 nor of SC 34 but was open to all 87 national bodies which submitted a vote (including abstentions) on the DIS. Applying 9.1.4 would have disenfranchised the voting NBs present at the BRM which were not P-members. The fact that any votes in the BRM would be open to all national delegations present was communicated over three months prior to the BRM.

This argument presented is flawed, and amounts to saying, “The voting was done by P- and O-members because the meeting was attended by delegations from P- and O-members”. Who attended the meeting is immaterial. Liaisons such as Ecma also attended the BRM? Should they have been able to vote merely because they attended? No, of course not. Voting rights are defined in JTC1 Directives, and this must not be set aside in favor of an ad-hoc rule made without NB consultation or approval.

Asserting that applying 9.1.4 would disenfranchise NB's is an example of circular reasoning. One can only be disenfranchised if one first has the right to vote. So the statement by the Secretaries General is arguing a conclusion (O-members are permitted to vote at BRM's) by assuming the very thing it tries to prove.

JTC1 Directives 14.4.3.9, which defines the parallel BRM process for the Publicly Available Specification (PAS) transposition process, reads: “At the ballot resolution group meeting, decisions should be reached preferably by consensus. If a vote is unavoidable, the approval criteria in the subclause 9.1.4 is applied.” So here we see 9.1.4 explicitly called for. By the argument put forth by the Secretaries General, all PAS BRM's which follow the Directives are also flawed because they “disenfranchise” those NB's who are not P-members of JTC1. I believe this is a tortured reading of the Directives. The voting rules of 9.1.4 are explicitly and unambiguously called for in PAS BRM's, so one cannot dismiss their application to Fast Track on general principles that would apply equally to PAS. When Fast Track rules say that the BRM vote shall ("if a vote is unavoidable") "be taken according to normal JTC 1 procedures" then we are faced with two alternatives:
Which one is "normal JTC 1 procedures"? Where is the basis in the Directives for believing that O-members had the right to vote at the BRM?

Finally, neither BRM Convenor, Alex Brown, nor ITTF, nor indeed the assembled delegations at the BRM were competent nor had the mandate to make or change voting rules for a DIS BRM. The rules are set in JTC1 Directives, and must be followed. “These Directives shall be complied with in all respects and no deviations can be made without the consent of the Secretaries-General.” (JTC1 Directives 1.2).

Notifications made by the BRM Convenor in advance of the BRM have no weight on matters which exceed his mandate and authority. The communication referred to by the Secretaries General, which was given in advance by the BRM Convenor, was from this FAQ:

6.8 If votes are taken during the BRM, who votes?

Those present.

This in fact was not the rule applied at the BRM. For example, Liaison representatives could not vote, though they were undoubtedly present at the BRM and participated fully in other ways. Also individual participants could not vote, only delegations, via their HoD could vote. So the Convenor's glib communication should not be taken as notification of a novel voting procedure.

Additionally, the BRM Convenor was unambiguous in his communications on his blog where he clearly stated that the voting rules of 9.1.4 would be applied:

...Now, paper balloting follows normal JTC 1 in-meeting rules: In a meeting, except as otherwise specified in these directives, questions are decided by a majority of the votes cast at the meeting by P-members expressing either approval or disapproval. (9.1.4)

(After the BRM the Convenor dutifully went back and “corrected” his earlier blog post to reflect how the BRM actually operated.)

The Secretaries General further dismiss the concerns regarding BRM voting procedure, saying:

4e. Not correct. Decisions on the comments not discussed during the BRM and proposed dispositions were taken by a process agreed by the BRM itself (29 votes in favour, none against and 2 abstentions).

On the contrary, the BRM was not competent and had not the mandate to set its own voting rules or to negate the provisions for consensus stated in JTC1 Directives 1.2:

These Directives are inspired by the principle that the objective in the development of International Standards should be the achievement of consensus between those concerned rather than a decision based on counting votes.

[Note: Consensus is defined as general agreement, characterised by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments. Consensus need not imply unanimity. (ISO/IEC Guide 2:1996)]

The Directives specify the rules. If NB's do not like the rules, then NB's may work with SWG-Directives to define new rules and then vote on them using the defined process. But if the rules are not applied correctly, then the proper course is for NB's to appeal against the actions or inactions of those with a duty to carry out the rules. This is the essential governance model of JTC1. NB's rule, but they rule through the rules. We may not merely decide by majority vote to ignore rules for this DIS, or to institute new rules for that DIS, or to substitute different rules for another DIS, in an ad-hoc fashion, based on a BRM vote.

Using the logic given by the Secretaries General, what in principle would prevent a BRM from voting itself an Augur in addition to a Convenor for the purpose of observing the flights of birds to decide whether a given change to the DIS text was auspicious or not? Is there any voting procedure that would not be permitted them once we say that a BRM, by majority vote, can institute their own voting rules? Are TMB/SMB certain that this is the principle that they want affirmed by their rejection of the NB's appeals?

Further, NB's were not duly notified that their BRM delegations would be determining their own voting rules, so few if any of them had NB instructions on that matter. An agreement among BRM HoD's to set aside cardinal principles of JTC1, in the absence of NB consultations, should not be allowed to stand.

Finally, the existence of a vote at the BRM is not incompatible with the assertion that the BRM was “too short, arbitrarily short, or otherwise incorrectly conducted”. When given the choice between several bad alternatives, the delegations made a choice. That does not legitimatize the flawed application of JTC1 process that incorrectly gave them only bad choices and forced upon them a vote which they did not have the mandate to hold.

I could go on and on, but I'll spare you all more of the same. I am sorry to report that I find the response by the Secretaries General to be perfunctory, poorly reasoned and self-serving. It does not serve to resolve the issues, including important issues where clarification is needed. Majority rule, within the rules, should be encouraged. But to dismiss legitimate complaints by arguing that the majority agreed to not follow the rules, this is to substitute mob rule (or orchestrated monopoly rule) for the rule of law. We know where that leads to -- curtailed rights for those with minority opinions. And that should concern everyone.

The Secretary General of ISO, Alan Bryden, retires at the end of the year. August vacation is approaching, and before you know it there will be a retirement party with the cake and gifts, maybe a wall plaque or pewter paperweight. I am sure he does not need or desire to spend more time being reminded of the OOXML disaster that occurred during his last year at ISO. TMB/SMB members all want vacation as well. So do I. But out of respect for Mr. Bryden's eventual successor, and our shared mission in JTC1, shouldn't we urge TMB/SMB to do their job and not leave this all unresolved for the next guy to deal with? Dismissing an appeal with so many open unresolved issue is not expediency. It is merely creating more dissent, more distrust and more trouble that we'll all need to deal with next time around. It is better, I think, to hear the appeals, get to the bottom of this, seek resolution, consensus and closure, and then to move on. Ignoring mistakes will not make them go away.

Labels:

Thursday, July 17, 2008

What is Rick smoking?

Former Microsoft consultant Rick Jelliffe has posted his own particular brand of science fiction/fantasy, this time in his favorite subgenre, a parody of a drug-induced psychosis, where after uneasy slumber Rick awakes in some alternate parallel universe and finds that JTC1/SC34 is open and transparent and OASIS is closed, and decides to write a rambling blog post about it.

If you like unintentional humor, you will enjoy reading Rick's over-the-top post.

Rick suggests that organizationally JTC1/SC34 is a more participatory environment for developing standards than OASIS.

JTC1's process, based on National Body voting is both effective ... and more genuinely open, because it is impossible to stack either directly or indirecty.

Let's test that proposition. Let's compare OASIS and JTC1/SC34.

Who can participate? In OASIS, anyone can participate, from any company, organization, government agency, non-profit corporation in the world. Or you can join as an unaffiliated individual, as many have. You don't need your government's permission to join. You just do it. Most join with a nominal membership fee ($300 for individuals) but membership grants are available in some cases, when the fee would be burden for active individual contributors.

What about participation in JTC1/SC34? First, you must be a member of your NB. How do you become a member of your NB? In the US the price is $1,200 and you must be representing a company or organization. Individuals? Sorry, you are not allowed to participate. In other countries the rules vary. In some cases membership is not available at all at any price. You are essentially wait-listed until an opening becomes available. (Sorry, we don't have enough seats, we heard in Portugal). In some countries, like China, membership is forbidden to native citizens who are employees of foreign subsidiaries in China. In other countries you can't join at all. It is entirely a government decision. So, good luck joining the NB of Syria, where the constitution has been suspended under emergency rule since 1963. (But somehow they managed to make time to vote on the OOXML ballot. Zimbabwe as well, that paragon of open participation.)

Now, it is entirely possible for a standards organization to appear open, but in practice to be inaccessible. So we must look at the complete cost of participation, not just the initial membership fees.

The OASIS ODF TC does its work entirely on an email list, a wiki, and via weekly phone calls, which are toll-free calls for most participants. I don't recall there ever being a face-to-face meeting, certainly not so long as I've been a member. This use of technology lowers the barrier to participation, so anyone can be effective on the TC if they wish. In particular it makes it easier for those who have day jobs and can only contribute to the mailing list during non-work hours.

What about JTC1/SC34? To participate effectively requires attendance at several international meetings each year (Plenary's, WG's, Ad-hocs, BRM's, etc.), as well as participation at NB meetings. Since many of the participants are representative of large corporations or government agencies, a junket mentality prevails and the meetings are often held in some of the most expensive places in the world: Geneva, Granada, London, Kyoto, Jeju Island, etc.

JTC1 does not allow meeting participation by telephone. Since important votes, are held at these meetings, and no provision is made for remote participation, one cannot effectively participate in JTC1/SC34 without a substantial budget for international travel. Attendance at a single meeting — the DIS 29500 BRM — was $3687.52 for me, and I flew coach and ate cheap. How many standards meetings like that can you as an individual or your small company afford per year?

Further, note the nature of your membership — what can you actually do? Can you vote? In OASIS, it is one person/one vote. In the TC, your vote as an individual with a $300 membership fee is counted exactly the same as my vote representing an OASIS Foundational Sponsor. At the organizational level, it is one company/one vote, and the smallest OASIS member organization has exactly the same vote as the largest.

In JTC1/SC34 however, you typically can't vote at all. NB's vote, not individuals, not companies. So your opinion and your wishes are subject to the will of your NB. If your opinion varies from your NB's, you may not be accredited to attend an international meeting, and even if you are able to attend you may not be allowed to speak your opinions. This extra level of indirection and censorship means that you, as an individual, can do little. And to the extent your NB's committee is stacked by a single vendor and their partner community, or your NB decides to overrule or ignore its technical committee, or Microsoft calls your head of state to change the NB's vote, or any of the dozens of other documented shenanigans that recently occurred, your entire membership fee and participation will be an entire waste of time, money and effort.

Membership is OASIS is far more open and inclusive. You join. You discuss. You vote. Period. In JTC1/SC34, you are mired in layers of bureaucracy at the national and international level, in a system crafted by and for the big boys to cut back room deals and manipulate the process to the benefit of large corporations.

(Now that isn't to say that there are not some individual consultants out there who thrive in the JTC1 environment by mastering its dark, dusty, demon-haunted hallways. Even the largest corporations occasionally have need of this expertise, as Rick and others are quite aware. If JTC1/SC34 were truly open and transparent, such skills would not be needed. You certainly don't see anyone selling their services to help companies navigate OASIS, do you?)

What about transparency? As Rick demonstrates, OASIS meeting minutes and agenda are all posted and public. So is our mailing list. So are all of our drafts. So is our member and public comments.

But in JTC1/SC34, most of the documents are private, only accessible to SC34 members by password. And then occasionally JTC1 will step in prevent SC34 from releasing their own work , suppressing documents even from their own SC members. There are no public comments to speak of, and member comments on draft standards are secret.

So when you are back from your "trip", Rick, please let us know again, who wins on openness, participation and transparency?




And for the record, a couple of outright deceptions in Rick's post:

Labels: , , ,

Wednesday, July 16, 2008

Toy Soldiers

It is interesting to watch the activities of JTC1/SC34 as they go through the motions of processing activities related to OOXML, long after any serious justification for their continuation has ceased. That is the nature of bureaucracy — wind up their clockwork and watch the little soldiers go through their prescribed motions. Come back in an hour and they may be stuck in a corner or knocked over onto the floor. But they'll keep on moving their feet, back and forth, in small steps toward ends unknown and unknowable, the little senseless mechanical men.

One example is the proposals in SC34 to create a new project to create a Technical Report on translating between ODF 1.0 and OOXML 1.0. This might have made sense at some point in the past. But this proposal seems out of place now.

Consider:
  1. No one supports ODF 1.0 today. All of the major vendors have moved on to ODF 1.1, and will be moving on to ODF 1.2 soon.
  2. No one supports OOXML 1.0 today, not even Microsoft.
  3. No one supports interoperability via translation, not Sun in their Plugin, not Novell in their OOXML support, and not Microsoft in their announced ODF support in Office 2007 SP2.
So the proposal essentially will be to create an technical report for a translation task that no vendor is implementing, between versions of the ODF and OOXML standards that no vendor is supporting.

Excuse me if my enthusiasm is muted.

And yes, the proposers want accelerated processing for this proposal. But the idea was already obsolete the day it was proposed to SC34. Events have overtaken it, though the clockwork motions continue, and SC34 is currently having a ballot for this proposal, ending on 29 July. I'm not in favor of it. Perhaps it would be worth considering if resubmitted in one year's time, and was targetted to consider ODF 1.2 and OOXML 1.1 (or whatever their next version is). But is it really a priority for SC34 now?

Another example of working on autopilot is the ad-hoc working group in SC34 looking at OOXML maintenance. Although it was heralded with much pomp "SC takes control of OOXML", the fact is SC34 currently can't even look at OOXML, let alone maintain it. They are entirely impotent. But still they will go through the motions and meet next week in London to advise Alex Brown, who will then take all this advice and later formulate and write up his OOXML maintenance plan for SC34 to vote on.

All the best to them. They voted on OOXML without seeing it. Now they'll determine how to maintain it without seeing it. Maybe ISO should stand for Invisible Standards Organization? Maybe one of the participants can let me know where can I submit my invisible defect report?

In any case, since Microsoft has effective voting control of SC34, after almost two years of packing the committee, my bet is that OOXML will effectively be handed over to Ecma for maintenance. That is what JTC1 has done for every other Ecma Fast Track that has been approved. They might call it a "maintenance group" and allow token participation from SC34 liaisons in a non-voting capacity, but in all important ways it will remain Microsoft/Ecma standard. In the end, this makes some sense. Who is better positioned to clarify exactly how Excel financial functions work, the Microosft engineer who has access to the Excel source code, or an SC34 representative from Khazakstan?

Given the leisure to do the job right, my bet is on Microsoft. Everyone knows it for what it is now. There is no longer need for elaborate attempts to disguise the fact that OOXML is and will remain a Microsoft-only standard. Why continue the charade? If Microsoft put OOXML on MSDN, at least we would all have access to it and would know where to send our defect reports to, which is more than we can say about ISO OOXML. A real open standard is preferred, of course. But given a choice of fake ISO standard and a real MSDN specification, I'll take the MSDN version any day.

Labels:

Thursday, June 26, 2008

Spirits

Google Books has provided me endless hours of enjoyment. Sure, the collection seems to consist mainly of old, out of copyright books, but that makes it even more enticing. It is like a fine used-book store, without the musty smell.

Imagine. It is late at night, and the library is closed and Amazon cannot deliver fast enough. But you absolutely must have immediate access to Kramer and Sprenger's witch-hunting manual, the Malleus Maleficarum, specifically the 1669 Lyon edition. What do you do?

A quick search of Google Books, and there you have it, in full text, with the ability to download it in PDF if you want.




What you do with the "Hammer of Witches" is your own business. But who wouldn't have a little extra bounce in their step knowing they have a copy tucked away on their USB memory stick, "just in case"?

And so, I was browsing through "many a quaint and curious volume of forgotten lore" when I came across a book with the imposing title The Moral Aspects of Medical Life, Consisting of the 'Akesios' of K.F.H. Marx, translated from the German with Biographical Notes and Illustrative Remarks by James Mackness, M.D., published in London in 1846. Note that this is by Karl Friedrich Heinrich Marx (1796-1877), the German physician, not to be confused with the founder of modern communism Karl Heinrich Marx (1818-1893).

Within this text is printed a most remarkable diagram, "A Moral and Physical Thermometer". A footnote ascribes it to another work,'Public Characters', London, 1801, pg 499.

In any case here it is, a marvelous example of data presentation and visualization from 200 years ago. It shows, in parallel scales, the strength of various drinks as well as their effects.



Now, mind you that this diagram is coming from an advocate of water drinking, but it is an interesting view of the state of drink in 1801 London.

A few things stand out:

Cider (to American readers) is non-alcoholic. Here it apparently refers to hard cider. Perry is like hard cider, only made from pears. Don't see much of that around here.

Why is wine listed as weaker than porter and great beer? Today porter is 5% ABV or so, while wine is 12% or so. And even strong beers today are 6-7% ABV. So were things so much different back in 1801? Or is the chart correlated with the social stigma attached to the drink, and not necessarily the potency of the drink?

Some of the drinks are obscure today, in particular toddy (any hot, spiced drink, like mulled cider), crank (unknown — not listed in the OED), flip (eggnog-like drinks) and shrub (lemon juice and rum)

Hysteric Water — we may forget that the root word here is from the Greek "hystera", meaning uterus. It was thought that hysteria was predominately a disorder of women, and was treated by various means, including "Hysteric Water", which according to one period recipe was brandy infused with medicinal herbs: valerian, pennyroyal, rue, mugwort, savin, orange peel, and lovage. It was taken in small doses: 1 ounce.

So why is Hysteric Water listed as the worst of the lot, the one where drinking it leads one to risk whipping or exile to to the Australian penal colony of Botany Bay? Perhaps there was some additional social stigma attached.

One period commenter described the state of affairs like this:

Many a good woman, who would start at the very mention of strong waters, cannot conceive there can be any harm in a cordial. And as the fair sex are more particularly subject to a depression of spirits, it is no wonder that they should convert their apothecaries' shops into rich cordial warehouses, and take drams by way of physic; as the common people make gin serve for meat, drink, and clothes. The ladies perhaps may not be aware, that every time they have recourse to their Hartshorn or Lavender Drops, to drive away the vapours, they in effect take a dram ; and they may be assured, that their Colic, Surfeit, and plague Waters, are to be ranked among spirituous liquors, as well as the common stuff at the gin-shop. The college of Physicians, in their last review of the London Dispensatory, for this very reason expelled the Strong Water, generally known by the soothing name of Hysteric Water; because it was a lure to the female sex to dram it by authority, and to get tipsy secundum artem.

If any of my fair readers have at all given into this pernicious practice of dram-drinking, I must intreat them to leave it off betimes, before it has taken such hold of them, as they can never shake off. For the desire of drams steals upon them, and grows to be habitual, by imperceptible degrees : as those who are accustomed to take opiates, are obliged to encrease the dose gradually, and at last cannot sleep without it.

But it is comforting to know that then, as today, a moderate beer is still good for "reputation, long life and happiness".

Labels: , ,

Beautiful Word Clouds

We've all seen tag clouds by now, the visualization technique that shows the importance (however defined, but typically by prevalence) of a word by assigning a proportionately sized font.

But now comes along a tool that treats these clouds as art. Wordle's "Beautiful Word Clouds" is quite addictive, allowing you to enter the raw text and then play around with layout algorithms, fonts and coloring schemes to produce some very nice looking clouds. The author — Jonathan Feinberg — works here at IBM, a fact I did not discover until I had already wasted hours playing with the tool. So maybe I can count this as work now?

Here are a few examples of word clouds formed by analyzing three different texts. Can you guess the identity of the three texts?











Some of my wish-list items are:

Labels:

Monday, May 19, 2008

Fractured YEARFRAC and Discounted DISC

There are a few things that a spreadsheet can get wrong. You can be hard to use. You can be ugly. You can be slow. You can even crash occasionally. These are all annoyances. With a monopoly, annoyances are a way of life. However, there is one place where a spreadsheet can never be wrong, and that is in the area of calculations, especially financial calculations. A business uses spreadsheets to model and track its business. It makes decisions based on these spreadsheets. If the spreadsheet gives the wrong answer, then the business makes the wrong decision. And a business that makes the wrong decisions does not stay in business for long.

So it was with much lament last year that I reported that OOXML, then under ballot in ISO/IEC JTC, had many egregious errors in its spreadsheet formula definitions. In addition to enumerating these errors in my blog, I submitted them for consideration by INCITS V1, the US SC34 mirror committee, and these became part of the bundle of comments with which the US accompanied its ballot.

Although the ballot wasn't due until September 2nd, the extravagant "sit and do nothing" provisions in INCITS lead to our technical review being cut off in July. Because of the lack of review time, I was not able to do an in-depth review of all the spreadsheet functions, but only a cursory review. But the existence of such errors as I did identify, in the already-approved Ecma Standard, was disquieting. It should have lead Ecma TC45 to conduct a more through review of the spreadsheet functions. But this did not appear to happen.

When we received the response from Ecma to the NB comments, in January, INCITS V1 was asked to go through all the responses (over 1,000 of them) to determine whether they were acceptable. This review period was again insufficient.

As for the BRM in February, this was a travesty, as I and others have noted.

I have a theory concerning committees. A committee may have different states, like water has gas, liquid or solid phases, depending on temperate and pressure. The same committee, depending on external circumstances of time and pressure will enter well-defined states that determine its effectiveness. If a committee works in a deliberate mode, where issues are freely discussed, objections heard, and consensus is sought, then the committee will make slow progress, but the decisions of the committee will collectively be smarter than its smartest member. However, if a committee refuses to deliberate and instead merely votes on things without discussion, then it will be as dumb as its dumbest members. Voting dulls the edge of expertise. But discussion among experts socializes that expertise. This should be obvious. If you put a bunch of smart people in a room and don't let them think or talk, then don't expect smart things to happen as if the mere exhalation of their breath brings forth improvements to the standard.

So the the BRM ended a different committee than it started, and the mode of operation it was lead into caused it to act like a very stupid committee indeed. I don't say this to be accusatory. I'm just making an observation about crowd behavior. When a committee of experts ceases to be a deliberate committee, then you will achieve subpar results.

One of the ways the BRM was stupid is that it approved changes to OOXML that have totally broken SpreadsheetML's financial calculations, rendering the resulting calculations both mathematically incorrect as well as inconsistent with what Excel actually calculates. More about this later.

Here be dragons.

One of the persistent problems with OOXML has been in the area of "day count conventions" as used in SpreadsheetML.

Why should day counting be complicated? Just count how many days between the two dates and you are done, right? Indeed, everywhere but in finance it is simple. Some of the complications are for historical reasons, to try to make hand calculations easier in the pre-computer era. Also, these conventions made the calendar more regular, so financial instruments were less distorted by calendar irregularities like leap years or variable length months. If you assume a year is exactly 360 days in 12 months with 30 days each, then some things in life are simpler. Of course, this makes other things in life more complicated, including defining spreadsheet functions.

The devil is in the details. You must get these count conventions right. Being wrong by just one day in a year may only be 0.3%, but in a million dollar transaction, that is $3000. Not many people can afford to routinely ignore a $3000 error.

Different financial organizations have developed their own different day count conventions. We have the Banker's Rule, the LIBOR rule, the NASD rule, the ISDA rule, ICMA rule, SIFMA rule, etc.

In Excel, these conventions are defined by a function parameter called the "basis" and this basis is used by many financial spreadsheet functions. In fact, none of the following functions are adequately defined unless the day count basis values are adequately defined:

There are five basis conventions defined in OOXML, with values 0-4:
In this convention X/Y indicates that X is the assumption for month length, and Y is the assumption for year length. Note that there is no external authority given as a reference for any of these conventions. So it is not clear, whether the Actual/Actual convention is following, for example ICMA rules or ISDA rules. In fact, it may be following neither rule. So lacking any external authority, all we have to go on are the definitions provided in the specification itself. OOXML is its own authority for day count conventions.

As I reported last July, the definitions provided by OOXML did not sufficiently define the behavior of the conventions that lie behind most of the financial functions in Excel. Unfortunately, Microsoft/Ecma have failed to fix this problem in their proposed Resolutions, and in fact have made it worse. Further, the DIS 29500 BRM, in its negligent bulk approval of Ecma's responses merely advanced these serious errors into the text which was then approved as an International Standard. So essentially, my work in pointing out errors in the spreadsheet language was for naught. Microsoft just shoved it all though anyways.

This puts me in a delicate situation. On the one hand, the ODF TC really would like to finish its work on ODF 1.2, and part of that is completing the OpenFormula work. A key remaining part of OpenFormula is to ensure that our financial functions synch up with how Excel works. What OOXML says is irrelevant, except to the extent that it accurately tells how Excel defines these functions.

However, inquiries to Ecma on these day count conventions, inqueries made months ago, have received no response. Also, the final DIS text of OOXML has not been made available, not even to SC34 members and liaisons. And there is not mechanism in place yet in SC34 for collecting defect reports on OOXML. So we're stuck.

Or maybe not.

David Wheeler, Chair of the OpenFormula subcomittee of the OASIS ODF TC, has been trying to nail down the behavior of Excel's spreadsheet functions for over a year now. One of the last remaining pieces is to nail down the day count conventions. After waiting and waiting for this to be clarified in OOXML, David took matters into his own hands and decided to solve the problem by brute force, enumerating millions of test cases, indeed a comprehensive set of date pairs over a 6 year period, to try to determine exactly how the date bases in Excel work. You can read David's conclusions on his blog.

What strikes me in David's report is that not only are the OOXML definitions incomplete and inconsistent, but they do not accurately reflect what Excel actually calculates. So either Excel is wrong, or the OOXML standard is wrong when calculating almost every financial spreadsheet function. This is quite an embarrassment for an ISO standard, and an unnecessary one, since I have been talking about how poorly defined these functions are for almost a year now.

Let's work through some of David Wheeler's test cases by hand, to get a better feel for how OOXML is broken.

Let's take the YEARFRAC() function as the simplest example. YEARFRAC() takes two dates and a basis value as inputs, and returns the fraction of a year that is between those two dates. So logically, the calculation is like:

YEARFRAC = (interval between start-date and end-date) / length of year.

That is the logical definition. The only complication is the 5 date bases, and what exactly the year length is. This last point is something that stuck out in my mind when I first reviewed the draft of OOXML last summer. You might think the year length is 365 days. But what about leap years? And what about date ranges that straddle normal years and leap years? This is the key fact that this function requires in is definition. This is the problem I reported on my blog last July. This was the problem INCITS V1 submitted with our ballot comments last September. This is the problem that Ecma responded to in a severely flawed way in January. This is the problem that the BRM refused to discuss and merely agreed with Ecma's flawed changes. And this is the problem that is now in the final DIS version of OOXML.

Take a look for yourself in this brief extract from the final DIS version [pdf] of OOXML, provided for purposes of critical review and commentary.

(Yes, I now have a complete copy of the final DIS version of OOXML. If you think that this is unfair -- and I would agree with you on that -- then maybe you should ask ITTF why I was able to get a copy of the final DIS, but no one else in SC34 was.)

Let's take a look at the ISO OOXML definitions and try some test calculations to reproduce some of David's findings.

First, let's take the date basis 1 first, Actual/Actual, since that is the easiest. But we immediately run into a problem. The standard says two different things. In the "description" table it says:

Actual/actual. The actual number of days between the two dates are counted. If the date range includes the date 29 February, the year is 366 days; otherwise it is 365 days.

However, later, when defining the return value, the standard says this:

If the Actual/actual basis is used, the year length used is the average length of the years that the range crosses, regardless of where start-date and end-date fall in their respective years.

There is absolutely no way in which these can both be correct. This would have been easily fixable at the BRM, if the BRM had been allowed to do its job. But I wasn't even allowed to open my mouth and point out this problem. So now this fatal ambiguity sits in the text of OOXML, as authorized by the BRM experts and approved by JTC1 NB's. Gotta love it.

But let's forge on and assume that we really have two algorithms, the first (from the description) which we will call basis 1, and the second basis (from the return value section) which we will call 1' ("one-prime"). We'll calculate it both ways.

Let calculate YEARFRAC(DATE(2000,1,1),DATE(2001,1,1),1)

With basis 1, this is simple. 2000 was a leap year (since it was a century divisible by 400) so the interval between the two dates is 366 days. Similarly, since the date range includes a February 29th, the length of the year used in the calculation is 366. So the returned value returned by YEARFRAC should be 366/366 or 1.0.

With basis 1', this is also simple. The date interval is still 366. But the length of the year is now the average of the year lengths crossed by the date range. So the average of 366 (for 2000) and 365 (for 2001), or an average of 365.5. So using basis 1', YEARFRAC should return 366/365.5 or 1.00137.

Let's try another, with basis 1. What is YEARFRAC(DATE(2000,1,1),DATE(2002,1,1),1) ?

In basis 1, the interval is 731 days (366 days in 2000 plus 365 days in 2001). The year length is 366, since the interval includes a February 29th. So YEARFRAC should return 731/366 = 1.997.

With basis 1', the interval is also 731 days, but the year length is the average of 366 (for 2000), 365 (for 2001) and 365 (for 20002). So YEARFRAC should return 2.0009

Just to make sure we have this down, let's try another example:

What is YEARFRAC(DATE(2000,1,1),DATE(2000,1,2),1) ?

A one day interval? Yes, please humor me.

OK. With basis 1, the interval is one day. Since it does not cross February 29th, the year length is 365 days. So it will return 1/365 or 0.0028.

With basis 1' the interval is also one day, and the year length is 366, So it will return 1/366 or 0.0027

OK. That was too easy. One more example, to make sure that you have it down.

What is YEARFRAC(DATE(2000,1,1),DATE(2004,1,31),1) ?

Hmm.... this one will require more thought. I might have to take off my shoes and count using my toes as well.

With basis 1, the interval is 1491 days = 366 + 365 + 365 +365 +30.

The year lengths are:

2000=366 (since the range cross February 29th)
2001=365
2002=365
2003=365
2004=365 (since the date range does not cross February 29th)

So we have 4 full years plus 30 days of a 365 day year. YEARFRAC should return 4 + 30/365 = 4.0822.

With basis 1' we treat the 2004 as having 366 days and average the years in the interval, so average year length = (366+365+365+365+366)/5 = 365.4. So YEARFRAC should return 4 + 30/365.4 = 4.0821.

Now that we're done with the examples, we can throw them into a table, and compare them to what Excel 2007 calculates for these same parameters:

start-dateend-datebasisISO ValueExcel's ValueExcel Correct?
2000-01-01
2001-01-011

1.0000

1.0000
Yes
2000-01-012001-01-011'1.00137
1.0000
No
2000-01-012002-01-011
1.9972
2.0009No
2000-01-012002-01-011'
2.0009
2.0009
Yes
2000-01-01
2000-01-02
1
0.0028
0.0027
No
2000-01-012000-01-021'
0.0027
0.0027
Yes
2000-01-01
2004-01-31
1
4.0822
4.0805
No
2000-01-012004-01-311'
4.0821
4.0805
No

Let your mind linger on these results for a bit. Let it sink in. Look at this table until you recognize its significance and cringe in disgust. In some cases, Excel seems to be using the first definition of date basis 1. In other cases it is using the second definition of date basis 1. And in one case it is using neither definition. In other words, this is a lot more screwed up than at first it appears. This is not just a simple ambiguity. The OOXML definition of date basis 1 is totally wrong.

That is just the Actual/Actual date basis. The other 4 conventions are more complicated. I encourage you to read David's write up in full to see how 3 of 5 basis conventions defined in OOXML differ from what Excel actually calculates. David also shows how he believes Excel really calculates these day count conventions, based on his extensive tests.

Now if this was just a matter of one function in Excel, just YEARFRAC(), then this would not be a big deal. But this is flaw is inherited into most financial functions in OOXML. Let's take an example at random, DISC(), which calculates the discount rate for a security, given settlement and maturity dates, as well as par and redemption values. You can read the definition of DISC() from the final DIS text here [pdf].


You don't need to be an Wall Street quantitative analyst to see some obvious problems here. First, in the formula given, the 2nd term is divided by "SM". What is SM? There is no "SM" defined here. There is a "DSM" defined, however. Is that what is meant? Let's assume so.

We can try a test calculation, using the example given in the text of OOXML:

What is the value of: DISC(DATE(2007,1,25),DATE(2007,6,15),97.975,100,1) ?

You can do the calculation given in the text, and find the return value of DISC should be 5.35037%. Plug the example into Excel and you will get the answer 5.24202%.

So Excel is off by 2% or so. Do we really care. It's just money, right?

The problem is that the function in OOXML is defined incorrectly, from the financial perspective. The discount rate is the discount from the redemption value, not the discount from the purchase price. So the first term in the formula should be (redemption-par)/redemption, not (redemption-par)/par. If you make this change, then the calculated value matches the value Excel gives.

Does anything strike you as odd in the above? Do you have a chill running down your spine? Do you have renewed sense of dread? You should, because I just illustrated another grave problem with the OOXML standard: The spreadsheet examples in OOXML are a fraud.

You might have mistakenly been reassured by these numerous examples in the spreadsheet formula, that these actually had some relationship to the standard, that they were examples of how the calculations should be done, that they were evidence of some sort of quality assurance, that they may even be of assistance to an implementor to see whether they implemented the function correctly.

But they aren't.

What would be normal practice would be to take the definitions, as given in the OOXML text, and to calculate the values according to the definition provided in the text, and then to compare the resulting values with what Excel returns. That would provide a useful double-check of the definitions in the text. But OOXML doesn't do that. The examples here are mere fluff.

The discrepancy here also indicates that no one has actually reviewed these formulas for accuracy. Errors like this are immediately evident, but only if you look. The fact that things like this have escaped the notice of Microsoft, Ecma TC45, their Project Editors, 80 NB reviews, the BRM experts, and the eagle eyes of ITTF, should make one have considerable concerns over the the sufficiency of the Fast Track review and approval process.

Let's try one more example before we wrap this up.

What is the value of: DISC(DATE(2000,1,1),DATE(2004,1,31),97.975,100,1) ?

So, using the stated formula in the OOXML text, the answer would be 0.50736%. Using the "correct" formula, in the financial sense, the answer would be 0.49708%. What does Excel say the value is? Well, 0.49627%, of course.

So with the DISC() function we found:
  1. The given text provided a formula that referred to a non-existant "SM" variable. This appears to be a cut & paste error.
  2. After accounting for that, we found that the formula was incorrect according to recognized financial standards. Securities are discounted from their redemption values, not from their purchase prices.
  3. Even correcting for the formula errors, we find that the given definition of DISC() does not match what Excel returns, due to errors and ambiguities in the day count conventions, errors that David Wheeler delves into more deeply in his report.
  4. The examples given in the standard are bogus. They are not actually examples of the defined function.
  5. Excel does not implement OOXML.

Labels:

Tuesday, May 13, 2008

Spreadsheet file format performance

I've been doing some performance timings of file format support, comparing MS Office and OpenOffice. Most of the results are as expected, but some are surprising, and one in particular is quite disappointing.

But first, a little details of my setup. All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches. The machine is a Lenova T60p, dual-core Intel 2.16 Ghz and 2 GB of RAM. I took all the standard precautions -- disk was defragmented, and test files were confirmed as defragmented using contig. No other applications were running and background tasks were all shut down.

For test files, I went back to an old favorite, George Ou's (at the time with ZDNet) monster 50MB XLS file from his series of tests back in 2005. This file, although very large, is very simple. There are no formulas, indeed no formatting or styles. It is just text and numbers, treating a spreadsheet like a giant data table. So tests of this file will emphasize the raw throughput of the applications. Real world spreadsheets will typically be worse than this due to additional overhead from process styles, formulas, etc.

A test of a single file is not really that interesting. We want to see trends, see patterns. So I made a set of variations on George's original file, converting it into ODF, XLS and OOXML formats, as well as making scaled down versions of it. In total I made 12 different sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats. I then tested how long it took to load each file in each of the applications. In the case of MS Office, I installed the current versions of the translators for those formats, the Compatibility Pack for OOXML, and the ODF Add-in for the ODF support.

I find it convenient to report numbers per 100,000 spreadsheet cells. You could equally well use the original XLS spreadsheet size, or the number of rows of data, or any other correlated variable as the ordinate, but values per 100K cells is simple for anyone to understand.

I'll spare you all the pretty picture. If you want to make some, here is the raw data (CSV format). But I will give some summary observations.


For document sizes, the results are as follows:
So the XML formats are far smaller than the legacy binary format. This is due to the added Zip compression that both XML formats use. Also, note that the ODF files are significantly smaller than the OOXML files, less than 1/4 the size on average. Upon further examination, the XML document representing the ODF content is larger than the corresponding XML in OOXML, as expected, due to its use of longer, more descriptive markup tags. However the ODF XML compresses far better than the OOXML version, enough to overcome its greater verbosity and result in files smaller than OOXML. The compression ratio (original/zipped) for ODF's content.xml is 87, whereas the compression ratio for OOXML's sheet1.xml is only 12. We could just mumble something about entropy and walk away, but I think this area could bear further investigation.

Any ideas?

For load time, the times for processing the binary XLS files were:
Not too surprising. These binary formats are optimized for the guts of MS Office. We would expect them to load faster in their native application.

So what about the new XML formats? There has been recent talk about the "Angle Bracket Tax" for XML formats. How bad is it?
For typical sized documents, you probably will not notice the difference. However with the largest documents, like the 16-page, 3-million cells monster sheet, the OOXML document took 40 seconds to load in Office, the ODF sheet took 90 seconds to load in OpenOffice, whereas the XLS binary took less than 2 seconds to load in MS Office.

OK. So what are we missing. Ah, yes, ODF format in MS Office, using their ODF Add-in.
Yup. You read that right. To put this in perspective, let's look at a single test file, a 600K cells file, as we load it in the various formats and editors:

Can someone explain to me why Microsoft Office needs almost 10 minutes to load an ODF file that OpenOffice can load in 14 seconds?

(I was not able to test files larger than this using the ODF Add-in since they all crashed .)

(Update: Since it is the question everyone wants to know, the beta version of OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds and Sun's ODF Plugin for Microsoft Office loads this file in 30.03 seconds. )

This is one reason why I think file format translation is a poor engineering approach to interoperability. When OpenOffice wants to read an legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file. Instead they simply load the XLS file, via a file filter, into the internal memory model of OpenOffice.

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it now, since they discovered that the Microsoft approach is doomed to performance hell.

So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure. It may look great on paper, but it is an engineering disaster. What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes 10 minutes to load a file.

The astute engineer will agree with the above, but will also feel some discomfort at the numbers. There is more here than can be explained simply by the use of translators versus import filters. That choice might explain a 2x difference in performance. A particularly poor implementation might explain a 5x difference. But none of this explains why MS Office is almost 40x slower in processing ODF files. Being that much slower is hard to do accidentally. Other forces must be at play.

Any ideas?

Labels: , ,

Thursday, May 08, 2008

Berry Good, Berry Bad

It has been an interesting Spring here in Westford, weather-wise. April dipped below freezing on the 3rd, 8th, 9th, 15th and 16th. Then we got a warm spell, a week of days that reached 75 °F (23 °C) and even one day that reached 87.4 °F (30.8 °C) (April 23rd). Then it struck, on May 1st, an overnight low of 28.7 °F (-1.8 °C).

The vulnerability, when a late frost like this occurs, is in bud development. If the plant, by warm sunny days, has been tricked into bud development, and then a freeze occurs, the bud will be injured or killed. Strawberries are particularly prone to this problem.

Because of interactions of thermal inversions at the ground, humidity levels, etc., a simple temperature reading is not an accurate indicator of whether damage actually occurred. For example, if humidity is high, the temperature can dip to freezing, but in the act of freezing water vapor (creating frost) energy is released (latent heat of fusion, as the chemists call it). So you have a few degrees of tolerance if humidity is high, if you have a fog, etc. In fact, commercial strawberry growers will handle this problem by running sprinklers when a freeze threatens, to increase the amount of water around the plants available to freeze, as a buffer to protect the plants. Every degree helps.

But I wasn't so lucky. The extent of damage was not clear until the strawberry plants started blooming this week. Here's what I am seeing.





Above is an example of a normal, healthy strawberry blossom. You see the ring of stamens, the male organs of the flower, each with a filiament stalk tipped with an anther containing the pollen. In the center is the receptacle with the many carpels, which are the female side of the equation.

But in the picture below, we see a blossom from my garden that shows injury. Although the plant is sound, and it did flower, the carpels are dead. This blossom will not yield a berry.

From the looks of it, 40-50% of buds are damaged in this way. So no strawberry wine this year. I'll only have enough for fresh eating and ice cream.



Labels:

Wednesday, May 07, 2008

Achieving the impossible



Unadulterated copy of James Clark's Relax NG validator jing. Unadulterated copy of Kohsuke Kawaguchi's Sun Multi-Schema Validator msv. Unadulterated copy of the ODF 1.0 Relax NG schema. Unadulterated copy of the ODF 1.0 Standard, in ODF format.

No errors from either validator.

msv is so good as to tell us "the document is valid". But jing indicates success with only silence. So will I.

Labels:

This page is powered by Blogger. Isn't yours?