• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

An Antic Disposition

  • Home
  • About
  • Archives
  • Writings
  • Links
You are here: Home / Archives for OOXML

OOXML

Sed quis custodiet ipsos custodes?

2008/07/22 By Rob 18 Comments

We are coming down to the last week for JTC1 to decide on whether to hear the four NB appeals concerning various claimed errors in the processing of DIS 29500 (OOXML), or whether summarily to dismiss these appeals without hearing them. The decision lies with two committees, the Technical Management Board (TMB) in ISO and the Standards Management Board (SMB) in the IEC.

Back on July 4th, the Secretaries General of ISO and the IEC referred the four NB appeals, with their comments, to the TMB/SMB. Groklaw has the text of these comments, in PDF format, as well as HTML transcription.

The comments of the Secretary General are accompanied by a ballot, asking the question:

ACTION

The members of the Technical Management Board are invited to indicate, by replying yes, no or
abstention on EITHER a) OR b) for each of the four appeals (see item 14 in annex A):

a) not to process the appeal any further:
Item 1 ABNT
Item 2 BIS
Item 3 FONDONORMA
Item 4 SABS

OR

b) to process one or more of the appeals, which would require setting up of a conciliation panel

Item 5 ABNT
Item 6 BIS
Item 7 FONDONORMA
Item 8 SABS

by no later than 4 August 2008.

This is quite a strange animal to see. Why are we having a ballot at all, and only a 30-day one? This is questionable from several perspectives.

First, why are the Secretaries General the ones calling for a ballot? The Directives do not call for them to do so. In fact the Secretaries General are not even called upon to make a recommendation. They are only asked for comments. The Directives say:

11.3.3 The Secretaries-General shall, following whatever consultations they deem appropriate, refer the appeal together with their comments to the TMB/SMB within one month after receipt of the appeal.

11.3.4 The TMB/SMB shall decide whether an appeal shall be further processed or not. If the decision is in favour of proceeding, the Chairmen of the TMB/SMB shall form a conciliation panel (see 9.2).

But deciding is not the same as voting. One of the cardinal principles of JTC1 is to discuss and seek consensus, not rush to a vote. Indeed, this is one of the matters under appeal, the rush to voting at the OOXML BRM. JTC1 Directives, section 1.2 says (my emphasis):

These Directives are inspired by the principle that the objective in the development of International Standards should be the achievement of consensus between those concerned rather than a decision based on counting votes.

But here we are, with a vote pushed on the TMB/SMB.

The sense of the vote is wrong as well. The Directives call for a decision on “whether an appeal shall be further processed or not.” Note the wording. It did not call for a decision on “whether to accept the recommendation of the Secretaries General”. But somehow, we skip discussion, skip over consensus and get a ballot question which asks the opposite question first “not to process the appeal any further”. In an environment where many parties automatically vote Yes to the ballot question, changing the sense of the question in this way is prejudicial to the appellants.

So it is clear from the start that the powers that be do not want to give these four NB’s the opportunity to make their case or be heard. In any case, let’s take a deeper look at some of the subjects under appeal and see if we can detect what it is exactly that cannot bear the scrutiny of a duly processed appeal.

First up is the alleged mishandling of the contradiction period last year. The Secretaries General dismiss this complaint, saying that it was a matter of judgment:

The Directives give the JTC 1 Secretariat and ITTF latitude to use judgement as to whether a meeting should be organized to address alleged contradictions. Considering that other issues could potentially be identified during the DIS ballot, the JTC 1 secretariat and ITTF concluded that it was preferable to initiate the ballot and to allow all issues to be addressed by the BRM. The NBs were fully informed of all the claimed contradictions and Ecma’s responses to them.

This argument doesn’t hold water. Although the JTC1 Secretariat and ITTF are allowed judgment, this is not an absolute license which cannot be questioned. The Secretariat and ITTF also have defined duties, and their actions or inactions with respect to these duties can be questioned and are subject to appeal. Specifically, an NB may appeal the issue of an inaction of JTC1, according to JTC1 Directives, 11.3. So for the Secretaries General to suggest that this inaction cannot be appealed because it is a matter of judgment is nonsense. Judgment and duty are the proper matters for an appeal.

So what is the duty in this case? As stated in JTC1 Directives, 13.4:

If a contradiction is alleged, the JTC 1 Secretariat and ITTF shall make a best effort to resolve the matter in no more than a three month period, consulting with the proposer of the fast-track document, the NB(s) raising the claim of contradiction and others, as they deem necessary. A meeting of these parties, open to all NBs, may be convened by the JTC 1 Secretariat, if required.

If the resolution requires a change to the document submitted for fast-track processing, the initial document submitted will be considered withdrawn. The proposer may submit a revised document, to be processed as a new proposal.

If the resolution results in no change to the document or if a resolution cannot be reached, the five month fast-track ballot commences immediately after such a determination is made.

The Directives call for the JTC1 Secretariat to make a best effort to resolve the matter (JTC1 Directives, 13.4). The JTC1 Secretariat is not given latitude to do nothing, or allowed discretion to immediately defer this question to the ballot period, without making a best effort to resolve the matter.

When a new 6,000 page DIS is submitted to JTC1 only one month after the publication of another standard (ODF) in the exact same space (XML document formats for office applications) and 19 NB’s submit contradiction statements, and the JTC1 Secretariat’s “best effort” is to hold no consultations with the NB’s claiming contradictions, to hold no meeting, to make no attempt to resolve the question, then I believe that any NB would has a legitimate grounds for appeal on the inaction of JTC1 with regards to contradictions. There is no evidence that a “best effort” was made here to resolve the contradictions. Doing nothing is clearly incompatible with the required “best effort”.

It should be noted that JTC1 has had challenges in the past getting ITTF to carry out their responsibilities with respect to contradictions, which lead to this resolution adopted unamimously at the 2000 JTC1 Plenary:

Resolution 27 – Consistency of JTC 1 Products

JTC 1 stresses the strong need for consistency of its products (ISs and TRs) irrespective of the route through which they were developed. Any inconsistency will confuse users of JTC 1 standards and, hence, jeopardize JTC 1’s reputation. Therefore, referring to clauses 13.2 (Fast Track) and 18.4.3.2 (PAS) of its Directives, JTC 1 reminds ITTF of its obligation to ascertain that a proposed DIS contains no evident contradiction with other ISO/IEC standards. JTC 1 offers any help to ITTF in such undertaking. However, should an inconsistency be detected at any point in the ratification process, JTC 1 together with ITTF will take immediate action to cure the problem.

Perhaps it is time to give ITTF another reminder of their obligations in this regard?

Further, the determination claimed to have been made by the JTC1 Secretariat and ITTF was not communicated to JTC1 NB’s. Instead, the JTC1 Secretariat merely forwarded Ecma’s responses to the contradiction submissions along with a notification that the DIS ballot should then commence. No statement was made as to whether the ballot was commencing because the contradictions had in fact been resolved, or because a resolution could not be made, which are the only two outcomes allowed by the Directives in 13.4. Not to notify NB’s of the actual state of the resolution of the contradictions submissions is incompatible with the JTC1 Secretariat’s duty to make a best effort to resolve the matter.

This failure by JTC1 materially effected the ensuing ballot, since Microsoft was then able to take advantage of this procedural nonperformance and repeatedly represent to NB’s that the contradictions had been rejected as invalid and could not be considered in the DIS ballot. In fact, this led to several NB’s issuing explicit, but erroneous instructions to their members that the contradictions had been resolved and thus could not be raised again as a criterion for determining their national position, e.g., in Australia.

Further, although the Secretaries General claim that “the JTC 1 secretariat and ITTF concluded that it was preferable to initiate the ballot and to allow all issues to be addressed by the BRM” the documented fact is that the BRM Convenor explicitly disallowed any discussion of contradictions at the BRM.

Another subject of appeal was the irregular voting procedures used at the DIS 29500 BRM in February. This is the P-member versus O-member question. The Secretaries General dismiss this appeal in this way:

2e. Correct but inapplicable. The BRM was neither a meeting of JTC 1 nor of SC 34 but was open to all 87 national bodies which submitted a vote (including abstentions) on the DIS. Applying 9.1.4 would have disenfranchised the voting NBs present at the BRM which were not P-members. The fact that any votes in the BRM would be open to all national delegations present was communicated over three months prior to the BRM.

This argument presented is flawed, and amounts to saying, “The voting was done by P- and O-members because the meeting was attended by delegations from P- and O-members”. Who attended the meeting is immaterial. Liaisons such as Ecma also attended the BRM? Should they have been able to vote merely because they attended? No, of course not. Voting rights are defined in JTC1 Directives, and this must not be set aside in favor of an ad-hoc rule made without NB consultation or approval.

Asserting that applying 9.1.4 would disenfranchise NB’s is an example of circular reasoning. One can only be disenfranchised if one first has the right to vote. So the statement by the Secretaries General is arguing a conclusion (O-members are permitted to vote at BRM’s) by assuming the very thing it tries to prove.

JTC1 Directives 14.4.3.9, which defines the parallel BRM process for the Publicly Available Specification (PAS) transposition process, reads: “At the ballot resolution group meeting, decisions should be reached preferably by consensus. If a vote is unavoidable, the approval criteria in the subclause 9.1.4 is applied.” So here we see 9.1.4 explicitly called for. By the argument put forth by the Secretaries General, all PAS BRM’s which follow the Directives are also flawed because they “disenfranchise” those NB’s who are not P-members of JTC1. I believe this is a tortured reading of the Directives. The voting rules of 9.1.4 are explicitly and unambiguously called for in PAS BRM’s, so one cannot dismiss their application to Fast Track on general principles that would apply equally to PAS. When Fast Track rules say that the BRM vote shall (“if a vote is unavoidable”) “be taken according to normal JTC 1 procedures” then we are faced with two alternatives:

  • Use the voting rules of 9.1.4, which declares itself to be the normal voting procedures (“In a meeting, except as otherwise specified in these directives, questions are decided by a majority of the votes cast at the meeting by P-members expressing either approval or disapproval.”)
  • Or use a voting rule which is not to be found anywhere within the Directives.

Which one is “normal JTC 1 procedures”? Where is the basis in the Directives for believing that O-members had the right to vote at the BRM?

Finally, neither BRM Convenor, Alex Brown, nor ITTF, nor indeed the assembled delegations at the BRM were competent nor had the mandate to make or change voting rules for a DIS BRM. The rules are set in JTC1 Directives, and must be followed. “These Directives shall be complied with in all respects and no deviations can be made without the consent of the Secretaries-General.” (JTC1 Directives 1.2).

Notifications made by the BRM Convenor in advance of the BRM have no weight on matters which exceed his mandate and authority. The communication referred to by the Secretaries General, which was given in advance by the BRM Convenor, was from this FAQ:

6.8 If votes are taken during the BRM, who votes?

Those present.

This in fact was not the rule applied at the BRM. For example, Liaison representatives could not vote, though they were undoubtedly present at the BRM and participated fully in other ways. Also individual participants could not vote, only delegations, via their HoD could vote. So the Convenor’s glib communication should not be taken as notification of a novel voting procedure.

Additionally, the BRM Convenor was unambiguous in his communications on his blog where he clearly stated that the voting rules of 9.1.4 would be applied:

…Now, paper balloting follows normal JTC 1 in-meeting rules: In a meeting, except as otherwise specified in these directives, questions are decided by a majority of the votes cast at the meeting by P-members expressing either approval or disapproval. (9.1.4)

(After the BRM the Convenor dutifully went back and “corrected” his earlier blog post to reflect how the BRM actually operated.)

The Secretaries General further dismiss the concerns regarding BRM voting procedure, saying:

4e. Not correct. Decisions on the comments not discussed during the BRM and proposed dispositions were taken by a process agreed by the BRM itself (29 votes in favour, none against and 2 abstentions).

On the contrary, the BRM was not competent and had not the mandate to set its own voting rules or to negate the provisions for consensus stated in JTC1 Directives 1.2:

These Directives are inspired by the principle that the objective in the development of International Standards should be the achievement of consensus between those concerned rather than a decision based on counting votes.

[Note: Consensus is defined as general agreement, characterised by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments. Consensus need not imply unanimity. (ISO/IEC Guide 2:1996)]

The Directives specify the rules. If NB’s do not like the rules, then NB’s may work with SWG-Directives to define new rules and then vote on them using the defined process. But if the rules are not applied correctly, then the proper course is for NB’s to appeal against the actions or inactions of those with a duty to carry out the rules. This is the essential governance model of JTC1. NB’s rule, but they rule through the rules. We may not merely decide by majority vote to ignore rules for this DIS, or to institute new rules for that DIS, or to substitute different rules for another DIS, in an ad-hoc fashion, based on a BRM vote.

Using the logic given by the Secretaries General, what in principle would prevent a BRM from voting itself an Augur in addition to a Convenor for the purpose of observing the flights of birds to decide whether a given change to the DIS text was auspicious or not? Is there any voting procedure that would not be permitted them once we say that a BRM, by majority vote, can institute their own voting rules? Are TMB/SMB certain that this is the principle that they want affirmed by their rejection of the NB’s appeals?

Further, NB’s were not duly notified that their BRM delegations would be determining their own voting rules, so few if any of them had NB instructions on that matter. An agreement among BRM HoD’s to set aside cardinal principles of JTC1, in the absence of NB consultations, should not be allowed to stand.

Finally, the existence of a vote at the BRM is not incompatible with the assertion that the BRM was “too short, arbitrarily short, or otherwise incorrectly conducted”. When given the choice between several bad alternatives, the delegations made a choice. That does not legitimatize the flawed application of JTC1 process that incorrectly gave them only bad choices and forced upon them a vote which they did not have the mandate to hold.

I could go on and on, but I’ll spare you all more of the same. I am sorry to report that I find the response by the Secretaries General to be perfunctory, poorly reasoned and self-serving. It does not serve to resolve the issues, including important issues where clarification is needed. Majority rule, within the rules, should be encouraged. But to dismiss legitimate complaints by arguing that the majority agreed to not follow the rules, this is to substitute mob rule (or orchestrated monopoly rule) for the rule of law. We know where that leads to — curtailed rights for those with minority opinions. And that should concern everyone.

The Secretary General of ISO, Alan Bryden, retires at the end of the year. August vacation is approaching, and before you know it there will be a retirement party with the cake and gifts, maybe a wall plaque or pewter paperweight. I am sure he does not need or desire to spend more time being reminded of the OOXML disaster that occurred during his last year at ISO. TMB/SMB members all want vacation as well. So do I. But out of respect for Mr. Bryden’s eventual successor, and our shared mission in JTC1, shouldn’t we urge TMB/SMB to do their job and not leave this all unresolved for the next guy to deal with? Dismissing an appeal with so many open unresolved issue is not expediency. It is merely creating more dissent, more distrust and more trouble that we’ll all need to deal with next time around. It is better, I think, to hear the appeals, get to the bottom of this, seek resolution, consensus and closure, and then to move on. Ignoring mistakes will not make them go away.

  • Tweet

Filed Under: OOXML

What is Rick smoking?

2008/07/17 By Rob 11 Comments

Former Microsoft consultant Rick Jelliffe has posted his own particular brand of science fiction/fantasy, this time in his favorite subgenre, a parody of a drug-induced psychosis, where after uneasy slumber Rick awakes in some alternate parallel universe and finds that JTC1/SC34 is open and transparent and OASIS is closed, and decides to write a rambling blog post about it.

If you like unintentional humor, you will enjoy reading Rick’s over-the-top post.

Rick suggests that organizationally JTC1/SC34 is a more participatory environment for developing standards than OASIS.

JTC1’s process, based on National Body voting is both effective … and more genuinely open, because it is impossible to stack either directly or indirecty.

Let’s test that proposition. Let’s compare OASIS and JTC1/SC34.

Who can participate? In OASIS, anyone can participate, from any company, organization, government agency, non-profit corporation in the world. Or you can join as an unaffiliated individual, as many have. You don’t need your government’s permission to join. You just do it. Most join with a nominal membership fee ($300 for individuals) but membership grants are available in some cases, when the fee would be burden for active individual contributors.

What about participation in JTC1/SC34? First, you must be a member of your NB. How do you become a member of your NB? In the US the price is $1,200 and you must be representing a company or organization. Individuals? Sorry, you are not allowed to participate. In other countries the rules vary. In some cases membership is not available at all at any price. You are essentially wait-listed until an opening becomes available. (Sorry, we don’t have enough seats, we heard in Portugal). In some countries, like China, membership is forbidden to native citizens who are employees of foreign subsidiaries in China. In other countries you can’t join at all. It is entirely a government decision. So, good luck joining the NB of Syria, where the constitution has been suspended under emergency rule since 1963. (But somehow they managed to make time to vote on the OOXML ballot. Zimbabwe as well, that paragon of open participation.)

Now, it is entirely possible for a standards organization to appear open, but in practice to be inaccessible. So we must look at the complete cost of participation, not just the initial membership fees.

The OASIS ODF TC does its work entirely on an email list, a wiki, and via weekly phone calls, which are toll-free calls for most participants. I don’t recall there ever being a face-to-face meeting, certainly not so long as I’ve been a member. This use of technology lowers the barrier to participation, so anyone can be effective on the TC if they wish. In particular it makes it easier for those who have day jobs and can only contribute to the mailing list during non-work hours.

What about JTC1/SC34? To participate effectively requires attendance at several international meetings each year (Plenary’s, WG’s, Ad-hocs, BRM’s, etc.), as well as participation at NB meetings. Since many of the participants are representative of large corporations or government agencies, a junket mentality prevails and the meetings are often held in some of the most expensive places in the world: Geneva, Granada, London, Kyoto, Jeju Island, etc.

JTC1 does not allow meeting participation by telephone. Since important votes, are held at these meetings, and no provision is made for remote participation, one cannot effectively participate in JTC1/SC34 without a substantial budget for international travel. Attendance at a single meeting — the DIS 29500 BRM — was $3687.52 for me, and I flew coach and ate cheap. How many standards meetings like that can you as an individual or your small company afford per year?

Further, note the nature of your membership — what can you actually do? Can you vote? In OASIS, it is one person/one vote. In the TC, your vote as an individual with a $300 membership fee is counted exactly the same as my vote representing an OASIS Foundational Sponsor. At the organizational level, it is one company/one vote, and the smallest OASIS member organization has exactly the same vote as the largest.

In JTC1/SC34 however, you typically can’t vote at all. NB’s vote, not individuals, not companies. So your opinion and your wishes are subject to the will of your NB. If your opinion varies from your NB’s, you may not be accredited to attend an international meeting, and even if you are able to attend you may not be allowed to speak your opinions. This extra level of indirection and censorship means that you, as an individual, can do little. And to the extent your NB’s committee is stacked by a single vendor and their partner community, or your NB decides to overrule or ignore its technical committee, or Microsoft calls your head of state to change the NB’s vote, or any of the dozens of other documented shenanigans that recently occurred, your entire membership fee and participation will be an entire waste of time, money and effort.

Membership is OASIS is far more open and inclusive. You join. You discuss. You vote. Period. In JTC1/SC34, you are mired in layers of bureaucracy at the national and international level, in a system crafted by and for the big boys to cut back room deals and manipulate the process to the benefit of large corporations.

(Now that isn’t to say that there are not some individual consultants out there who thrive in the JTC1 environment by mastering its dark, dusty, demon-haunted hallways. Even the largest corporations occasionally have need of this expertise, as Rick and others are quite aware. If JTC1/SC34 were truly open and transparent, such skills would not be needed. You certainly don’t see anyone selling their services to help companies navigate OASIS, do you?)

What about transparency? As Rick demonstrates, OASIS meeting minutes and agenda are all posted and public. So is our mailing list. So are all of our drafts. So is our member and public comments.

But in JTC1/SC34, most of the documents are private, only accessible to SC34 members by password. And then occasionally JTC1 will step in prevent SC34 from releasing their own work , suppressing documents even from their own SC members. There are no public comments to speak of, and member comments on draft standards are secret.

So when you are back from your “trip”, Rick, please let us know again, who wins on openness, participation and transparency?


And for the record, a couple of outright deceptions in Rick’s post:

  • Rick says that there are 80 NB’s, and thousands people participating in JTC1, but only 13 people participating on the ODF TC. This is a particularly inept comparison. Why is he comparing all of JTC1 to a single OASIS TC? If you look at OASIS overall, you will see that OASIS has more than 5,000 participants, representing over 600 organizations and individual members in 100 countries. The ODF TC itself has 53 members, including 7 members of JTC1/SC34.
  • Rick picks a “random” ODF TC minutes post from a year ago to attempt to suggest domination by a single company. Not so random a choice, methinks. It was a rare joint meeting of the ODF TC and the Metadata subcommittee, which brought in a far greater number of Sun employees than typically participate in a call.
  • Tweet

Filed Under: FUD, OASIS, ODF, OOXML

Toy Soldiers

2008/07/16 By Rob 3 Comments

It is interesting to watch the activities of JTC1/SC34 as they go through the motions of processing activities related to OOXML, long after any serious justification for their continuation has ceased. That is the nature of bureaucracy — wind up their clockwork and watch the little soldiers go through their prescribed motions. Come back in an hour and they may be stuck in a corner or knocked over onto the floor. But they’ll keep on shuffling their feet, back and forth, in small steps toward ends unknown and unknowable, the little senseless mechanical men.

One example is the proposals in SC34 to create a new project to create a Technical Report on translating between ODF 1.0 and OOXML 1.0. This might have made sense at some point in the past. But this proposal seems out of place now.

Consider:

  1. Few applications today support exclusively ODF 1.0 and only ODF 1.0. Most of the major vendors also support ODF 1.1, one (OpenOffice 3.x), now supports draft ODF 1.2 as well.
  2. No one supports OOXML 1.0 today, not even Microsoft.
  3. No one supports interoperability via translation, not Sun in their Plugin, not Novell in their OOXML support, and not Microsoft in their announced ODF support in Office 2007 SP2.

So the proposal essentially will be to create an technical report for a translation task that no vendor is implementing, between versions of the ODF and OOXML standards that no vendor is supporting.

Excuse me if my enthusiasm is muted.

And yes, the proposers want accelerated processing for this proposal. But the idea was already obsolete the day it was proposed to SC34. Events have overtaken it, though the clockwork motions continue, and SC34 is currently having a ballot for this proposal, ending on 29 July. I’m not in favor of it. Perhaps it would be worth considering if resubmitted in one year’s time, and was targeted to consider ODF 1.2 and OOXML 1.1 (or whatever their next version is). But is it really a priority for SC34 now?

Another example of working on autopilot is the ad-hoc working group in SC34 looking at OOXML maintenance. Although it was heralded with much pomp “SC takes control of OOXML”, the fact is SC34 currently can’t even look at OOXML, let alone maintain it. They are entirely impotent. But still they will go through the motions and meet next week in London to advise Alex Brown, who will then take all this advice and later formulate and write up his OOXML maintenance plan for SC34 to vote on.

All the best to them. They voted on OOXML without seeing it. Now they’ll determine how to maintain it without seeing it. Maybe ISO should stand for Invisible Standards Organization? Maybe one of the participants can let me know where can I submit my invisible defect report?

In any case, since Microsoft has effective voting control of SC34, after almost two years of packing the committee, my bet is that OOXML will effectively be handed over to Ecma for maintenance. That is what JTC1 has done for every other Ecma Fast Track that has been approved. They might call it a “maintenance group” and allow token participation from SC34 liaisons in a non-voting capacity, but in all important ways it will remain Microsoft/Ecma standard. In the end, this makes some sense. Who is better positioned to clarify exactly how Excel financial functions work, the Microsoft engineer who has access to the Excel source code, or an SC34 representative from Kazakhstan?

Given the leisure to do the job right, my bet is on Microsoft. Everyone knows it for what it is now. There is no longer need for elaborate attempts to disguise the fact that OOXML is and will remain a Microsoft-only standard. Why continue the charade? If Microsoft put OOXML on MSDN, at least we would all have access to it and would know where to send our defect reports to, which is more than we can say about ISO OOXML. A real open standard is preferred, of course. But given a choice of fake ISO standard and a real MSDN specification, I’ll take the real MSDN specification any day.

  • Tweet

Filed Under: OOXML

Fractured YEARFRAC and Discounted DISC

2008/05/19 By Rob 48 Comments

There are a few things that a spreadsheet can get wrong. You can be hard to use. You can be ugly. You can be slow. You can even crash occasionally. These are all annoyances. With a monopoly, annoyances are a way of life. However, there is one place where a spreadsheet can never be wrong, and that is in the area of calculations, especially financial calculations. A business uses spreadsheets to model and track its business. It makes decisions based on these spreadsheets. If the spreadsheet gives the wrong answer, then the business makes the wrong decision. And a business that makes the wrong decisions does not stay in business for long.

So it was with much lament last year that I reported that OOXML, then under ballot in ISO/IEC JTC, had many egregious errors in its spreadsheet formula definitions. In addition to enumerating these errors in my blog, I submitted them for consideration by INCITS V1, the US SC34 mirror committee, and these became part of the bundle of comments with which the US accompanied its ballot.

Although the ballot wasn’t due until September 2nd, the extravagant “sit and do nothing” provisions in INCITS lead to our technical review being cut off in July. Because of the lack of review time, I was not able to do an in-depth review of all the spreadsheet functions, but only a cursory review. But the existence of such errors as I did identify, in the already-approved Ecma Standard, was disquieting. It should have lead Ecma TC45 to conduct a more through review of the spreadsheet functions. But this did not appear to happen.

When we received the response from Ecma to the NB comments, in January, INCITS V1 was asked to go through all the responses (over 1,000 of them) to determine whether they were acceptable. This review period was again insufficient.

As for the BRM in February, this was a travesty, as I and others have noted.

I have a theory concerning committees. A committee may have different states, like water has gas, liquid or solid phases, depending on temperate and pressure. The same committee, depending on external circumstances of time and pressure will enter well-defined states that determine its effectiveness. If a committee works in a deliberate mode, where issues are freely discussed, objections heard, and consensus is sought, then the committee will make slow progress, but the decisions of the committee will collectively be smarter than its smartest member. However, if a committee refuses to deliberate and instead merely votes on things without discussion, then it will be as dumb as its dumbest members. Voting dulls the edge of expertise. But discussion among experts socializes that expertise. This should be obvious. If you put a bunch of smart people in a room and don’t let them think or talk, then don’t expect smart things to happen as if the mere exhalation of their breath brings forth improvements to the standard.

So the the BRM ended a different committee than it started, and the mode of operation it was lead into caused it to act like a very stupid committee indeed. I don’t say this to be accusatory. I’m just making an observation about crowd behavior. When a committee of experts ceases to be a deliberate committee, then you will achieve subpar results.

One of the ways the BRM was stupid is that it approved changes to OOXML that have totally broken SpreadsheetML’s financial calculations, rendering the resulting calculations both mathematically incorrect as well as inconsistent with what Excel actually calculates. More about this later.

Here be dragons.

One of the persistent problems with OOXML has been in the area of “day count conventions” as used in SpreadsheetML.

Why should day counting be complicated? Just count how many days between the two dates and you are done, right? Indeed, everywhere but in finance it is simple. Some of the complications are for historical reasons, to try to make hand calculations easier in the pre-computer era. Also, these conventions made the calendar more regular, so financial instruments were less distorted by calendar irregularities like leap years or variable length months. If you assume a year is exactly 360 days in 12 months with 30 days each, then some things in life are simpler. Of course, this makes other things in life more complicated, including defining spreadsheet functions.

The devil is in the details. You must get these count conventions right. Being wrong by just one day in a year may only be 0.3%, but in a million dollar transaction, that is $3000. Not many people can afford to routinely ignore a $3000 error.

Different financial organizations have developed their own different day count conventions. We have the Banker’s Rule, the LIBOR rule, the NASD rule, the ISDA rule, ICMA rule, SIFMA rule, etc.

In Excel, these conventions are defined by a function parameter called the “basis” and this basis is used by many financial spreadsheet functions. In fact, none of the following functions are adequately defined unless the day count basis values are adequately defined:

  • ACCRINT()
  • ACCRINTM()
  • AMORDEGRC
  • AMORLINC
  • COUPDAYBS()
  • COUPDAYS()
  • COUPDAYSNC()
  • COUPNCD()
  • COUPNUM()
  • COUPPCD()
  • DISC()
  • DURATION()
  • INTRATE
  • MDURATION()
  • ODDFPRICE()
  • ODDFYIELD()
  • ODDLPRICE()
  • ODDLYIELD()
  • PRICE()
  • PRICEDISC()
  • PRICEMAT()
  • RECEIVED()
  • YEARFRAC()
  • YIELD()
  • YIELDDISC()
  • YIELDMAT()

There are five basis conventions defined in OOXML, with values 0-4:

  • Basis 0 = US 30/360
  • Basis 1 = Actual/Actual
  • Basis 2 = Actual/360
  • Basic 3 = Actual/365
  • Basic 4 = European 30/360

In this convention X/Y indicates that X is the assumption for month length, and Y is the assumption for year length. Note that there is no external authority given as a reference for any of these conventions. So it is not clear, whether the Actual/Actual convention is following, for example ICMA rules or ISDA rules. In fact, it may be following neither rule. So lacking any external authority, all we have to go on are the definitions provided in the specification itself. OOXML is its own authority for day count conventions.

As I reported last July, the definitions provided by OOXML did not sufficiently define the behavior of the conventions that lie behind most of the financial functions in Excel. Unfortunately, Microsoft/Ecma have failed to fix this problem in their proposed Resolutions, and in fact have made it worse. Further, the DIS 29500 BRM, in its negligent bulk approval of Ecma’s responses merely advanced these serious errors into the text which was then approved as an International Standard. So essentially, my work in pointing out errors in the spreadsheet language was for naught. Microsoft just shoved it all though anyways.

This puts me in a delicate situation. On the one hand, the ODF TC really would like to finish its work on ODF 1.2, and part of that is completing the OpenFormula work. A key remaining part of OpenFormula is to ensure that our financial functions synch up with how Excel works. What OOXML says is irrelevant, except to the extent that it accurately tells how Excel defines these functions.

However, inquiries to Ecma on these day count conventions, inqueries made months ago, have received no response. Also, the final DIS text of OOXML has not been made available, not even to SC34 members and liaisons. And there is not mechanism in place yet in SC34 for collecting defect reports on OOXML. So we’re stuck.

Or maybe not.

David Wheeler, Chair of the OpenFormula subcomittee of the OASIS ODF TC, has been trying to nail down the behavior of Excel’s spreadsheet functions for over a year now. One of the last remaining pieces is to nail down the day count conventions. After waiting and waiting for this to be clarified in OOXML, David took matters into his own hands and decided to solve the problem by brute force, enumerating millions of test cases, indeed a comprehensive set of date pairs over a 6 year period, to try to determine exactly how the date bases in Excel work. You can read David’s conclusions on his blog.

What strikes me in David’s report is that not only are the OOXML definitions incomplete and inconsistent, but they do not accurately reflect what Excel actually calculates. So either Excel is wrong, or the OOXML standard is wrong when calculating almost every financial spreadsheet function. This is quite an embarrassment for an ISO standard, and an unnecessary one, since I have been talking about how poorly defined these functions are for almost a year now.

Let’s work through some of David Wheeler’s test cases by hand, to get a better feel for how OOXML is broken.

Let’s take the YEARFRAC() function as the simplest example. YEARFRAC() takes two dates and a basis value as inputs, and returns the fraction of a year that is between those two dates. So logically, the calculation is like:

YEARFRAC = (interval between start-date and end-date) / length of year.

That is the logical definition. The only complication is the 5 date bases, and what exactly the year length is. This last point is something that stuck out in my mind when I first reviewed the draft of OOXML last summer. You might think the year length is 365 days. But what about leap years? And what about date ranges that straddle normal years and leap years? This is the key fact that this function requires in is definition. This is the problem I reported on my blog last July. This was the problem INCITS V1 submitted with our ballot comments last September. This is the problem that Ecma responded to in a severely flawed way in January. This is the problem that the BRM refused to discuss and merely agreed with Ecma’s flawed changes. And this is the problem that is now in the final DIS version of OOXML.

Take a look for yourself in this brief extract from the final DIS version [pdf] of OOXML, provided for purposes of critical review and commentary.

(Yes, I now have a complete copy of the final DIS version of OOXML. If you think that this is unfair — and I would agree with you on that — then maybe you should ask ITTF why I was able to get a copy of the final DIS, but no one else in SC34 was.)

Let’s take a look at the ISO OOXML definitions and try some test calculations to reproduce some of David’s findings.

First, let’s take the date basis 1 first, Actual/Actual, since that is the easiest. But we immediately run into a problem. The standard says two different things. In the “description” table it says:

Actual/actual. The actual number of days between the two dates are counted. If the date range includes the date 29 February, the year is 366 days; otherwise it is 365 days.

However, later, when defining the return value, the standard says this:

If the Actual/actual basis is used, the year length used is the average length of the years that the range crosses, regardless of where start-date and end-date fall in their respective years.

There is absolutely no way in which these can both be correct. This would have been easily fixable at the BRM, if the BRM had been allowed to do its job. But I wasn’t even allowed to open my mouth and point out this problem. So now this fatal ambiguity sits in the text of OOXML, as authorized by the BRM experts and approved by JTC1 NB’s. Gotta love it.

But let’s forge on and assume that we really have two algorithms, the first (from the description) which we will call basis 1, and the second basis (from the return value section) which we will call 1′ (“one-prime”). We’ll calculate it both ways.

Let calculate YEARFRAC(DATE(2000,1,1),DATE(2001,1,1),1)

With basis 1, this is simple. 2000 was a leap year (since it was a century divisible by 400) so the interval between the two dates is 366 days. Similarly, since the date range includes a February 29th, the length of the year used in the calculation is 366. So the returned value returned by YEARFRAC should be 366/366 or 1.0.

With basis 1′, this is also simple. The date interval is still 366. But the length of the year is now the average of the year lengths crossed by the date range. So the average of 366 (for 2000) and 365 (for 2001), or an average of 365.5. So using basis 1′, YEARFRAC should return 366/365.5 or 1.00137.

Let’s try another, with basis 1. What is YEARFRAC(DATE(2000,1,1),DATE(2002,1,1),1) ?

In basis 1, the interval is 731 days (366 days in 2000 plus 365 days in 2001). The year length is 366, since the interval includes a February 29th. So YEARFRAC should return 731/366 = 1.997.

With basis 1′, the interval is also 731 days, but the year length is the average of 366 (for 2000), 365 (for 2001) and 365 (for 20002). So YEARFRAC should return 2.0009

Just to make sure we have this down, let’s try another example:

What is YEARFRAC(DATE(2000,1,1),DATE(2000,1,2),1) ?

A one day interval? Yes, please humor me.

OK. With basis 1, the interval is one day. Since it does not cross February 29th, the year length is 365 days. So it will return 1/365 or 0.0028.

With basis 1′ the interval is also one day, and the year length is 366, So it will return 1/366 or 0.0027

OK. That was too easy. One more example, to make sure that you have it down.

What is YEARFRAC(DATE(2000,1,1),DATE(2004,1,31),1) ?

Hmm…. this one will require more thought. I might have to take off my shoes and count using my toes as well.

With basis 1, the interval is 1491 days = 366 + 365 + 365 +365 +30.

The year lengths are:

2000=366 (since the range cross February 29th)
2001=365
2002=365
2003=365
2004=365 (since the date range does not cross February 29th)

So we have 4 full years plus 30 days of a 365 day year. YEARFRAC should return 4 + 30/365 = 4.0822.

With basis 1′ we treat the 2004 as having 366 days and average the years in the interval, so average year length = (366+365+365+365+366)/5 = 365.4. So YEARFRAC should return 4 + 30/365.4 = 4.0821.

Now that we’re done with the examples, we can throw them into a table, and compare them to what Excel 2007 calculates for these same parameters:

start-date end-date basis ISO Value Excel’s Value Excel Correct?
2000-01-01 2001-01-01 1 1.0000 1.0000 Yes
2000-01-01 2001-01-01 1′ 1.00137 1.0000 No
2000-01-01 2002-01-01 1 1.9972 2.0009 No
2000-01-01 2002-01-01 1′ 2.0009 2.0009 Yes
2000-01-01 2000-01-02 1 0.0028 0.0027 No
2000-01-01 2000-01-02 1′ 0.0027 0.0027 Yes
2000-01-01 2004-01-31 1 4.0822 4.0805 No
2000-01-01 2004-01-31 1′ 4.0821 4.0805 No

Let your mind linger on these results for a bit. Let it sink in. Look at this table until you recognize its significance and cringe in disgust. In some cases, Excel seems to be using the first definition of date basis 1. In other cases it is using the second definition of date basis 1. And in one case it is using neither definition. In other words, this is a lot more screwed up than at first it appears. This is not just a simple ambiguity. The OOXML definition of date basis 1 is totally wrong.

That is just the Actual/Actual date basis. The other 4 conventions are more complicated. I encourage you to read David’s write up in full to see how 3 of 5 basis conventions defined in OOXML differ from what Excel actually calculates. David also shows how he believes Excel really calculates these day count conventions, based on his extensive tests.

Now if this was just a matter of one function in Excel, just YEARFRAC(), then this would not be a big deal. But this is flaw is inherited into most financial functions in OOXML. Let’s take an example at random, DISC(), which calculates the discount rate for a security, given settlement and maturity dates, as well as par and redemption values. You can read the definition of DISC() from the final DIS text here [pdf].

You don’t need to be an Wall Street quantitative analyst to see some obvious problems here. First, in the formula given, the 2nd term is divided by “SM”. What is SM? There is no “SM” defined here. There is a “DSM” defined, however. Is that what is meant? Let’s assume so.

We can try a test calculation, using the example given in the text of OOXML:

What is the value of: DISC(DATE(2007,1,25),DATE(2007,6,15),97.975,100,1) ?

  • B = 365, since the date range does not include February 29th. (Note that DISC has a single definition for year length in basis 1, not the two conflicting definitions we saw in YEARFRAC)
  • DSM = 141 days = 6 in Jan+28 in Feb+31 in March+30 in April+31 in May+15 in June
  • par = 97.975, which was our input parameter
  • redemption = 100, which is another one of our input parameters

You can do the calculation given in the text, and find the return value of DISC should be 5.35037%. Plug the example into Excel and you will get the answer 5.24202%.

So Excel is off by 2% or so. Do we really care. It’s just money, right?

The problem is that the function in OOXML is defined incorrectly, from the financial perspective. The discount rate is the discount from the redemption value, not the discount from the purchase price. So the first term in the formula should be (redemption-par)/redemption, not (redemption-par)/par. If you make this change, then the calculated value matches the value Excel gives.

Does anything strike you as odd in the above? Do you have a chill running down your spine? Do you have renewed sense of dread? You should, because I just illustrated another grave problem with the OOXML standard: The spreadsheet examples in OOXML are a fraud.

You might have mistakenly been reassured by these numerous examples in the spreadsheet formula, that these actually had some relationship to the standard, that they were examples of how the calculations should be done, that they were evidence of some sort of quality assurance, that they may even be of assistance to an implementor to see whether they implemented the function correctly.

But they aren’t.

What would be normal practice would be to take the definitions, as given in the OOXML text, and to calculate the values according to the definition provided in the text, and then to compare the resulting values with what Excel returns. That would provide a useful double-check of the definitions in the text. But OOXML doesn’t do that. The examples here are mere fluff.

The discrepancy here also indicates that no one has actually reviewed these formulas for accuracy. Errors like this are immediately evident, but only if you look. The fact that things like this have escaped the notice of Microsoft, Ecma TC45, their Project Editors, 80 NB reviews, the BRM experts, and the eagle eyes of ITTF, should make one have considerable concerns over the the sufficiency of the Fast Track review and approval process.

Let’s try one more example before we wrap this up.

What is the value of: DISC(DATE(2000,1,1),DATE(2004,1,31),97.975,100,1) ?

  • B = 366, since the date range includes February 29th
  • DSM = 1491 days = 366 + 365 + 365 +365 +30.
  • par = 97.975, which was our input parameter
  • redemption = 100, which is another one of our input parameters

So, using the stated formula in the OOXML text, the answer would be 0.50736%. Using the “correct” formula, in the financial sense, the answer would be 0.49708%. What does Excel say the value is? Well, 0.49627%, of course.

So with the DISC() function we found:

  1. The given text provided a formula that referred to a non-existant “SM” variable. This appears to be a cut & paste error.
  2. After accounting for that, we found that the formula was incorrect according to recognized financial standards. Securities are discounted from their redemption values, not from their purchase prices.
  3. Even correcting for the formula errors, we find that the given definition of DISC() does not match what Excel returns, due to errors and ambiguities in the day count conventions, errors that David Wheeler delves into more deeply in his report.
  4. The examples given in the standard are bogus. They are not actually examples of the defined function.
  5. Excel does not implement OOXML.
  • Tweet

Filed Under: OOXML

Spreadsheet file format performance

2008/05/13 By Rob 19 Comments

I’ve been doing some performance timings of file format support, comparing MS Office and OpenOffice. Most of the results are as expected, but some are surprising, and one in particular is quite disappointing.

But first, a little details of my setup. All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches. The machine is a Lenova T60p, dual-core Intel 2.16 Ghz and 2 GB of RAM. I took all the standard precautions — disk was defragmented, and test files were confirmed as defragmented using contig. No other applications were running and background tasks were all shut down.

For test files, I went back to an old favorite, George Ou’s (at the time with ZDNet) monster 50MB XLS file from his series of tests back in 2005. This file, although very large, is very simple. There are no formulas, indeed no formatting or styles. It is just text and numbers, treating a spreadsheet like a giant data table. So tests of this file will emphasize the raw throughput of the applications. Real world spreadsheets will typically be worse than this due to additional overhead from process styles, formulas, etc.

A test of a single file is not really that interesting. We want to see trends, see patterns. So I made a set of variations on George’s original file, converting it into ODF, XLS and OOXML formats, as well as making scaled down versions of it. In total I made 12 different sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats. I then tested how long it took to load each file in each of the applications. In the case of MS Office, I installed the current versions of the translators for those formats, the Compatibility Pack for OOXML, and the ODF Add-in for the ODF support.

I find it convenient to report numbers per 100,000 spreadsheet cells. You could equally well use the original XLS spreadsheet size, or the number of rows of data, or any other correlated variable as the ordinate, but values per 100K cells is simple for anyone to understand.

I’ll spare you all the pretty picture. If you want to make some, here is the raw data (CSV format). But I will give some summary observations.

For document sizes, the results are as follows:

  • Binary XLS format = 1,503 KB per 100K cells
  • OOXML format = 491 KB per 100K cells
  • ODF format = 117 KB per 100K cells

So the XML formats are far smaller than the legacy binary format. This is due to the added Zip compression that both XML formats use. Also, note that the ODF files are significantly smaller than the OOXML files, less than 1/4 the size on average. Upon further examination, the XML document representing the ODF content is larger than the corresponding XML in OOXML, as expected, due to its use of longer, more descriptive markup tags. However the ODF XML compresses far better than the OOXML version, enough to overcome its greater verbosity and result in files smaller than OOXML. The compression ratio (original/zipped) for ODF’s content.xml is 87, whereas the compression ratio for OOXML’s sheet1.xml is only 12. We could just mumble something about entropy and walk away, but I think this area could bear further investigation.

Any ideas?

For load time, the times for processing the binary XLS files were:

  • Microsoft Office 2003 = 0.03 seconds per 100K cells
  • OpenOffice 2.4.0 = 0.4 seconds per 100K cells

Not too surprising. These binary formats are optimized for the guts of MS Office. We would expect them to load faster in their native application.

So what about the new XML formats? There has been recent talk about the “Angle Bracket Tax” for XML formats. How bad is it?

  • Microsoft Office 2003 with OOXML = 1.5 seconds per 100K cells
  • OpenOffice 2.4.0 with ODF = 2.7 seconds per 100K cells

For typical sized documents, you probably will not notice the difference. However with the largest documents, like the 16-page, 3-million cells monster sheet, the OOXML document took 40 seconds to load in Office, the ODF sheet took 90 seconds to load in OpenOffice, whereas the XLS binary took less than 2 seconds to load in MS Office.

OK. So what are we missing. Ah, yes, ODF format in MS Office, using their ODF Add-in.

  • Microsoft Office 2003 with ODF, using the ODF Add-in = 74.6 seconds per 100K cells

Yup. You read that right. To put this in perspective, let’s look at a single test file, a 600K cells file, as we load it in the various formats and editors:

  • Microsoft Office 2003 in XLS format = 0.75 seconds
  • OpenOffice 2.4.0 in XLS format = 3.03 seconds
  • Microsoft Office 2003 in OOXML format = 8.28 seconds
  • OpenOffice 2.4.0 in ODF format = 14.09 seconds
  • Microsoft Office 2003 in ODF format = 515.60 seconds

Can someone explain to me why Microsoft Office needs almost 10 minutes to load an ODF file that OpenOffice can load in 14 seconds?

(I was not able to test files larger than this using the ODF Add-in since they all crashed .)

(Update: Since it is the question everyone wants to know, the beta version of OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds and Sun’s ODF Plugin for Microsoft Office loads this file in 30.03 seconds. )

This is one reason why I think file format translation is a poor engineering approach to interoperability. When OpenOffice wants to read an legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file. Instead they simply load the XLS file, via a file filter, into the internal memory model of OpenOffice.

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it now, since they discovered that the Microsoft approach is doomed to performance hell.

So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure. It may look great on paper, but it is an engineering disaster. What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes 10 minutes to load a file.

The astute engineer will agree with the above, but will also feel some discomfort at the numbers. There is more here than can be explained simply by the use of translators versus import filters. That choice might explain a 2x difference in performance. A particularly poor implementation might explain a 5x difference. But none of this explains why MS Office is almost 40x slower in processing ODF files. Being that much slower is hard to do accidentally. Other forces must be at play.

Any ideas?

  • Tweet

Filed Under: ODF, OOXML, Performance

  • « Go to Previous Page
  • Go to page 1
  • Go to page 2
  • Go to page 3
  • Go to page 4
  • Go to page 5
  • Interim pages omitted …
  • Go to page 23
  • Go to Next Page »

Primary Sidebar

Copyright © 2006-2023 Rob Weir · Site Policies