An Antic Disposition


Archives for 2008

Release the OOXML final DIS text now!

2008/05/04 By Rob 24 Comments

The JTC1 Directives [pdf] are quite clear on this point. After a Ballot Resolution Meeting (BRM), if the text is approved, the edited, final version of the text is to be distributed to NB’s within 1 month. This requirement is in the Fast Track part of JTC1 Directives, specifically in 13.12:

13.12 The time period for post ballot activities by the respective responsible parties shall be as follows:
…

  • In not more than one month after the ballot resolution group meeting the SC Secretariat shall distribute the final report of the meeting and final DIS text in case of acceptance.

The OOXML BRM ended on February 29th. One month after February 29th, if my course work in scientific computing does not fail me, is… let’s see, carry the 3, multiply, convert to sidereal time, account for proper nutation of the solar mean, subtract the perihelion distance at first point of Aries, OK. Got it. Simple. One month later is approximately March 29th +/- 3 days.

So the SC34 Secretariat should have distributed the “final DIS text” by March 29th, or at the very least, when the final ballot results on OOXML were known a few days later.

But that didn’t happen. Nothing. Silence. What is the hang up? I note that when NB’s said that the Fast Track schedule did not give sufficient time to review OOXML, the response from ISO/IEC was “There is nothing we can do. The Directives only permit 5 months”. And when NB’s protested at the arbitrary 5 day length of the OOXML BRM, the response was similarly dismissive. But when Microsoft needs more time to edit OOXML, well that appears to be something entirely different. “Directives, Schmerectives. You don’t worry yourself about no stinkin’ Directives. Take whatever time you need, Sir.”

It makes you wonder who the ISO/IEC bureaucracy is working for. The rights and prerogatives of NB’s? Or of large corporations? Almost every decision they made in processing OOXML was to the detriment of NB prerogatives.

This delay has practical implications as well. Consider the following:

  1. We are currently approaching a two month period where NB’s can lodge an appeal against OOXML. Ordinarily, one of the grounds for appeal would be if the Project Editor did not faithfully carry out the editing instructions approved at the BRM. For example, if he failed to make approved changes, made changes that were not authorized, or introduced new errors when applying the approved changes. But with no final DIS text, the NB’s are unable to make any appeals on those grounds. By delaying the release of the final DIS text, JTC1 is preventing NB’s from exercising their rights.
  2. Lawsuits, such as the recent one in the UK, are alleging process irregularities, including (if I read it correctly) that BSI approved OOXML without seeing the final text. I imagine that having the final DIS text in hand and being able to point to particular flaws in that text that should have justified disapproval would bolster their case. But if JTC1 withholds the text, then they cannot make that point as effectively.
  3. There are obvious anti-competitive effects at play here. Microsoft has the final DIS version of the ISO/IEC 29500:2008 standard, and by JTC1 delaying release to NB’s, Microsoft is able to have 2+ extra months, free of competition, to produce a fix pack to bring their products in line with the final standard, while other competitors like Sun or Corel are left behind. So much for transparency. So much for open standards. How can this be considered open if some competitors are given a significant time and access advantage?

Note that I’m not talking about the publication of the IS here. I’m talking about the requirements of 13.12 and the release of the final DIS text. Obviously ITTF will have a lot of work to do prepping OOXML for publication. For ODF it took 6 months. For OOXML I would expect it to take at least that long. But that does not prevent adherence to the Directives, in particular the requirement to distribute the final DIS text.

JTC1/SC34, noticing the delay in the release of this text, adopted the following Resolution at their Plenary in early April:

Resolution 8: Distribution of Final text of DIS 29500

SC 34 requests the ITTF and the SC34 secretariat to distribute the already received final text of DIS 29500 to the SC 34 members in accordance with JTC 1 directives section 13.12 as soon as possible, but not later than May 1st 2008. Access to this document is important for the success of various ISO/IEC 29500 maintenance activities.

This indicates that the final DIS text had already been received by SC34 (but not distributed) as of that date (April 9th).

Well, here we are, May 4th, over two months since the BRM and more than a month past the date the final DIS text was due, and past the date requested by the SC34 Plenary (who by the way have no authority to extend the deadline required by JTC1 Directives, but that is another story). We have nothing.

So, I’ll make my own personal appeal. JTC1 has the text. The Directives are clear. The delay is unnecessary and harmful in the ways I outlined above. Release the final DIS text now. Not next month. Not next week. Release it now.

Filed Under: OOXML

ODF Validation for Dummies

2008/05/02 By Rob 32 Comments

[Updated 4 May 2008, with additional rebuttal at the end]

Alex Brown has a problem. He can’t figure out how to validate ODF documents. Unfortunately, when he couldn’t figure it out, he didn’t ask the OASIS ODF TC for help, which would have been the normal thing to do. Indeed, the ODF TC passed a resolution back in February 2007 that said, in part:

That the ODF TC welcomes any questions from ISO/IEC JTC1/SC34 and
member NB’s regarding OpenDocument Format, the functionality it
describes, the planned evolution of this standard, and its relationship
to other work on the technical agenda of JTC1/SC34. Questions and
comments can be directed to the TC chair and secretary whose email
addresses are given at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

or through the comments facility at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

So it is rather uncollegial of Alex to refuse such an open, transparent way of getting his questions answered. But Alex didn’t avail himself of that avenue. He just assumed that if he couldn’t figure out how to validate ODF then it simply couldn’t be done, and that ODF was to blame. This is presumptuous. Does he think that in the three years since ODF 1.0 became a standard, no one has tried to validate a document?

Alex is so sure of himself that he publicly exults on the claimed significance of his findings:

  • For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.
  • Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).

I think you’ll agree that these are bold pronouncements, especially coming from someone so prominent in SC34, the Convenor of the ill-fated OOXML BRM, someone who is currently arguing that SC34 should own the maintenance of OOXML and ODF, indeed someone who would be well served if he could show that all consortia standards are junk, and that only SC34 (and he himself) could make them good.

Of course, I’ve been known to pontificate as well. There is nothing necessarily wrong with that. The difference here is that Alex Brown is totally wrong.

But let’s see if we can help show Alex, or anyone else similarly confused, the correct way to validate an ODF document.

First start with an ODF document. When Alex tested OOXML, he used the Ecma-376 OOXML specification. Let’s do the analogous test and validate the ODF 1.0 text. You can download it from the OASIS ODF web site. You’ll want this version of the text, ODF 1.0 (second edition), which is the source document for the ISO version of ODF.

You’ll also want the Relax NG schema files for OASIS ODF 1.0, which you can download in two pieces: the main schema, and the manifest schema.

Next you’ll need to get a Relax NG validator. Alex recommends James Clark’s jing, so we’ll use that. I downloaded jing-20030619.zip, the main distribution for use with the Java Runtime Environment. Unzip that to a directory and we’re almost there.

Since jing operates on XML files and knows nothing about the Zip package structure of an ODF file, you’ll need to extract the XML contents of the ODF file. There are many ways to do this. My preference, on Windows, is to associate WinZip with the ODF file extensions (ODT, ODS and ODP) so I can right-click on these files and unzip them. When you unzip you will have the following XML files, along with directories for image files and other non-XML resources, which you can ignore:

  • content.xml
  • styles.xml
  • meta.xml
  • settings.xml
  • META-INF/manifest.xml
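
If you would rather script this step than use WinZip, here is a minimal sketch of my own (not from the original post) using Python’s standard zipfile module; the document file name is just a placeholder for whatever ODF file you are validating.

import zipfile

ODF_FILE = "OpenDocument-v1.0ed2-os.odt"  # placeholder name; point this at your own ODF file

with zipfile.ZipFile(ODF_FILE) as odf:
    for name in ("content.xml", "styles.xml", "meta.xml",
                 "settings.xml", "META-INF/manifest.xml"):
        odf.extract(name)  # writes the file out, preserving the META-INF/ directory
        print("extracted", name)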

So now we’re ready to validate! Let’s start with content.xml. The command line for me was:

java -jar c:/jing/bin/jing.jar OpenDocument-schema-v1.0-os.rng content.xml

(Your command may vary, depending on where you put jing, the ODF schema files and the unzipped ODF files)

The result is a whole slew of error messages:

C:\temp\odf\OpenDocument-schema-v1.0-os.rng:17658:18: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
C:\temp\odf\OpenDocument-schema-v1.0-os.rng:10294:22: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"

Oh no! Emergency, emergency, everyone to get from street!

I wonder if this is one of the things that tripped Alex up? Take a deep breath. These in fact are not Relax NG (ISO/IEC 19757-2) errors at all, but errors generated by jing’s default validation of a different set of constraints, defined in the Relax NG DTD Compatibility specification which has the status of a Committee Specification in OASIS. It is not part of ISO/IEC 19757-2.

Relax NG DTD Compatibility provides three extensions to Relax NG: default attribute values, ID/IDREF constraints and a documentation element. The Relax NG DTD Compatibility specification is quite clear in section 2 that “Conformance is defined separately for each feature. A conformant implementation can support any combination of features.” And in fact, ODF 1.0, in section 1.2, does just that: “The schema language used within this specification is Relax-NG (see [RNG]). The attribute default value feature specified in [RNG-Compat] is used to provide attribute default values”.

It is best to simply disable the checking of Relax NG DTD Compatibility constraints by using the documented “-i” flag in jing. If you want to validate ID/IDREF cross-references, then you’ll need to do that in application code, not by using jing in Relax NG DTD Compatibility mode. Note that jing was not complaining about any actual ID/IDREF problem in the ODF document.

So, false alarm. You can walk safely on the streets now.

(That said, if we can make some simple changes to the ODF schemas that will allow it to work better with the default settings of jing, or other popular tools, then I’m certainly in favor of that. Alex’s proposed changes to the schema are reasonable and should be considered.)

So, let’s repeat the validation with the -i flag:

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng styles.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng meta.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng settings.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-manifest-schema-v1.0-os.rng META-INF/manifest.xml

Zero errors, zero warnings.

So, there you have it, an example that shows that there is at least one document in the universe that is valid to the ODF 1.0 schema, disproving Alex’s statement that “there are no XML documents in existence which are valid to ISO ODF.”

The directions are complete and should allow anyone to validate the ODF 1.0 specification, or any other ODF 1.0 document. Now that we have the basics down, let’s work on some more advanced topics.

First, the reader should note that there are two versions of the ODF schema: the original 1.0 from 2005, and the updated 1.1 from 2007. (There is also a third version underway, ODF 1.2, but that needn’t concern us here.)

An application, when it creates an ODF document, indicates which version of the ODF standard it is targeting. You can find this indication if you look at the office:version attribute on the root element of any ODF XML file. The only values I would expect to see in use today would be “1.0” and “1.1”. Eventually we’ll also see “1.2”.
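
As a quick illustration (a sketch of mine, not part of the original post), this is how you might read that attribute and pick the matching schema; the ODF 1.1 schema file name below is an assumption, so substitute whatever name your download uses.

import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

root = ET.parse("content.xml").getroot()
version = root.get("{%s}version" % OFFICE_NS)  # e.g. "1.0" or "1.1"

schemas = {
    "1.0": "OpenDocument-schema-v1.0-os.rng",
    "1.1": "OpenDocument-schema-v1.1.rng",  # assumed file name for the 1.1 schema
}
print("declared version:", version, "->", schemas.get(version, "unknown"))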

It is important to use the appropriate version of the ODF schema to validate a particular document. Our goal, as we evolve ODF, is that an application that knows only about ODF 1.0 should be able to adapt and “degrade gracefully” when given an ODF 1.1 document, by ignoring the features it does not understand. But an application written to understand ODF 1.1 should be able to fully understand ODF 1.0 documents without any additional accommodation.

Put differently, from the document perspective, a document that conforms to ODF 1.0 should also conform to ODF 1.1. But the reverse direction is not true.

To accomplish this, as we evolve ODF, within the 1.x family of revisions, we try to limit ourselves to changes that widen the schema constraints, by adding new optional elements, or new attribute values, or expanding the range of values permitted. Constraint changes that are logically narrowing, like removing elements, making optional elements mandatory, or reducing the range of allowed values, would break this kind of document compatibility.

Now of course, at some point we may want to make bolder changes to the schema, but this would be in a major release, like a 2.0 version. But within the ODF 1.x family we want this kind of compatibility.

The net of this is, an ODF 1.1 document should only be expected to be valid to the ODF 1.1 schema, but an ODF 1.0 document should be valid to the ODF 1.0 and the ODF 1.1 schemas.

That’s enough theory! Let’s take a look now at the test that Alex actually ran. It is a rather curious, strangely biased kind of test, but the flawed thinking behind it is interesting enough to be worth examining in some detail.

When he earlier tested OOXML, Alex used the OOXML standard itself, a text on which Microsoft engineers had lavished many person-years of attention for the past 18 months, and he validated it with the current version of the OOXML schema. That is pretty much the best case, testing a document that has never been out of Microsoft’s sight for 18 months and testing it with the current version of the schema. I would expect that this document would have been a regular test case for Microsoft internally, and that its validity has been repeatedly and exhaustively tested over the past 18 months. I know that I personally tested it when Ecma-376 was first released, since it was the only significant OOXML document around. So, essentially Alex gave OOXML the softest of all soft pitches.

I think Microsoft’s response, that the validity errors detected by Alex are due to changes made to the schema at the BRM, is a reasonable and accurate explanation. The real story on OOXML standardization is not how many changes were made that were incompatible with Office 2007, but how few. It appears that very few changes, perhaps only one, will be required to make Office 2007’s output be valid OOXML.

So when testing ODF, what did Alex do? Did he use the ODF 1.0 specification as a test case, a document that the OASIS TC might have had the opportunity to give a similar level of attention to? No, he did not, although that would have validated perfectly, as I’ve demonstrated above. Instead, Alex took the OOXML specification, a document which by his own testing is not valid OOXML, converted it into the proprietary .DOC binary format, translated that binary format into ODF, tried to validate the results with the ODF 1.0 schema (i.e., the wrong version of the ODF schema, since OpenOffice 2.4.0’s output is clearly declared as ODF 1.1), and then applied a non-applicable, non-standard DTD Compatibility constraint test during the Relax NG validation.

Does anyone see something else wrong with this testing methodology?

Aside from the obvious bias of using an input document that Microsoft has spent 18 months perfecting, and using the wrong schemas and validator settings, there is another, more subtle problem.

Alex’s tests of OOXML and ODF are testing entirely different things. With OOXML, he took a version N (Ecma-376) OOXML document and tried to validate it with a version N+1 (ISO/IEC 29500) version of the OOXML schema.

But what he did with ODF was to take a version N+1 (ODF 1.1) document and try to validate it with a version N (ODF 1.0) ODF schema.

These are entirely different operations. One test is testing the backwards compatibility of the schema; the other is testing the backwards compatibility of document instances. It takes no genius to figure out that if ODF 1.1 adds new elements, then an ODF 1.1 document instance will not validate with the ODF 1.0 schema. We don’t ordinarily expect that kind of backwards validity from document instances. Again, Alex’s tests are biased in OOXML’s favor, giving ODF a much more difficult, even impossible, task compared to the test he ran for OOXML.

If we want to compare apples to apples, it is quite easy to perform the equivalent test with ODF. I gave it a try, taking a version N document (the ODF 1.0 standard itself, per above) and validated it with the version N+1 schema (ODF 1.1 in this case). It worked perfectly. No warnings, no errors.
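
(If you want to reproduce this, the command is the same as before, just pointed at the ODF 1.1 schema. The 1.1 schema file name here is my assumption; use whatever name the OASIS download gives it.)

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.1.rng content.xml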

In any case, in his backwards test Alex reports 7,525 errors, “mostly of the same type (use of an undeclared soft-page-break element)” when validating the OOXML text with ODF 1.0 schema. Indeed, all but 39 of these errors are reports of soft-page-break.

Soft page breaks are a new feature introduced in ODF 1.1, added primarily for accessibility. They allow easier collaboration between people using different technologies to read a document. Not all documents are deeply structured, with formal divisions like section 3.2.1, etc. Most business documents are loosely structured, and collaboration occurs by referring to “2nd paragraph on page 23” or “the bottom of page 18”. But when using different assistive technologies, from larger fonts, to braille, to audio renderings, the page breaks (if the assistive technology even has the concept of a page break) are usually located differently from the page breaks in the original authoring tool. This makes collaboration difficult. So, ODF 1.1 added the ability for applications to write out “soft” page breaks, indicating where the page breaks occurred when the original source document was saved.

Although this feature was added for accessibility reasons, like curb cuts, its likely future applications are more general. We will all benefit. For example, a convertor for translating from ODF to HTML would ordinarily only be able to calculate the original page breaks by undertaking complex layout calculations. But with soft page breaks recorded, even a simple XSLT script can use this information to insert indications of page breaks, or to generate accurate page numbering, etc. Although the addition of this feature hinders Alex’s idiosyncratic attempt to validate ODF 1.1 documents with the ODF 1.0 schema, I think the fact that this feature helps blind and visually impaired users, and generally improves collaboration makes it a fair trade-off.

Wouldn’t you agree?
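
To make the benefit concrete, here is a small sketch of my own (not the XSLT script mentioned above, but the same idea) that simply counts the recorded soft page breaks in a document’s content.xml.

import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

root = ET.parse("content.xml").getroot()
count = sum(1 for _ in root.iter("{%s}soft-page-break" % TEXT_NS))
print("soft page breaks recorded:", count)
# an ODF-to-HTML exporter could emit a page-break marker or bump a page
# counter at each of these elements, with no layout calculation at all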

That leaves 39 validation errors in Alex’s test. 12 of them are reports of invalid values in xlink:href attributes. This appears to be an error in the original DOCX file. Garbage In, Garbage Out. For example, in one case the original document has a HYPERLINK field that contains a link to content in Microsoft’s proprietary CHM format (Compiled HTML). The link provided in the original document does not match the syntax rules required for an XML Schema anyURI (the URL ends with “##” rather than “#”). Maybe it is correct for markup like this, with non-standard, non-interoperable URI’s, to give validation errors. This is not the first time that OOXML has been found polluting XML with proprietary extensions. But realize that OpenOffice 2.4.0 did not create this error. OpenOffice is just passing the error along, as Office 2007 saved it. It is interesting to note that this error was not caught in MS Office, and indeed is undetectable with OOXML’s lax schema. But the error was caught with the ODF schema. This is a good thing, yes? It might be a good idea for OpenOffice to add an optional validation step after importing Microsoft Office documents, to filter out such data pollution.
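
A hedged sketch of that optional post-import check (mine, not an OpenOffice feature): scan every xlink:href and flag values that cannot be valid anyURIs because they contain a second “#”, the “##” case from the example above.

import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

root = ET.parse("content.xml").getroot()
for elem in root.iter():
    href = elem.get("{%s}href" % XLINK_NS)
    if href is not None and href.count("#") > 1:
        # an anyURI may contain at most one "#", so "##" can never be valid
        print("suspicious link:", href)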

The remaining validation errors are 27 instances of style:with-tab. Honestly, I have no explanation for this. This attribute does not exist in ODF 1.0 or ODF 1.1. That it is written out at all appears to be a bug in OpenOffice. Maybe someone there can tell us what the story is with this? But I don’t see this problem in all documents, or even most documents.

For fun I tried processing this OOXML document another way. Instead of the multi-hop OOXML-to-DOC-to-ODF conversion Alex did, why not go directly from OOXML to ODF in one step, using the convertor that Microsoft/CleverAge created? This should be much cleaner, since it carries none of the messiness of the binary formats or of legacy application code. It is just a mapping from one markup to another, written from scratch. Getting the output to be valid should be trivial.

So I downloaded the “OpenXML/ODF Translator Command Line Tools” from SourceForge. According to their web page, this tool targets ODF 1.0, so we’ll be validating against the ODF 1.0 schemas.

This tool is very easy to use once you have the .NET prerequisites installed. The command line was:

odfconvertor /I "Office Open XML Part 4 - Markup Language Reference.docx"

The convertor then chugs along for a long, long, long time. I mean a long time. The conversion from OOXML to ODF eventually finished, after 11 hours, 10 minutes and 41 seconds! And this was on a Thinkpad T60p with dual-core Intel 2.16Ghz processor and 2.0 GB of RAM.

I then ran jing, using the validation command lines from above. It reported 376 validation errors, which fell into several categories:

  • text:s element not allowed in this context
  • bad value for text:style:name
  • bad value for text:outline-level
  • bad value for svg:x
  • bad value for svg:y
  • element text:tracked-changes not allowed in this context
  • “text not allowed here”

In any case, not a lot of errors, but a handful of errors repeated. But it is surprising to see that this single-purpose tool, written from scratch, had more validation errors in it than OpenOffice 2.4.0 does.

In the end we should put this in perspective. Can OpenOffice produce valid ODF documents? Yes, it can, and I have given an example. Can OpenOffice produce invalid documents? Yes, of course. For example when it writes out a .DOC binary file, it is not even well-formed XML. And we’ve seen one example, where via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation. But conformance for an application does not require that it is incapable of writing out an invalid document. Conformance requires that it is capable of writing out a valid document. And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.

It is interesting to recall the study that Dagfinn Parnas did a few years ago. He analyzed 2.5 million web pages. He found that only 0.7% of them were valid markup. Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems. Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards. In fact I promise you that the page you are reading now is not valid, and neither is Alex Brown’s, nor SC34’s, nor JTC1’s, nor Ecma’s, nor ISO’s, nor the IEC’s.

So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement. In any case, Alex Brown’s dire pronouncements on ODF validity have been weighed in the balance and found wanting.


4 May 2008

Alex has responded on his blog with “ODF validation for cognoscenti”. He deals purely with the ID/IDREF/IDREFS questions in XML. He does not justify his biased and faulty testing methodology, nor does he reiterate his bold claims that there are no valid ODF 1.0 documents in existence.

Since Alex’s blog does not seem to be allowing me to comment, I’ll put here what I would have put there. I’ll be brief because I have other fish to fry today.

Alex, no one doubts that ID/IDREF/IDREFS constraints must be respected by valid ODF document instances. I never suggested otherwise. But what I do state is that this is not a concern of a Relax NG validator. You can read James Clark saying the same thing in his 2001 “Guidelines for using W3C XML Schema Datatypes with RELAX NG“, which says in part:

The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].

Validation of ID/IDREF/IDREFS cross-reference semantics is not the job of Relax NG, and you are incorrect to suggest otherwise. Your logic is also deficient when you take my statement of that fact and derive the false statement that I believe that ID/IDREF semantics do not apply to ODF. One does not follow from the other.

You know, as much as anyone, that conformance is a complex topic. One does not ordinarily expect, except in trivial XML formats, that the complete set of conformance constraints will be expressed in the schema. Typically a multi-layered approach is used, with some syntax and structural constraints expressed in XML Schema or Relax NG, some business constraints in Schematron, and maybe even some deeper semantic constraints that are expressed only in the text of the standard and can only be tested by application logic.

For example, a document that defines a cryptographic algorithm might need to store a prime number. The schema might define this as an integer. The fact that the schema does not state or guarantee that it is a prime number is not the fault of the schema. And the inability of a Relax NG validator to test primality is not a defect in Relax NG. The primality test would simply need to be carried out at another level, with application logic. But the requirement for primality in document instances can still be a conformance requirement and it is still testable, albeit with some computational effort, in application logic.
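
To make that concrete, here is a tiny sketch (the element and attribute names are invented purely for illustration): the schema-level check only establishes that the value is an integer, while the primality requirement is enforced in application logic.

import xml.etree.ElementTree as ET

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

doc = ET.fromstring('<key modulus="2147483647"/>')  # made-up markup, for illustration only
value = int(doc.get("modulus"))                     # schema level: the value parses as an integer
if not is_prime(value):                             # application level: the deeper conformance rule
    print("conformance error: modulus must be prime")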

I believe that is the source of your confusion. The initial errors you saw when running jing with the Relax NG DTD Compatibility flag enabled were not errors in the ODF document instances. What you saw was jing reporting that it could not apply the Relax NG DTD Compatibility ID/IDREF/IDREFS constraint checks using the ODF 1.0 schema. That in no way means that the constraints defined in XML 1.0 are not required on ODF document instances. It simply indicates that you would need to verify these constraints using means other than Relax NG DTD Compatibility.

So I wonder, have you actually found ODF document instances, say written from OpenOffice 2.4.0, which have ID/IDREF/IDREFS usage which violates the constraints expressed in ODF 1.0?
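
For what it is worth, such a check is straightforward to write in application code. Here is a generic sketch of mine; the two attribute sets are placeholders that would be filled in from the ID and IDREF declarations in the ODF schema’s DTD Compatibility annotations.

import xml.etree.ElementTree as ET

ID_ATTRS = {"{urn:example}id"}      # placeholder: attributes the schema declares as ID
IDREF_ATTRS = {"{urn:example}ref"}  # placeholder: attributes the schema declares as IDREF

def check_references(root):
    ids, refs, errors = set(), [], []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if attr in ID_ATTRS:
                if value in ids:
                    errors.append("duplicate ID: " + value)
                ids.add(value)
            elif attr in IDREF_ATTRS:
                refs.append(value)
    errors.extend("dangling IDREF: " + r for r in refs if r not in ids)
    return errors

# usage: print(check_references(ET.parse("content.xml").getroot()))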

Finally, in your professional judgment, do you maintain that this is an accurate statement? “For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.”

Filed Under: ODF

Embrace the Reality and Logic of Choice

2008/04/30 By Rob 9 Comments

Another neo-colonialist press release from Microsoft’s CompTIA lobbying arm, this time inveighing against South Africa’s adoption of ODF as a national standard. One way to point out the absurdity of their logic is to replace the reference to ODF with references to any other useful standard that a government might adopt, like electrical standards.

When we do this, we end up with the following.


South Africa Electrical Current Adoption Outdated

South Africa’s recent adoption of the 230V/50Hz residential electrical standard represents a tack that will blunt innovation, much needed for their developing economy. The policy choice – which actually reduces electrical current choice – runs contrary to worldwide policy trends, where multiple electrical standards rule, thus threatening to separate South Africa from the wealth-creating abilities of the global electrical industry.

For MonPrevAss, the Monopoly Preservation Association, the overall concern for the global electrical industry is to ensure that lawmakers adopt flexible policies and set policy targets rather than deciding on fixed rules, technologies and different national standards to achieve these targets. Such rigid approaches pull the global electrical market apart rather than getting markets to work together and boost innovation for consumers and taxpayers. “The adoption sends a negative signal to a highly innovative sector” says I.M. Atool, MonPrevAss’s Group Director, Public Policy EMEA.

The “South African Bureau of Standards” (SABS) approved the 230V/50Hz residential electrical standard on Friday 18 April as an official national standard. This adoption, if implemented, will reduce choice, decrease the benefits of open competition and thwart innovation. The irony here is that South Africa is moving in a direction which stands in stark relief to the reality of the highly dynamic market, with some 40 different electrical current conventions available today.

“Multiple co-existing electrical standards as opposed to only one standard should be favoured in the interest of users. The markets are the most efficient in creating electrical standards and it should stay within the exclusive hands of the market”, I.M. Atool explains.

In light of the recent ISO/IEC adoption of the Microsoft 240V/55Hz electrical standard, the South African decision will not lead to improvements in the electrical sector. MonPrevAss urges Governments to allow consumers and users to decide which electrical standards are best. We fear that the choice of just one electrical standard runs the risk of being outdated before it is even implemented, as well as being prohibitively costly to public budgets and taxpayers.

Governments should not restrict themselves to working with one electrical standard, and should urge legislators to refrain from any kind of mandatory regulation and discriminatory interventions in the market. The global electrical industry recommends Governments to embrace the reality and logic of choice and to devote their energies to ensuring interoperability through this choice.


Of course, this is just a rehash of an old logical fallacy, related to the old “Broken Windows” fallacy. It is like saying heart disease is a good thing because you have such a wide choice of therapies to treat it. We would all agree that it is far preferable to be healthy and have a wide choice of activities that you want to do, rather than a wide choice of solutions to a problem that you never asked for and don’t want.

Consumers don’t want a bag of adapters to convert between different formats and protocols. That is giving consumers a choice in a solution to an interoperability problem they didn’t ask for and don’t want. Consumers want a choice of goods and services.

Observe the recent standards war between Blu-ray and HD DVD. Ask yourself:

  1. Did consumers want a choice in formats, or did they want a wider choice in players and high definition movies?
  2. Did movie studios want a choice in formats and either the uncertainty over choosing the winner, or the expense of supporting both formats? Or did they really just want a single format that would allow them to reach all consumers?
  3. Did the uncertainty around the existence of two competing high definition formats help or hurt the adoption of high definition technologies in general?
  4. Did consumers who made the early choice to go with HD DVD, say Microsoft XBox owners, benefit from having this choice?

If every private individual, and every private business has the right to adopt technology standards according to their needs, why should governments be denied that same right? Why should they be forced to take the only certain losing side of every standards war — implementing all standards indiscriminately — a choice that no rational business owner would make?

How many spreadsheet formats does Microsoft use internally to run its business? Why should governments be denied choice in the same field where Microsoft itself exerts its right to choose?

Filed Under: Microsoft, Standards

Sinclair’s Syndrome

2008/04/17 By Rob 10 Comments

A curious FAQ put up by an unnamed ISO staffer on MS-OOXML. Question #1 expresses concerns about Fast Tracking a 6,000 page specification, a concern which a large number of NB’s also expressed during the DIS process. Rather than deal honestly with this question, the ISO FAQ says:

The number of pages of a document is not a criterion cited in the JTC 1 Directives for refusal. It should be noted that it is not unusual for IT standards to run to several hundred, or even several thousand pages.

Now certainly there are standards that are several thousand pages long. For example, Microsoft likes to bring up the example of ISO 14496, MPEG 4, at over 4,000 pages in length. But that wasn’t a Fast Track. And as Arnaud Le Hors reminded us earlier, MPEG 4 was standardized in 17 parts over 6 years.

So any answer in the FAQ which attempts to consider what is usual and what is unusual must take account of past practice with JTC1 Fast Track submissions. That, after all, was the question the FAQ purports to address.

Ecma claims (PowerPoint presentation here) that there have been around 300 Fast Tracked standards since 1987 and Ecma has done around 80% of them. So looking at Ecma Fast Tracks is a reasonable sample. Luckily Ecma has posted all of their standards, from 1991 at least, in a nice table that allows us to examine this question more closely. Since we’re only concerned with JTC1 Fast Tracks, not ISO Fast Tracks or standards that received no approval beyond Ecma, we should look at only those which have ISO/IEC designations. “ISO/IEC” indicates that the standard was approved by JTC1.

So where did things stand on the eve of Microsoft’s submission of OOXML to Ecma?

At that point there had been 187 JTC1 Fast Tracks from Ecma since 1991, with basic descriptive statistics as follows:

  • mean = 103 pages
  • median = 82 pages
  • min = 12 pages
  • max = 767 pages
  • standard deviation = 102 pages

A histogram of the page lengths looks like this:

So the ISO statement that “it is not unusual for IT standards to run to several hundred, or even several thousand pages” does not seem to ring true in the case of JTC1 Fast Tracks. A good question to ask anyone who says otherwise is, “In the time since JTC1 was founded, how many JTC1 Fast Tracks greater than 1,000 pages in length have been submitted?” Let me know if you get a straight answer.

Let’s look at one more chart. This shows the length of Ecma Fast Tracks over time, from the 28-page Ecma-6 in 1991 to the 6,045 page Ecma-376 in 2006.

Let’s consider the question of usual and unusual again, the question that ISO is trying to inform the public on. Do you see anything unusual in the above chart? Take a few minutes. It is a little tricky to spot at first, but with some study you will see that one of the standards plotted in the above chart is atypical. Keep looking for it. Focus on the center of the chart, let your eyes relax, clear your mind of extraneous thoughts.

If you don’t see it after 10 minutes or so, don’t feel bad. Some people and even whole companies are not capable of seeing this anomaly. As best as I can tell it is a novel cognitive disorder caused by taking money from Microsoft. I call it “Sinclair’s Syndrome” after Upton Sinclair who gave an early description of the condition, writing in 1935: “It is difficult to get a man to understand something when his salary depends upon his not understanding it.”

To put it in more approachable terms, observe that Ecma-376, OOXML, at 6,045 pages in length, was 58 standard deviations above the mean for Ecma Fast Tracks. Consider also that the average adult American male is 5′ 9″ (175 cm) tall, with a standard deviation of 3″ (8 cm). For a man to be as tall, relative to the average height, as OOXML is to the average Fast Track, he would need to be 20′ 3″ (6.2 m) tall!
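
The arithmetic, for anyone who wants to check it (the figures are the ones quoted above):

mean, sd, ooxml = 103, 102, 6045
z = (ooxml - mean) / sd
print(round(z, 1))                         # about 58 standard deviations

height_mean, height_sd = 69, 3             # 5'9" and 3", in inches
print((height_mean + z * height_sd) / 12)  # roughly 20 feet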

For ISO, in a public relations pitch, to blithely suggest that several thousand page Fast Tracks are “not unusual” shows an audacious disregard for the truth and a lack of respect for a public that is looking for ISO to correct its errors, not blow smoke at them in a revisionist attempt to portray the DIS 29500 approval process as normal, acceptable or even legitimate. We should expect better from ISO and we should express disappointment in them when they let us down in our reasonable expectations of honesty. We don’t expect this from Ecma. We don’t expect this from Microsoft. But we should expect this from ISO.

Filed Under: OOXML, Standards

Suggesting ODF Enhancements

2008/04/16 By Rob 10 Comments

There is a good post by Mathias Bauer on Sun Hamburg’s GullFOSS blog. He deals with the practical importance of OASIS’s “Feedback License” that governs any public feedback OASIS receives from non-TC members.

The ODF TC receives ideas for new features from many places. Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.

Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or individual members of the public.

Contributions from OASIS TC members are already covered by the OASIS IPR Policy. The TC member who contributes written proposals to the TC is obliged from the time of contribution. And other TC members are obliged if they have been TC members for at least 60 days and remain a member 7 days after approval of any Committee Draft. You can see the participation status of TC members here.

For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our comment mailing list. But before you can post to the comment list you must first accept the terms of the Feedback License.

Is this extra step annoying? Yes, it is. But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF. It is part of the price we pay for open standards. Free does not mean free from vigilance.

One of my responsibilities on the ODF TC is to monitor and process the public comments we receive. Regretfully this is a duty which I’ve neglected for too long. So I spent some time this week getting caught up on the comments, entering them all into a tracking spreadsheet. We have a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.

The largest single source of comments is the Japanese JTC1/SC34 mirror committee, where they have been translating the ODF 1.0 standard into Japanese. As you know, you will get no closer reading of a text than when attempting translation, so we’re glad to receive this scrutiny. I’ll look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.

For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing. For comments that are in the nature of a new feature proposal, we will discuss on a TC call, and decide whether or not to include it in ODF 1.2.

A sample of some of the feature proposals from the comment list are:

  • A request to support embedded fonts in ODF documents
  • A request to support multiple versions of the same document in the same file
  • A request to allow vertical text justification
  • A proposal for enhanced string processing spreadsheet functions
  • A proposal for spreadsheet values to allow units, which would help prevent calculation errors due to mixing incompatible units, e.g., adding mm to kg would be flagged as an error (see the sketch after this list).
  • A proposal for allowing spreadsheet named ranges to have namespaces, with each sheet in a workbook having its own namespace.
  • A proposal to allow a document to have a “portable” flag to allow it to self-identify that it contains only portable ODF content with no proprietary extensions.
  • Proposal for adding FFT support to spreadsheet
  • Proposal for adding overline text attribute
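
The mm-to-kg example from the units proposal, as a tiny illustrative sketch of my own (not the TC’s proposal text): a value that carries its unit can refuse to be added to a value with a different unit, instead of silently producing a meaningless number.

class Quantity:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit
    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError("cannot add %s to %s" % (other.unit, self.unit))
        return Quantity(self.value + other.value, self.unit)

print((Quantity(3, "mm") + Quantity(4, "mm")).value)  # 7
# Quantity(3, "mm") + Quantity(4, "kg") would raise: cannot add kg to mm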

If you have any other ideas for ODF enhancements, or thoughts on the above proposals, please don’t post a response to this blog! Remember, you need to use the comment list for your feedback to be considered by the OASIS ODF TC.

Of course, general comments are always welcome on this blog.

Filed Under: ODF

