An Antic Disposition



ODF Validation for Dummies

2008/05/02 By Rob 32 Comments

[Updated 4 May 2008, with additional rebuttal at the end]

Alex Brown has a problem. He can’t figure out how to validate ODF documents. Unfortunately, when he couldn’t figure it out, he didn’t ask the OASIS ODF TC for help, which would have been the normal thing to do. Indeed, the ODF TC passed a resolution back in February 2007 that said, in part:

That the ODF TC welcomes any questions from ISO/IEC JTC1/SC34 and
member NB’s regarding OpenDocument Format, the functionality it
describes, the planned evolution of this standard, and its relationship
to other work on the technical agenda of JTC1/SC34. Questions and
comments can be directed to the TC chair and secretary whose email
addresses are given at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

or through the comments facility at

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

So it is rather uncollegial of Alex to refuse such an open, transparent way of getting his questions answered. But Alex didn’t avail himself of that avenue. He just assumed that if he couldn’t figure out how to validate ODF, then it simply couldn’t be done, and that ODF was to blame. This is presumptuous. Does he think that in the three years since ODF 1.0 became a standard, no one has tried to validate a document?

Alex is so sure of himself that he publicly exults in the claimed significance of his findings:

  • For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.
  • Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).

I think you’ll agree that these are bold pronouncements, especially coming from someone so prominent in SC34, the Convenor of the ill-fated OOXML BRM, someone who is currently arguing that SC34 should own the maintenance of OOXML and ODF, indeed someone who would be well served if he could show that all consortia standards are junk, and that only SC34 (and he himself) could make them good.

Of course, I’ve been known to pontificate as well. There is nothing necessarily wrong with that. The difference here is that Alex Brown is totally wrong.

But let’s see if we can help show Alex, or anyone else similarly confused, the correct way to validate an ODF document.

First start with an ODF document. When Alex tested OOXML, he used the Ecma-376 OOXML specification. Let’s do the analogous test and validate the ODF 1.0 text. You can download it from the OASIS ODF web site. You’ll want this version of the text, ODF 1.0 (second edition), which is the source document for the ISO version of ODF.

You’ll also want the Relax NG schema files for OASIS ODF 1.0, which come in two pieces: the main schema and the manifest schema.

Next you’ll need to get a Relax NG validator. Alex recommends James Clark’s jing, so we’ll use that. I downloaded jing-20030619.zip, the main distribution for use with the Java Runtime Environment. Unzip that to a directory and we’re almost there.

Since jing operates on XML files and knows nothing about the Zip package structure of an ODF file, you’ll need to extract the XML contents of the ODF file. There are many ways to do this. My preference, on Windows, is to associate WinZip with the ODF file extensions (ODT, ODS and ODP) so I can right-click on these files and unzip them. When you unzip you will have the following XML files, along with directories for image files and other non-XML resources you can ignore:

  • content.xml
  • styles.xml
  • meta.xml
  • settings.xml
  • META-INF/manifest.xml
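If you prefer to script this step, the extraction takes only a few lines of Python. This is a minimal sketch using the standard library; the entry names are the standard ODF package parts listed above:

```python
import zipfile

def extract_xml_parts(odf_path, dest_dir="."):
    """Extract the XML parts of an ODF package so that a Relax NG
    validator such as jing can be pointed at them."""
    wanted = {"content.xml", "styles.xml", "meta.xml",
              "settings.xml", "META-INF/manifest.xml"}
    extracted = []
    with zipfile.ZipFile(odf_path) as z:
        for name in z.namelist():
            if name in wanted:
                z.extract(name, dest_dir)  # preserves the META-INF/ subdirectory
                extracted.append(name)
    return extracted
```

The same approach works for ODT, ODS and ODP files alike, since all ODF packages share this structure.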

So now we’re ready to validate! Let’s start with content.xml. The command line for me was:

java -jar c:/jing/bin/jing.jar OpenDocument-schema-v1.0-os.rng content.xml

(Your command may vary, depending on where you put jing, the ODF schema files, and the unzipped ODF files.)

The result is a whole slew of error messages:

C:\temp\odf\OpenDocument-schema-v1.0-os.rng:17658:18: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
C:\temp\odf\OpenDocument-schema-v1.0-os.rng:10294:22: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"

Oh no! Emergency, emergency, everyone to get from street!

I wonder if this is one of the things that tripped Alex up? Take a deep breath. These in fact are not Relax NG (ISO/IEC 19757-2) errors at all, but errors generated by jing’s default validation of a different set of constraints, defined in the Relax NG DTD Compatibility specification, which has the status of a Committee Specification in OASIS. It is not part of ISO/IEC 19757-2.

Relax NG DTD Compatibility provides three extensions to Relax NG: default attribute values, ID/IDREF constraints and a documentation element. The Relax NG DTD Compatibility specification is quite clear in section 2 that “Conformance is defined separately for each feature. A conformant implementation can support any combination of features.” And in fact, ODF 1.0, in section 1.2 does just that: “The schema language used within this specification is Relax-NG (see [RNG]). The attribute default value feature specified in [RNG-Compat] is used to provide attribute default values”.

It is best to simply disable the checking of Relax NG DTD Compatibility constraints by using the documented “-i” flag in jing. If you want to validate ID/IDREF cross-references, then you’ll need to do that in application code, not using jing in Relax NG DTD Compatibility mode. Note that jing was not complaining about any actual ID/IDREF problem in the ODF document.
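A sketch of what such an application-level ID/IDREF check might look like, using Python’s standard library. The attribute names below are hypothetical placeholders; a real checker would use the ID- and IDREF-typed attributes actually defined in the ODF specification:

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholders -- a real checker would use the ID/IDREF
# attribute lists from the ODF specification instead.
ID_ATTRS = {"{urn:example}id"}
IDREF_ATTRS = {"{urn:example}ref"}

def dangling_refs(xml_text):
    """Return IDREF values that do not match any declared ID."""
    root = ET.fromstring(xml_text)
    ids, refs = set(), []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if attr in ID_ATTRS:
                ids.add(value)
            elif attr in IDREF_ATTRS:
                refs.append(value)
    return [ref for ref in refs if ref not in ids]
```

A document passes the cross-reference check when the returned list is empty.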

So, false alarm. You can walk safely on the streets now.

(That said, if we can make some simple changes to the ODF schemas that will allow it to work better with the default settings of jing, or other popular tools, then I’m certainly in favor of that. Alex’s proposed changes to the schema are reasonable and should be considered.)

So, let’s repeat the validation with the -i flag:

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng styles.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng meta.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng settings.xml

Zero errors, zero warnings.

java -jar c:/jing/bin/jing.jar -i OpenDocument-manifest-schema-v1.0-os.rng META-INF/manifest.xml

Zero errors, zero warnings.

So, there you have it, an example that shows that there is at least one document in the universe that is valid to the ODF 1.0 schema, disproving Alex’s statement that “there are no XML documents in existence which are valid to ISO ODF.”

The directions are complete and should allow anyone to validate the ODF 1.0 specification, or any other ODF 1.0 document. Now that we have the basics down, let’s work on some more advanced topics.

First, the reader should note that there are two versions of the ODF schema, the original 1.0 from 2005, and the updated 1.1 from 2007. (There is also a third version underway, ODF 1.2, but that needn’t concern us here.)

An application, when it creates an ODF document, indicates which version of the ODF standard it is targeting. You can find this indication if you look at the office:version attribute on the root element of any ODF XML file. The only values I would expect to see in use today would be “1.0” and “1.1”. Eventually we’ll also see “1.2”.
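Checking the declared version from a script is straightforward; here is a minimal sketch using Python’s standard library (the namespace URI is the standard ODF office namespace, and the attribute is the office:version attribute described above):

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

def odf_version(xml_text):
    """Return the office:version declared on the root element of an
    ODF XML part, or None if the attribute is absent."""
    root = ET.fromstring(xml_text)
    return root.get("{%s}version" % OFFICE_NS)
```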

It is important to use the appropriate version of the ODF schema to validate a particular document. Our goal, as we evolve ODF, is that an application that knows only about ODF 1.0 should be able to adapt and “degrade gracefully” when given an ODF 1.1 document, by ignoring the features it does not understand. But an application written to understand ODF 1.1 should be able to fully understand ODF 1.0 documents without any additional accommodation.

Put differently, from the document perspective, a document that conforms to ODF 1.0 should also conform to ODF 1.1. But the reverse direction is not true.

To accomplish this, as we evolve ODF, within the 1.x family of revisions, we try to limit ourselves to changes that widen the schema constraints, by adding new optional elements, or new attribute values, or expanding the range of values permitted. Constraint changes that are logically narrowing, like removing elements, making optional elements mandatory, or reducing the range of allowed values, would break this kind of document compatibility.
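As an illustration of what a widening change looks like in Relax NG compact syntax (the element names here are invented, and prefix declarations are omitted), a new optional alternative is added to an existing pattern, so every old instance still matches:

```
# Version N of the schema:
paragraph-content = text | element my:span { text }

# Version N+1: a new element is added as an optional alternative.
# Every document valid against the old pattern remains valid here.
paragraph-content = text
                  | element my:span { text }
                  | element my:new-feature { text }
```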

Now of course, at some point we may want to make bolder changes to the schema, but this would be in a major release, like a 2.0 version. But within the ODF 1.x family we want this kind of compatibility.

The net of this is, an ODF 1.1 document should only be expected to be valid to the ODF 1.1 schema, but an ODF 1.0 document should be valid to the ODF 1.0 and the ODF 1.1 schemas.

That’s enough theory! Let’s take a look now at the test that Alex actually ran. It is a rather curious, strangely biased kind of test, but the flawed thinking behind it is interesting enough to be worth examining in some detail.

When he earlier tested OOXML, Alex used the OOXML standard itself, a text on which Microsoft engineers had lavished many person-years of attention for the past 18 months, and he validated it with the current version of the OOXML schema. That is pretty much the best case, testing a document that has never been out of Microsoft’s sight for 18 months and testing it with the current version of the schema. I would expect that this document would have been a regular test case for Microsoft internally, and that its validity has been repeatedly and exhaustively tested over the past 18 months. I know that I personally tested it when Ecma-376 was first released, since it was the only significant OOXML document around. So, essentially Alex gave OOXML the softest of all soft pitches.

I think Microsoft’s response, that the validity errors detected by Alex are due to changes made to the schema at the BRM, is a reasonable and accurate explanation. The real story on OOXML standardization is not how many changes were made that were incompatible with Office 2007, but how few. It appears that very few changes, perhaps only one, will be required to make Office 2007’s output be valid OOXML.

So when testing ODF, what did Alex do? Did he use the ODF 1.0 specification as a test case, a document that the OASIS TC might have had the opportunity to give a similar level of attention to? No, he did not, although that would have validated perfectly, as I’ve demonstrated above. Instead, Alex took the OOXML specification, a document which by his own testing is not valid OOXML, converted it into the proprietary .DOC binary format, translated that binary format into ODF, and then tried to validate the result with the ODF 1.0 schema (i.e., the wrong version of the ODF schema, since OpenOffice 2.4.0’s output is clearly declared as ODF 1.1), applying a non-applicable, non-standard DTD Compatibility constraint test during the Relax NG validation.

Does anyone see something else wrong with this testing methodology?

Aside from the obvious bias of using an input document that Microsoft has spent 18 months perfecting, and using the wrong schemas and validator settings, there is another, more subtle problem.

Alex’s tests of OOXML and ODF are testing entirely different things. With OOXML, he took a version N (Ecma-376) OOXML document and tried to validate it with a version N+1 (ISO/IEC 29500) version of the OOXML schema.

But what he did with ODF was take a version N+1 (ODF 1.1) document and try to validate it with a version N (ODF 1.0) version of the ODF schema.

These are entirely different operations. One test is testing the backwards compatibility of the schema; the other is testing the backwards compatibility of document instances. It takes no genius to figure out that if ODF 1.1 adds new elements, then an ODF 1.1 document instance will not validate with the ODF 1.0 schema. We don’t ordinarily expect backwards-compatible validity of document instances. Again, Alex’s tests are biased in OOXML’s favor, giving ODF a much more difficult, even impossible task, compared to the test run for OOXML.

If we want to compare apples to apples, it is quite easy to perform the equivalent test with ODF. I gave it a try, taking a version N document (the ODF 1.0 standard itself, per above) and validated it with the version N+1 schema (ODF 1.1 in this case). It worked perfectly. No warnings, no errors.

In any case, in his backwards test Alex reports 7,525 errors, “mostly of the same type (use of an undeclared soft-page-break element)”, when validating the OOXML text with the ODF 1.0 schema. Indeed, all but 39 of these errors are reports of soft-page-break.

Soft page breaks are a new feature introduced in ODF 1.1, added primarily for accessibility: they allow easier collaboration between people using different technologies to read a document. Not all documents are deeply structured, with formal divisions like section 3.2.1, etc. Most business documents are loosely structured, and collaboration occurs by referring to “the 2nd paragraph on page 23” or “the bottom of page 18”. But when using different assistive technologies, from larger fonts, to braille, to audio renderings, the page breaks (if the assistive technology even has the concept of a page break) usually fall in different places than they did in the original authoring tool. This makes collaboration difficult. So ODF 1.1 added the ability for applications to write out “soft” page breaks, indicating where the page breaks occurred when the original source document was saved.
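In the markup this is a simple empty element dropped into the text flow wherever the producing application’s layout engine broke a page. A sketch (text:soft-page-break is the element ODF 1.1 defines; the surrounding content is illustrative):

```xml
<text:p text:style-name="Standard">
  the sentence that ended page 22
  <text:soft-page-break/>
  continues at the top of page 23
</text:p>
```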

Although this feature was added for accessibility reasons, like curb cuts, its likely future applications are more general. We will all benefit. For example, a convertor for translating from ODF to HTML would ordinarily only be able to calculate the original page breaks by undertaking complex layout calculations. But with soft page breaks recorded, even a simple XSLT script can use this information to insert indications of page breaks, or to generate accurate page numbering, etc. Although the addition of this feature hinders Alex’s idiosyncratic attempt to validate ODF 1.1 documents with the ODF 1.0 schema, I think the fact that this feature helps blind and visually impaired users, and generally improves collaboration makes it a fair trade-off.

Wouldn’t you agree?
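The kind of XSLT use mentioned above might look like the following sketch (illustrative only; the xsl: and text: namespace declarations are omitted for brevity):

```xml
<!-- Render each recorded soft page break as a visible marker
     in the HTML output. -->
<xsl:template match="text:soft-page-break">
  <hr class="page-break"/>
</xsl:template>
```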

That leaves 39 validation errors in Alex’s test. 12 of them are reports of invalid values in an xlink:href attribute. This appears to be an error in the original DOCX file: Garbage In, Garbage Out. For example, in one case the original document has a HYPERLINK field that contains a link to content in Microsoft’s proprietary CHM format (Compiled HTML). The link provided in the original document does not match the syntax rules required for an XML Schema anyURI (the URL ends with “##” rather than “#”). Maybe it is correct for markup like this, with non-standard, non-interoperable URIs, to give validation errors. This is not the first time that OOXML has been found polluting XML with proprietary extensions. But realize that OpenOffice 2.4.0 did not create this error. OpenOffice is just passing the error along, as Office 2007 saved it. It is interesting to note that this error was not caught in MS Office, and indeed is undetectable with OOXML’s lax schema. But it was caught with the ODF schema. This is a good thing, yes? It might be a good idea for OpenOffice to add an optional validation step after importing Microsoft Office documents, to filter out such data pollution.
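A post-import filter of that sort could start very simply. Here is a hedged Python sketch that flags only the specific symptom described above, a doubled “#” at the end of an href; a real filter would do full anyURI syntax checking:

```python
def suspect_hrefs(hrefs):
    """Flag href values with the malformed double-'#' ending.
    This checks one known symptom, not full anyURI validity."""
    return [href for href in hrefs if href.endswith("##")]
```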

The remaining 27 validation errors are instances of style:with-tab. Honestly, I have no explanation for this. This attribute does not exist in ODF 1.0 or ODF 1.1. That it is written out appears to be a bug in OpenOffice. Maybe someone there can tell us what the story is on this? But I don’t see this problem in all documents, or even most documents.

For fun I tried processing this OOXML document another way. Instead of the multi-hop OOXML-to-DOC-to-ODF conversion Alex did, why not go directly from OOXML to ODF in one step, using the convertor that Microsoft/CleverAge created? This should be much cleaner, since it avoids the messiness of the binary formats and of legacy application code. It is just a mapping from one markup to another, written from scratch. Getting the output to be valid should be trivial.

So I downloaded the “OpenXML/ODF Translator Command Line Tools” from SourceForge. According to their web page, this tool targets ODF 1.0, so we’ll be validating against the ODF 1.0 schemas.

This tool is very easy to use once you have the .NET prerequisites installed. The command line was:

odfconvertor /I "Office Open XML Part 4 - Markup Language Reference.docx"

The convertor then chugs along for a long, long, long time. I mean a long time. The conversion from OOXML to ODF eventually finished, after 11 hours, 10 minutes and 41 seconds! And this was on a Thinkpad T60p with dual-core Intel 2.16Ghz processor and 2.0 GB of RAM.

I then ran jing, using the validation command lines from above. It reported 376 validation errors, which fell into several categories:

  • text:s element not allowed in this context
  • bad value for text:style-name
  • bad value for text:outline-level
  • bad value for svg:x
  • bad value for svg:y
  • element text:tracked-changes not allowed in this context
  • “text not allowed here”

In any case, not a lot of errors, but a handful of errors repeated. But it is surprising to see that this single-purpose tool, written from scratch, had more validation errors in it than OpenOffice 2.4.0 does.

In the end we should put this in perspective. Can OpenOffice produce valid ODF documents? Yes, it can, and I have given an example. Can OpenOffice produce invalid documents? Yes, of course. For example, when it writes out a .DOC binary file, it is not even well-formed XML. And we’ve seen one example, where via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation. But conformance for an application does not require that it is incapable of writing out an invalid document. Conformance requires that it is capable of writing out a valid document. And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.

It is interesting to recall the study that Dagfinn Parnas did a few years ago. He analyzed 2.5 million web pages. He found that only 0.7% of them were valid markup. Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems. Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards. In fact I promise you that the page you are reading now is not valid, and neither is Alex Brown’s, nor SC34’s, nor JTC1’s, nor Ecma’s, nor ISO’s, nor the IEC’s.

So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement. In any case, Alex Brown’s dire pronouncements on ODF validity have been weighed in the balance and found wanting.


4 May 2008

Alex has responded on his blog with “ODF validation for cognoscenti”. He deals purely with the ID/IDREF/IDREFS questions in XML. He does not justify his biased and faulty testing methodology, nor does he reiterate his bold claim that there are no valid ODF 1.0 documents in existence.

Since Alex’s blog does not seem to be allowing me to comment, I’ll put here what I would have put there. I’ll be brief because I have other fish to fry today.

Alex, no one doubts that ID/IDREF/IDREFS constraints must be respected by valid ODF document instances. I never suggested otherwise. But what I do state is that this is not a concern of a Relax NG validator. You can read James Clark saying the same thing in his 2001 “Guidelines for using W3C XML Schema Datatypes with RELAX NG“, which says in part:

The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].

Validation of ID/IDREF/IDREFS cross-reference semantics is not the job of Relax NG, and you are incorrect to suggest otherwise. Your logic is also deficient when you take my statement of that fact and derive the false statement that I believe that ID/IDREF semantics do not apply to ODF. One does not follow from the other.

You know, as much as anyone, that conformance is a complex topic. One does not ordinarily expect, except in trivial XML formats, that the complete set of conformance constraints will be expressed in the schema. Typically a multi-layered approach is used, with some syntax and structural constraints expressed in XML Schema or Relax NG, some business constraints in Schematron, and maybe even some deeper semantic constraints that are expressed only in the text of the standard and can only be tested by application logic.

For example, a document that defines a cryptographic algorithm might need to store a prime number. The schema might define this as an integer. The fact that the schema does not state or guarantee that it is a prime number is not the fault of the schema. And the inability of a Relax NG validator to test primality is not a defect in Relax NG. The primality test would simply need to be carried out at another level, with application logic. But the requirement for primality in document instances can still be a conformance requirement and it is still testable, albeit with some computational effort, in application logic.
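The two layers in that example are easy to picture in code: the schema layer can check only that the value is lexically an integer, while the primality constraint must be enforced by application logic (a sketch):

```python
def is_integer_lexeme(value):
    """What a schema-level datatype check can verify:
    the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False

def is_prime(n):
    """What only application logic can verify: primality."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True
```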

I believe that is the source of your confusion. The initial errors you saw when running jing with the Relax NG DTD Compatibility flag enabled were not errors in the ODF document instances. What you saw was jing reporting that it could not apply the Relax NG DTD Compatibility ID/IDREF/IDREFS constraint checks using the ODF 1.0 schema. That in no way means that the constraints defined in XML 1.0 are not required on ODF document instances. It simply indicates that you would need to verify these constraints using means other than Relax NG DTD Compatibility.

So I wonder, have you actually found ODF document instances, say written from OpenOffice 2.4.0, which have ID/IDREF/IDREFS usage which violates the constraints expressed in ODF 1.0?

Finally, in your professional judgment, do you maintain that this is an accurate statement: “For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.”

Filed Under: ODF

Suggesting ODF Enhancements

2008/04/16 By Rob 10 Comments

There is a good post by Mathias Bauer on Sun Hamburg’s GullFOSS blog. He deals with the practical importance of OASIS’s “Feedback License” that governs any public feedback OASIS receives from non-TC members.

The ODF TC receives ideas for new features from many places. Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.

Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or individual members of the public.

Contributions from OASIS TC members are already covered by the OASIS IPR Policy. A TC member who contributes a written proposal to the TC is obligated from the time of contribution. Other TC members are obligated if they have been TC members for at least 60 days and remain members 7 days after approval of any Committee Draft. You can see the participation status of TC members here.

For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our comment mailing list. But before you can post to the comment list you must first accept the terms of the Feedback License.

Is this extra step annoying? Yes, it is. But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF. It is part of the price we pay for open standards. Free does not mean free from vigilance.

One of my responsibilities on the ODF TC is to monitor and process the public comments we receive. Regretfully this is a duty which I’ve neglected for too long. So I spent some time this week getting caught up on the comments, entering them all into a tracking spreadsheet. We have a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.

The largest single source of comments is the Japanese JTC1/SC34 mirror committee, where they have been translating the ODF 1.0 standard into Japanese. As you know, you will get no closer reading of a text than when attempting translation, so we’re glad to receive this scrutiny. I look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.

For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing. For comments that are in the nature of a new feature proposal, we will discuss them on a TC call and decide whether or not to include them in ODF 1.2.

A sample of some of the feature proposals from the comment list are:

  • A request to support embedded fonts in ODF documents
  • A request to support multiple versions of the same document in the same file
  • A request to allow vertical text justification
  • A proposal for enhanced string processing spreadsheet functions
  • A proposal for spreadsheet values to allow units, which would help prevent calculation errors due to mixing units, e.g., adding mm to kg would be flagged as an error.
  • A proposal for allowing spreadsheet named ranges to have namespaces, with each sheet in a workbook having its own namespace.
  • A proposal to allow a document to have a “portable” flag to allow it to self-identify that it contains only portable ODF content with no proprietary extensions.
  • A proposal for adding FFT support to spreadsheets
  • A proposal for adding an overline text attribute

If you have any other ideas for ODF enhancements, or thoughts on the above proposals, please don’t post a response to this blog! Remember, you need to use the comment list for your feedback to be considered by the OASIS ODF TC.

Of course, general comments are always welcome on this blog.

Filed Under: ODF

Fast Track versus PAS

2008/02/16 By Rob 14 Comments

Years ago I read an interesting article about the encyclopedia entry for the keyword “Longitude”. According to the article, the entry merely said “See Latitude”. With that short, two-word sentence the encyclopedia author conflated these two concepts as mere orthogonal dimensions, lumped together, each as boring as the other. This ignored the fact that latitude is boring, easy, trivial, known to the ancients and as easy to calculate as measuring the altitude of Polaris. But longitude, there lies an epic adventure, something fiendishly difficult to calculate accurately, something that propelled a great seafaring nation to a search for accurate timepieces that would work at sea, just in order to more accurately calculate longitude. Books have been written about longitude, lives lost, fortunes made. But latitude — latitude is for children.

So when I hear people lump Fast Track and PAS process in JTC1 together, I roll my eyes and think… If only they knew how different they really are.

Let’s give it a try, starting with PAS.

PAS stands for “Publicly Available Specification” and the PAS process in JTC1 allows an existing standard from outside of JTC1 to be submitted, reviewed and approved in an accelerated review cycle. An organization that wishes to make a PAS submission (typically a standards consortium) must first seek recognition as a PAS Submitter. This requires that they submit to JTC1 for approval a list of standards they wish to submit, as well as documentation that explains their organizational qualifications. The long list of organizational acceptance criteria is outlined in JTC1 Directives, Annex M:

M7.3 Organisation Acceptance Criteria

M7.3.1 Co-operative Stance (M)

There should be evidence of a co-operative attitude toward open dialogue, and a stated objective of pursuing standardisation in the JTC 1 arena. The JTC 1 community will reciprocate in similar ways, and in addition, will recognise the organisation’s contribution to international standards.

It is JTC 1’s intention to avoid any divergence between the JTC 1 revision of a transposed PAS and a version published by the originator. Therefore, JTC 1 invites the submitter to work closely with JTC 1 in revising or amending a transposed PAS.

There should be acceptable proposals covering the following categories and topics.

M.7.3.1.1 Commitment to Working Agreement(s)

  1. What working agreements have been provided, how comprehensive are they?
  2. How manageable are the proposed working agreements (e.g. understandable, simple, direct, devoid of legalistic language except where necessary)?
  3. What is the attitude toward creating and using working agreements?

M.7.3.1.2 Ongoing Maintenance

  1. What is the willingness and resource availability to conduct ongoing maintenance, interpretation, and 5 year revision cycles following JTC 1 approval (see also M6.1.5)?
  2. What level of willingness and resources are available to facilitate specification progression during the transposition process (e.g. technical clarification and normal document editing)?

M.7.3.1.3 Changes during transposition

  1. What are the expectations of the proposer toward technical and editorial changes to the specification during the transposition process?
  2. How flexible is the proposing organisation toward using only portions of the proposed specification or adding supplemental material to it?

M.7.3.1.4 Future Plans

  1. What are the intentions of the proposing organisation toward future additions, extensions, deletions or modifications to the specification? Under what conditions? When? Rationale?
  2. What willingness exists to work with JTC 1 on future versions in order to avoid divergence? Note that the answer to this question is particularly relevant in cases where doubts may exist about the openness of the submitter organisation.
  3. What is the scope of the organisation activities relative to specifications similar to but beyond that being proposed?

M7.3.2 Characteristics of the Organisation (M)

The PAS should have originated in a stable body that uses reasonable processes for achieving broad consensus among many parties. The PAS owner should demonstrate the openness and non-discrimination of the process which is used to establish consensus, and it should declare any ongoing commercial interest in the specification either as an organisation in its own right or by supporting organisations such as revenue from sales or royalties.

M.7.3.2.1 Process and Consensus:

  1. What processes and procedures are used to achieve consensus, by small groups and by the organisation in its entirety?
  2. How easy or difficult is it for interested parties, e.g. business entities, individuals, or government representatives to participate?
  3. What criteria are used to determine “voting” rights in the process of achieving consensus?

M.7.3.2.2 Credibility and Longevity:

  1. What is the extent of and support from (technical commitment) active members of the organisation?
  2. How well is the organisation recognised by the interested/affected industry?
  3. How long has the organisation been functional (beyond the initial establishment period) and what are the future expectations for continued existence?
  4. What sort of legal business entity is the organisation operating under?

M7.3.3 Intellectual Property Rights: (M)

The organisation is requested to make known its position on the items listed below. In particular, there shall be a written statement of willingness of the organisation and its members, if applicable, to comply with the ISO/IEC patent policy in reference to the PAS under consideration.

Note: Each JTC 1 National Body should investigate and report the legal implications of this section.

M.7.3.3.1 Patents:

  1. How willing are the organisation and its members to meet the ISO/IEC policy on these matters?
  2. What patent rights, covering any item of the proposal, is the PAS owner aware of?

M.7.3.3.2 Copyrights:

  1. What copyrights have been granted relevant to the subject specification(s)?
  2. What copyrights, including those on implementable code in the specification, is the PAS originator willing to grant?
  3. What conditions, if any, apply (e.g. copyright statements, electronic labels, logos)?

M.7.3.3.3 Distribution Rights:

  1. What distribution rights exist and what are the terms of use?
  2. What degree of flexibility exists relative to modifying distribution rights; before the transposition process is complete, after transposition completion?
  3. Is dual/multiple publication and/or distribution envisaged, and if so, by whom?

M.7.3.3.4 Trademark Rights:

  1. What trademarks apply to the subject specification?
  2. What are the conditions for use and are they to be transferred to ISO/IEC in part or in their entirety?

M.7.3.3.5 Original Contributions:

  1. What original contributions (outside the above IPR categories) (e.g. documents, plans, research papers, tests, proposals) need consideration in terms of ownership and recognition?
  2. What financial considerations are there?
  3. What legal considerations are there?

Once this documentation is provided, a three-month JTC1 ballot is held on the question of whether to approve the applicant as a Recognized PAS Submitter. If approved, this status lasts for two years, but may be renewed by reapplying with updated organizational documentation. Renewals must also be approved by a three-month letter ballot.

Once an organization has Recognized PAS Submitter status, it may now propose a PAS submission. Such a submission must be within scope of the Submitter’s original application, and must be accompanied by an Explanatory Report that speaks to JTC1’s strategic interests in Interoperability, Cultural and Linguistic Adaptability, as well as the following document-related acceptance criteria:

M7.4 Document Related Criteria

M7.4.1 Quality

Within its scope the specification shall completely describe the functionality (in terms of interfaces, protocols, formats, etc) necessary for an implementation of the PAS. If it is based on a product, it shall include all the functionality necessary to achieve the stated level of compatibility or interoperability in a product independent manner.

M.7.4.1.1 Completeness (M):

  1. How well are all interfaces specified?
  2. How easily can implementation take place without need of additional descriptions?
  3. What proof exists for successful implementations (e.g. availability of test results for media standards)?

M.7.4.1.2 Clarity:

  1. What means are used to provide definitive descriptions beyond straight text?
  2. What tables, figures, and reference materials are used to remove ambiguity?
  3. What contextual material is provided to educate the reader?

M.7.4.1.3 Testability (M)

The extent, use and availability of conformance/interoperability tests or means of implementation verification (e.g. availability of reference material for magnetic media) shall be described, as well as the provisions the specification has for testability.

The specification shall have had sufficient review over an extended time period to characterise it as being stable.

M.7.4.1.4 Stability (M):

  1. How long has the specification existed, unchanged, since some form of verification (e.g. prototype testing, paper analysis, full interoperability tests) has been achieved?
  2. To what extent and for how long have products been implemented using the specification?
  3. What mechanisms are in place to track versions, fixes, and addenda?

M.7.4.1.5 Availability (M):

  1. Where is the specification available (e.g. one source, multinational locations, what types of distributors)?
  2. How long has the specification been available?
  3. Has the distribution been widespread or restricted? (describe situation)
  4. What are the costs associated with specification availability?

M7.4.2 Consensus (M)

The accompanying report shall describe the extent of (inter)national consensus that the document has already achieved.

M.7.4.2.1 Development Consensus:

  1. Describe the process by which the specification was developed.
  2. Describe the process by which the specification was approved.
  3. What “levels” of approval have been obtained?

M.7.4.2.2 Response to User Requirements:

  1. How and when were user requirements considered and utilised?
  2. To what extent have users demonstrated satisfaction?

M.7.4.2.3 Market Acceptance:

  1. How widespread is the market acceptance today? Anticipated?
  2. What evidence is there of market acceptance in the literature?

M.7.4.2.4 Credibility:

  1. What is the extent and use of conformance tests or means of implementation verification?
  2. What provisions does the specification have for testability?

M7.4.3 Alignment

The specification should be aligned with existing JTC 1 standards or ongoing work and thus complement existing standards, architectures and style guides. Any conflicts with existing standards, architectures and style guides should be made clear and justified.

M.7.4.3.1 Relationship to Existing Standards:

  1. What international standards are closely related to the specification and how?
  2. To what international standards is the proposed specification a natural extension?
  3. How is the specification related to emerging and ongoing JTC 1 projects?

M.7.4.3.2 Adaptability and Migration:

  1. What adaptations (migrations) of either the specification or international standards would improve the relationship between the specification and international standards?
  2. How much flexibility do the proponents of the specification have?
  3. What are the longer-range plans for new/evolving specifications?

M.7.4.3.3 Substitution and Replacement:

  1. What needs exist, if any, to replace an existing international standard? Rationale?
  2. What is the need and feasibility of using only a portion of the specification as an international standard?
  3. What portions, if any, of the specification do not belong in an international standard (e.g. too implementation specific)?

M.7.4.3.4 Document Format and Style

  1. What plans, if any, exist to conform to JTC 1 document styles?

The Explanatory Report also sets the maintenance regime for the submission, if approved.

The proposed standard, along with the Explanatory Report, is then distributed to JTC1 NB’s for a six-month ballot. The approval criteria are approval by 2/3 of the voting P-members, and no more than 25% disapproval in total. At the end of the ballot a Ballot Resolution Meeting may be held if needed.
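The two thresholds are independent, and a ballot must clear both of them. The arithmetic can be sketched as follows (the function name and the vote tallies are illustrative, not taken from the Directives):

```python
def jtc1_ballot_passes(p_yes, p_no, total_yes, total_no):
    """Sketch of the two JTC1 approval thresholds described above.

    A ballot passes only if:
      1. at least 2/3 of the voting P-members approve (abstentions excluded), and
      2. no more than 25% of all votes cast are disapprovals.
    """
    if (p_yes + p_no) == 0 or (total_yes + total_no) == 0:
        return False
    return (p_yes / (p_yes + p_no) >= 2 / 3) and \
           (total_no / (total_yes + total_no) <= 0.25)

# Hypothetical tallies: 21 of 30 voting P-members approve (70%), and
# disapprovals are 10 of 50 votes cast overall (20%), so the ballot passes.
print(jtc1_ballot_passes(p_yes=21, p_no=9, total_yes=40, total_no=10))
```

Note that a ballot can clear the 2/3 P-member threshold and still fail on total disapprovals, which is why the two criteria are stated separately.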

So, that is the PAS process, in brief. The PAS process is how ODF was approved back in 2006, with OASIS as the Recognized PAS Submitter.

The Fast Track process is almost the same from the time the ballot is issued. The six-month period is split into a 30-day “contradiction period” and a five-month ballot. (That is an odd difference, with no clear rationale.) But the voting criteria, the BRM process, and so on are all the same between the two. What differs (and the differences are critical) is everything that happens before the ballot.

Who can submit a Fast Track? Any JTC1 P-member or any Class A Liaison can propose one.

We all know about P-members. They are NB’s, typically the highest standardization committee in each country. P-membership used to mean that you had a broad interest in many or most JTC1 matters. But now it may mean merely that Microsoft asked you to join as a P-member.

Class A Liaisons are “Organisations which make an effective contribution to and participate actively in the work of JTC 1 or its SCs for most of the questions dealt with by the committee”. Any organization can apply to be a Class A Liaison and be voted in via a letter ballot or at a meeting. There are no formal organizational qualifications, no requirement to state an interest in eventually making Fast Track submissions, and no requirement to answer any of the types of questions that PAS Submitters must answer.

Further, once approved as a Class A Liaison, the status lasts forever. There is no requirement to renew or reapply. In fact JTC1 Directives even lack a documented procedure for removing a Class A Liaison.

So what about the proposals for Fast Track submission? What is required of them? No Explanatory Report is required. No checklist of document-related criteria must be answered. The JTC1 Directives say merely, “The criteria for proposing an existing standard for the fast-track procedure is a matter for each proposer to decide.” That’s it. It is at the sole discretion of the Class A Liaison.

So you can see what great power Ecma has over JTC1 — they can submit any standard they want for Fast Track, and no one in JTC1 can stop them, or even remove their right to submit more Fast Tracks.

This may explain why Ecma is able to command such high membership fees. A full voting membership in OASIS, which would allow a company to help produce an OASIS Standard for later submission to JTC1 under the arduous PAS process, this costs $1,100 for a small company. To join the US NB and be able to lobby for a Fast Track submission from the US, this will cost you $9,500. But to join Ecma as a voting member (what they call an “Ordinary Member”) this will cost you 70,000 Swiss Francs, or $64,000. That is what no-questions-asked Fast Track service is worth. I think that, from Microsoft’s perspective, the extra $62,900 is money well spent. But what about from JTC1’s perspective? They don’t get this extra money. So what’s their excuse for having these permissive Fast Track procedures that give Ecma so much control?

In any case, that is why I roll my eyes when people lump PAS and Fast Track together and say that they are essentially the same process. They clearly aren’t. PAS Submitters like OASIS are given intense scrutiny and are required to document in great detail how their organization and their proposals meet JTC1 criteria. The scrutiny never ends: a new Explanatory Report is required for every submission, and status as a Recognized PAS Submitter lasts only two years before requiring re-approval.

Fast Track submitters, as Class A Liaisons, on the other hand, are the monarchs of JTC1. They serve for life and are answerable to no one. They can submit a Fast Track on any subject they want, at any time. So a standards consortium like Ecma, with primary expertise in optical disk standards, but never having produced an XML standard before, can rubber stamp the world’s largest XML standard and submit it for Fast Track processing to JTC1. And no one can do a thing about it.

Filed Under: ODF, OOXML

Punct Contrapunct

2008/02/12 By Rob 4 Comments

The recent Burton Group report, What’s Up, .DOC? by Guy Creese and Peter O’Kelly was made available free to the public for a stated purpose:

We’ve made the overview available for free (I must admit I’m not sure for how long), as we believe this topic warrants expanded industry debate before a February, 2008 ISO ballot on OOXML, and we want to help catalyze and advance the debate.

The degree of expanded debate achieved may be estimated by noting that Microsoft is sending this report to every JTC1 national body involved in the OOXML ballot, from Pakistan to Ecuador, and has invited Peter O’Kelly to speak on this paper both at the recent OOXML press event in Washington as well as this week’s Office Developers Conference.

Much could be said of this report, but I’ll limit myself to commenting on a single passage:

[S]everal vendors interviewed for this overview indicated that it’s essentially impossible to get ODF proposals approved if they’re not also supported in OpenOffice.org, and further noted that Sun closely controls OpenOffice.org (much as it also holds control over Java).

It should be noted that, before making this statement, the authors contacted neither OASIS nor the OASIS ODF TC to check their facts.

The ODF Alliance published a rebuttal of this report, and in particular took umbrage at that passage, saying:

This is demonstrably false, and the use of unnamed “vendors” as sources does not eliminate the need for doing basic fact checking on such claims. Rumors and innuendo do not objective analysis make.

First, on the control aspect, note that ODF 1.0, the standard, is owned and controlled by OASIS, a standards consortium of over 600 member organizations. Sun is just one company among many members. Indeed, for most of the development of ODF, Microsoft was on the Board of Directors of OASIS.

Second, OASIS is a corporation. It is legally bound to its Bylaws. There is no arbitrary control by member corporations.

The ODF TC is co-chaired by an IBM employee and a Sun employee, and is regulated by the OASIS TC Process document, which is publicly readable by all and has clear rules of procedure and appeal.

The ODF TC has three subcommittees. The Accessibility SC is co-chaired by IBM and Sun, while the Formula Subcommittee and the Metadata Subcommittee are each chaired by individual members of OASIS who are not affiliated with any large corporations.

Voting rights in the ODF TC, for accepting or rejecting features, are currently distributed as follows:

  • Sun – 3 voting members
  • IBM – 4 voting members
  • Individuals – 3 voting members

This can easily be verified at the OASIS ODF TC website.

Is sharing the chair position on the TC and on 1 of 3 subcommittees considered “closely controlling”? Is having 30% of the votes considered “closely controlling”?

As for proposals being accepted into ODF, we note that all three major features for ODF 1.2, RDF metadata, OpenFormula, and enhanced accessibility, are new proposals which have not been yet implemented in OpenOffice. Moreover, the ODF TC is currently processing a set of features requested by the KOffice open source project. So the assertion that it is “essentially impossible” to get new features into ODF if they are not already supported by OpenOffice is not true. This error is unfortunate and needs correcting through rigorous fact checking, as do the others, in our opinion.

Oddly enough, this particular error occurs in several places. A search of the report for the word “control” shows it used six times, once in reference to “Chinese communists” and five times in reference to Sun Microsystems. Note, however, that no mention is ever made of the strong direct control Microsoft asserts over OOXML, its having sole chairmanship of the Ecma TC45, and its having secured a committee charter that prevents any changes to OOXML that are not compatible with Microsoft Office.

Again, we’re puzzled by the inaccuracy on one hand and the lack of balance on the other.

Now, back to the Burton Group, where Guy Creese responds on the Burton Group blog:

We were not expecting to be told that Sun had significant sway over the standard, but several people told us that (spread across more than one ODF-oriented vendor), which is why we noted it in the report. As the ODF Alliance notes, IBM and Sun—two of Microsoft’s most powerful productivity application archrivals today (as well as partners to Microsoft in myriad other domains, e.g., Web services-related standards initiatives)—collectively control 70% of the votes in the ODF TC which determines if proposals will be accepted or rejected. This suggests there is ample opportunity for conflicts of interest.

Guy, excuse me, did you say “conflicts of interest”? Please explain. Or maybe when Peter O’Kelly comes back from speaking at Microsoft’s Office Developers Conference he can explain it for us?

In any case, the factual errors in your report with respect to the control of ODF have been clearly demonstrated, but instead of simply admitting and correcting the error, you hide behind anonymous sources and further impugn OASIS by charging some sort of “conflict of interest”.

To follow your logic further demonstrates the absurdity of it. If you believe that the fact that IBM and Sun “collectively control 70% of the votes in the ODF TC” lends weight to your argument, then what is shown by the equally true mathematical fact that IBM plus independent members also control 70% of the votes? Why is this equally true fact not mentioned? This is the nature of plurality, that there are many different combinations of votes that could make a majority position. Further, note that these groups in practice do not always vote as a bloc. We’ve had votes where the independent members split their vote, and we even had a vote where the IBM members did not all vote alike. So much for your simplistic control theory.
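The plurality arithmetic above is easy to check. Using the voting composition listed earlier in this post, a short script (illustrative only) enumerates every bloc combination and its share of the ten votes:

```python
from itertools import combinations

# Voting composition of the ODF TC, as listed above
votes = {"Sun": 3, "IBM": 4, "Individuals": 3}
total = sum(votes.values())  # 10 voting members

# Every single-bloc and two-bloc grouping, with its share of the vote
for r in (1, 2):
    for combo in combinations(votes, r):
        share = sum(votes[b] for b in combo) / total
        print(" + ".join(combo), f"= {share:.0%}")
```

Both Sun + IBM and IBM + Individuals come to 70%, which is exactly the point: singling out one 70% combination tells you nothing about who “controls” the committee.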

I will not question whether your anonymous sources indeed misled you. For sake of argument, I will accept unquestioningly that you indeed had sources and that they said exactly what you claim they said. However, having sources does not excuse you, as an analyst, from doing basic fact checking. The rules of OASIS and the voting composition of the ODF TC are facts, not opinions, and the correct information was sitting there, on public web sites, for you to check. It is not your fault that you were misled by sources, but it is your fault that you did not verify their claims. To publish controversial statements based on anonymous sources without fact checking, this is not something that represents the Burton Group’s finest work.

The Burton Group has denigrated the work and the members of the OASIS Open Document Format Technical Committee (of which I am Co-Chair) with published statements that have been shown to be false. The Burton Group owes us an apology and an immediate retraction.

Waiting until after February, after the DIS 29500 process concludes, to make corrections is unacceptable. Since your stated purpose in making this report public was to “advance the debate” in the current OOXML ISO process, withholding factual corrections until after that process concludes would imply that you and the Burton Group see no problems with knowingly persisting in influencing an ISO ballot with false information published under the Burton Group name. I don’t believe that is the image that the Burton Group would want to project. So I urge that a correction is in order now.

Filed Under: ODF, OOXML

The Case for Harmonization

2008/01/31 By Rob 25 Comments

Depending on who you ask, document standard harmonization is either impossible or inevitable, anathema or nirvana. Let’s dig a little into this question and see if the two sides are really that far apart.

First note that many JTC1 NB’s raised the issue of harmonization in their DIS 29500 ballot comments last September. Some merely requested harmonization, among them Korea, South Africa, Belgium, Peru, Switzerland, and the Czech Republic, while others went further and outlined ways to achieve it. For example, AFNOR, the French NB, stated:

After 5 months of extensive discussions between stakeholders in the field of revisable document formats, AFNOR, in the aim to obtain a single standard for XML office document formats within 3 years, makes the following proposal:

  • Split the current ECMA 376 standard in 2 parts in order to differentiate the essential OOXML core functions necessary for easy implementation from those functionalities that are needed for the exchange of legacy office file formats;
  • Incorporate the technical comments below and those in the attached comment table submitted to the Fast Track;
  • Attribute the status of Technical Specification to both parts;
  • Establish a process of convergence between ODF (already standardized as ISO/IEC 26300) and the above mentioned OOXML core. ISO/IEC shall invite parties involved to commit themselves to initiate simultaneously the revisions of the existing ODF v1.0 and the OOXML core in order to obtain at the end of the revision process a standard as universal as possible.

(Note that a Technical Specification, in the ISO process, is for proposals which lack sufficient support for approval as an International Standard, but for which publication is still desired. This may be appropriate for OOXML.)

New Zealand’s proposal was similar:

  • OOXML should be considered by JTC 1 for publication as a Type 2 Technical Report.
  • Seek to harmonize with the existing ODF standard to reduce the cost of interoperability, cost of having two standards, and cost of support/maintenance.

Further, the NB’s of Great Britain, New Zealand, and the United States requested that specific features be added to OOXML in order to improve interoperability with ISO ODF, 40 features in total, such as the ability:

  • to have more than 63 columns in a table
  • to have background images in tables
  • to have font weights beyond “normal” and “bold”.

Notably, these were the same features that the Microsoft-sponsored translator project on SourceForge identified as needed to improve translation with ODF. These are the features the project noted were lacking in OOXML.

Ecma rejected every single one of these requests. They did not argue that the requested features were unreasonable. They did not argue that the requested feature was not needed. Their argument was that harmonization of the formats was not necessary because there exist tools that will translate between OOXML and ODF. In other words, they rejected these requests merely because they were pro-harmonization, regardless of the underlying merit or need of the feature. Ironically, Microsoft’s conversion tools are restricted in their fidelity because of the lack of these very features.

On the question of harmonization, we are either moving toward it, or we are moving away. There is no time better than the present to harmonize. Waiting will only make matters worse, as we will then need to consider legacy OOXML documents as well as legacy binary and legacy ODF documents. The Ecma response does not move us toward harmonization, but starts down the road toward further divergence, a long and costly divergence.

Tim Bray made the critical observation back in 2005, “The world does not need two ways to say ‘This paragraph is in 12-point Arial with 1.2em leading and ragged-right justification’.”

Microsoft likes to claim that harmonization is impossible, that slapping together the features of both standards would lead to an impenetrable mess. Of course it would, but only an idiot would suggest that as an approach to harmonization. So why do they always bring it up as their strawman?

A look at OpenOffice and Microsoft Office shows a huge degree of functional overlap. Harmonization starts from this functional overlap – a significant area, perhaps 90% or more – and expresses it identically, using the same XML schema. In other words, harmonization identifies the commonalities at the functional level and finds a common representation for that commonality.

It would also be expected that the common functionality between ODF and OOXML would also include a common extensibility mechanism, a way for a vendor to express application-specific features that are outside of the harmonized standard.

The remaining 10% of the functionality would be the focus of the harmonization work, the area that requires the most attention. Some portion of that 10% will represent general-purpose features that we can imagine multiple applications supporting. We take those features and add them to ODF. The remaining portion, which serves only one vendor’s needs, such as flags for deprecated legacy formatting options, could be represented using the common extensibility mechanism.
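ODF already allows foreign elements and attributes in vendor-owned namespaces, which is one model for such an extensibility mechanism. A minimal sketch of how a vendor-specific flag could ride along in a harmonized document (the `acme` namespace and `compat-flag` attribute are hypothetical):

```python
import xml.etree.ElementTree as ET

# Standard content lives in the common namespace; one vendor's legacy
# flag travels alongside it in its own (hypothetical) namespace.
DOC = """<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                 xmlns:acme="urn:example:acme:legacy">
  <text:span acme:compat-flag="word6-line-breaking">Hello</text:span>
</text:p>"""

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ACME_NS = "urn:example:acme:legacy"

root = ET.fromstring(DOC)
span = root.find(f"{{{TEXT_NS}}}span")

# A consumer that knows only the standard namespace simply ignores
# attributes from namespaces it does not recognize.
std_attrs = {k: v for k, v in span.attrib.items()
             if not k.startswith(f"{{{ACME_NS}}}")}
print(span.text, std_attrs)
```

The standard content remains fully interoperable, while the vendor-specific flag degrades gracefully in applications that do not understand it.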

Does this sound impossible? That’s not what Microsoft says. Gray Knowlton, Group Product Manager for Microsoft Office, was candid to PC World a couple of weeks ago:

Also, if individual governments mandate the use of ODF instead of Open XML, Microsoft would adapt, Knowlton said. The company would then implement the missing functionality that ODF doesn’t support. However, those extensions would be custom-designed and outside of the standard, which is counter to the idea of an open document standard, Knowlton said.

So we’ve agreed that this approach is technically feasible. We’ve also agreed that extending ODF outside of the standards process is not a good idea. So the obvious solution is to extend ODF within the standards process. So, let’s do it! What are we waiting for?

There is no reason why, by a harmonization process, all of the functionality of Microsoft Office cannot be represented on a base of ISO 26300 OpenDocument Format. I personally, as Co-Chair of the OASIS ODF TC, stand ready and willing to sponsor such a harmonization effort in OASIS. So let’s start harmonization now, and avoid further divergence.

My read of the NB comments indicates that there is a sizable bloc, perhaps even a decisive bloc, of NB’s in favor of harmonization. Let’s push on this and articulate a roadmap, along the lines of the proposals by France and New Zealand, that accomplishes it.

Filed Under: ODF, OOXML


Copyright © 2006-2026 Rob Weir · Site Policies