
An Antic Disposition


OOXML

Will Microsoft Remove DOC Format Support?

2011/01/11 By Rob 17 Comments

I noticed a curious argument in Jonathan Corbet’s LWN article “Supporting OOXML in LibreOffice” (behind a pay wall).  Why should we support OOXML?

…as has been pointed out in the discussion, Microsoft will, someday, phase out support for its (equally proprietary) DOC format, leaving OOXML as the only real option for document interchange. There appears to be little hope that Microsoft’s ODF support will be sufficient to make ODF a viable alternative. So any office productivity suite which aspires to millions of users, and which does not support OOXML, will find itself scrambling to add that support when DOC is no longer an option. It seems better to maintain (and improve) that support now than to be rushing to merge a substandard implementation in the future.

Really?  The same company that is unable to fix a leap-year calculation bug from 20 years ago because of fears it might break backwards compatibility is going to remove support for their binary formats?  Seriously, is that what people are saying?  This sounds like something Microsoft would say to scare people into migrating.

But don’t listen to my opinions.  Let’s look at the numbers.  I’ve been tracking document counts via Google for almost four years now, looking at the relative distribution of document types, across OOXML, ODF, Legacy Binary, PDF, XPS, etc.  Because the size of the web is growing, one cannot fairly compare the absolute numbers of documents from week to week.  But the distribution of documents over time is something worth noting.

The following chart shows the percentage of documents on the web that are in OOXML format, as a percentage of all MS Office documents.  Note carefully the scale of the chart.  It is peaking at less than 3%.  So 97+% of the Microsoft Office documents on the web today are in the legacy binary formats, even four years after Office 2007 was released.

Of course, for any given organization these numbers may vary.  Some are 100% on the XML formats.  Some are 0% on them.   If you look at just “gov” internet domains, the percentage today is only 0.7%.  If you look at only “edu” domains, the number is 4.5%.  No doubt, within organizations, non-public work documents might have a different distribution.  But clearly the large number of existing legacy binary documents on government web sites alone is sufficient to prove my point.  DOC is not going away.

I call “FUD” on this one.


Filed Under: FUD, OOXML

The value of restricting choice

2010/07/27 By Rob 8 Comments

The language game

Microsoft’s talking points go something like this (summarized in my words):

If you adopt ODF instead of OOXML then you “restrict choice”.  Why would you want to do that?  You’re in favor of openness and competition, right?  So naturally, you should favor choice.

You can see hundreds of variations on this theme, in Microsoft press releases and whitepapers, in press articles, and in astroturfers' blogs, by searching Google for "ODF restrict choice".

This argument is quite effective, since it is plausible at first glance, and takes more than 15 seconds to refute.  But the argument in the end fails by taking a very superficial view of “choice”, relying merely on the positive allure of its name, essentially using it as a talisman.  But “choice” is more than just a pretty word.  It means something.  And if we dig a little deeper, at what the value of choice really is, the Microsoft argument falls apart.

So let's make an attempt to show how one can be in favor of choice, but also be in favor of eliminating choice.  Let's resolve the paradox.  Personally I think this argument is too long, but maybe it will prompt someone to formulate it in a briefer form.

Choice — the option to act

Choice is the option to act on one of several possibilities.  Choice is the freedom to take one path or another.  Choice is the ability to open one door or another.  And what is the value of choice?  It depends on the value of the underlying possibilities.

In some cases, choice can be valued quite precisely.

For example, imagine I have three boxes, one containing nothing, one containing $5 and another containing $10.  If you have no choice, and are given one  box at random, then you will get $5 on average.   And if given the choice of which box to pick, also without knowing the contents, you will also get $5 on average.

Similarly, if each box contained exactly $5 and you could see inside, the value of choice would still be zero.

But if the three boxes contained nothing, $5 and $10 and you could see inside, then the value of having a choice is clear.  You would naturally pick the $10 box.  So having a choice is worth an additional $5.
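The arithmetic, as a small illustrative Python sketch:

    boxes = [0, 5, 10]

    # No choice: a box is assigned at random.
    assigned = sum(boxes) / len(boxes)   # 5.0

    # Blind choice: you pick, but cannot see inside. Same expected value.
    blind = sum(boxes) / len(boxes)      # 5.0

    # Informed choice: you can see inside, so you take the best box.
    informed = max(boxes)                # 10

    print(informed - assigned)           # informed choice is worth an extra 5.0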

So we see that for choice to have value, you must have two things:

  1. A way to estimate the value of one outcome over another.
  2. A preference for one outcome over another.

In some cases this can be done with precision.  In other cases it can only be estimated or modeled. For example, trading stock options is essentially the selling and buying of the right to exercise the choice (option) to buy or sell a security at a given price within a given time period.  The value of this choice can be modeled by sophisticated mathematical models like the Black-Scholes option pricing formula.
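For the curious, here is a minimal Python sketch of the Black-Scholes price of a European call, using SciPy's normal CDF. The symbols are the conventional ones (spot S, strike K, expiry T, rate r, volatility sigma), nothing specific to this post:

    from math import exp, log, sqrt
    from scipy.stats import norm

    def black_scholes_call(S, K, T, r, sigma):
        """Black-Scholes value of a European call: the market price of the
        choice (option) to buy at strike K at expiry T."""
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

    # A one-year at-the-money call on a $100 stock, 5% rate, 20% volatility:
    print(black_scholes_call(S=100, K=100, T=1.0, r=0.05, sigma=0.20))  # ~10.45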

Eliminating choice

So let's go back to the boxes again.  Now imagine there are two boxes: one has $10 in it, and the other has a note requiring you to pay me $10.  You can see the contents of each box.  Which one do you choose?  It should be obvious: you pick the one with $10 in it.

But what if I say you are not limited to picking only one box.  You can pick either box, or both boxes if you wish.  You have absolute freedom to choose A, B or A+B.  What do you do?  Of course, you still pick the box with $10 in it.

But doesn't that eliminate choice?  Yes, of course it does.  But the value of choice was only ever derived from the value of the underlying outcomes.  By choosing, I've extracted the full value of having a choice.  If one choice is clearly more favorable than the others (it "dominates" them), then the alternatives should be discarded.

Resolving the paradox of choice

Given the choice of A, B or A+B, each is a distinct, mutually exclusive choice.  They are the three boxes with three outcomes.  Each one has a value that could be estimated.  When someone portrays option A+B as preserving choice, they are forgetting that this is a choice that also restricts choice, since it eliminates A and B in their exclusive, pure forms from consideration.  Any choice, even the choice of A+B, restricts choice.  If you choose A+B then you have not chosen A alone or B alone.  You have the value of the outcome A+B, but not the possibly greater benefits of choice A alone or choice B alone.

Clear?  I think this should be obvious, but I’ve seen these concepts cause much confusion.

It is also important to realize that the combination A+B may have conjoint effects, which may be neutral, synergistic or antagonistic.  In other words the value of A+B is not necessarily the same as the value of A plus the value of B.

In some cases, certainly, the value of the A+B choice is the same as the sum of the individual values.  For example, with the boxes of money and notes, the values are simply additive, with no conjoint effects.

But in other cases, the value of A+B has synergistic effects.  For example, the choice of diet+exercise is more salubrious than either one chosen in isolation.

And in some cases the value of A+B is less than the value of either one in isolation, as anyone who has bought both a cat and a dog knows.  These choices are antagonistic.

So back to the file format debate.  The choice here is between adopting ODF, OOXML, or ODF+OOXML.  These three choices are mutually exclusive.  They are the three boxes, with three different outcomes.  Each outcome has a value that could be estimated.  But we should not fall into the trap of thinking that an ODF+OOXML decision preserves choice.  Far from it.  By making that choice, one eliminates the possibility of having only ODF, or of having only OOXML, with the resulting values that those choices would bring.  Choosing both formats eliminates outcomes and restricts choice just as much as choosing only ODF does.

You cannot avoid eliminating the outcomes you do not choose.  There are benefits that would come from having only a single standard, and there are costs and complications from maintaining multiple standards.  These must all be considered.


Filed Under: ODF, OOXML

ISO/IEC JTC1 Revises Directives, Addresses OOXML Abuses

2010/07/07 By Rob 5 Comments

On July 1st, 2010 a new set of rules (directives) took effect in ISO/IEC JTC1, including new processing and voting rules for JTC1 Fast Track submissions.  If these rules had been in effect back in 2007, OOXML would have died after its initial ballot.

Let’s take a look at some of the specific changes that were made in reaction to the events of 2007-08.

First, we see the elimination of the contradiction phase in Fast Track processing.  If you recall, under the previous rules, a Fast Track began with a 30-day NB review period, sometimes called the "contradiction period", where NBs were invited to raise objections if they believed the Fast Track proposal contradicted an existing ISO or IEC standard.  This was followed by a 5-month ballot.  The problem was that the word "contradiction" was never defined, leading to various irreconcilable interpretations.  In the case of OOXML, 20 JTC1 National Bodies (NBs) raised contradictions.  Evidently, the passage of time has led to no progress on defining what exactly a contradiction is, so the contradiction period has been eliminated entirely.  Instead, the task of looking for "evident contradictions" (still undefined) is given to JTC1 administrative staff, which is the surest way of guaranteeing that we never hear of contradictions again.  The Fast Track DIS ballot remains at 5 months, so net-net this accelerates processing by one month.

Next, we see some clarification around how NBs should vote on Fast Tracks.  Back during the OOXML ballot, Microsoft made a huge effort to convince NBs to vote "Yes with comments" if they found serious flaws in the text, with the promise that they would all be addressed at the BRM.  Well, we now know that this was a big lie.  Very few issues were actually discussed and resolved at the BRM, and most of them were addressed by merely saying, "Sorry, no change".  At the time I argued that the rules were quite clear: disapproval should be voiced by a "No, with comments" vote.  Well, we now see another small slice of vindication.  The revised rules now state:

If a national body finds an enquiry draft [ed.  A Fast Track DIS is an ‘enquiry draft’] unacceptable, it shall vote negatively and state the technical reasons.  It may indicate that the acceptance of specified technical modifications will change its negative vote to one of approval, but it shall not cast an affirmative vote which is conditional on  the acceptance of modifications. (ISO/IEC Directives, Part I, Section 2.7.3)

I assume this is clear enough now.

Another change is that if the DIS ballot fails to get sufficient votes, meaning less than 2/3 approval of ISO/IEC  JTC1 P-members, or more than 25% disapproval overall, the proposal dies at that point.  It doesn’t go on to the BRM.  Game over.  If this rule had been in place back in 2007, OOXML would not be an ISO standard today.
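As a sketch, the two thresholds look like this (an illustrative Python function of my own, not anything from the Directives):

    def dis_ballot_passes(p_yes, p_no, all_yes, all_no):
        """Sketch of the two JTC1 DIS approval criteria: at least 2/3 of
        voting P-members approve, and no more than 1/4 of all votes cast
        are negative (abstentions excluded from both counts)."""
        p_approval = p_yes / (p_yes + p_no)
        disapproval = all_no / (all_yes + all_no)
        return p_approval >= 2 / 3 and disapproval <= 0.25

Under the revised rules, a proposal that fails this check simply dies; there is no BRM to rescue it.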

Finally, we see the requirement for a Final DIS (FDIS) text for review and approval by NBs.  Back in 2008 I was quite vocal about the absurdity of having NBs vote on a text that they were not allowed to read.  Several NBs lodged formal objections at the time as well.  All this was dismissed by JTC1 staff.  But reality struck when NBs read the actual published version of OOXML and saw that it did not contain all of the changes mandated by the BRM.  So, belatedly but better late than never, the rules have been changed.  Fast Tracks now require an FDIS text for NBs to review, along with a 2-month ballot on it.

There are also smaller, less substantial changes.  For example, the dedication to Jan van den Beld, the former head of Ecma, for his “unwavering dedication to the development and evolution of the JTC 1 procedures”, has been removed.   Ironically, both Ecma and Microsoft have indeed made long-term contributions to the evolution of Fast Track in JTC1, but probably not the way they intended.

The new ISO/IEC Directives are posted online.  Note that one document expresses the common rules for ISO and IEC, while another is a set of supplemental rules which apply to only ISO/IEC JTC1.  Evidently, we’re supposed to consult both documents and mentally merge them whenever trying to determine what the rules are.


Filed Under: OOXML

Microsoft Office document corruption: Testing the OOXML claims

2010/02/15 By Rob 22 Comments

Summary

In this post I take a look at Microsoft's claims of robust data recovery with their Office Open XML (OOXML) file format.  I show the results of an experiment in which I introduce random errors into documents and observe whether word processors can recover from these errors.  Based on these results, I estimate data recovery rates for Word 2003 binary, OOXML and ODF documents, as loaded in Word 2007, Word 2003 and OpenOffice.org Writer 3.2.

My tests suggest that the OOXML format is less robust than the Word binary or ODF formats, with no observed basis for the contrary Microsoft claims.  I then discuss the reasons why this might be expected.

The OOXML “data recovery” claims

I’m sure you’ve heard the claim stated, in one form or another, over the past few years.  The claim is that OOXML files are more robust and recoverable than Office 2003 binary files.  For example, the Ecma Office Open XML File Formats overview says:

Smaller file sizes and improved recovery of corrupted documents enable Microsoft Office users to operate efficiently and confidently and reduces the risk of lost information.

Jean Paoli says essentially the same thing:

By taking advantage of XML, people and organizations will benefit from enhanced data recovery capabilities, greater security, and smaller file size because of the use of ZIP compression.

And we see similar claims in Microsoft case studies:

The Office Open XML file format can help improve file and data management, data recovery, and interoperability with line-of-business systems by storing important metadata within the document.

A Microsoft press release quotes Senior Vice President Steven Sinofsky:

The new formats improve file and data management, data recovery, and interoperability with line-of-business systems beyond what’s possible with Office 2003 binary files.

Those are just four examples of a claim that has been repeated dozens of times.

There are many kinds of document errors.  Some are introduced by logic defects in the authoring application.  Some are introduced by other, non-editor applications that modify the document after it was authored.  And some are caused by failures in data transmission and storage.  The Sinofsky press release gives some further detail on exactly what kinds of errors are supposed to be more easily recoverable in the OOXML format:

With more and more documents traveling through e-mail attachments or removable storage, the chance of a network or storage failure increases the possibility of a document becoming corrupt. So it’s important that the new file formats also will improve data recovery–and since data is the lifeblood of most businesses, better data recovery has the potential to save companies tremendous amounts of money.

So clearly we’re talking here about network and storage failures, and not application logic errors.  Good, this is a testable proposition then.  We first need to model the effect of these errors on documents.

Modeling document errors

Let’s model “network and storage failures” so we can then test how OOXML files behave when subjected to these types of errors.

With modern error-checking file transfer protocols, the days of transmission data errors are a memory.  Maybe 25 years ago, with XMODEM and other transfer mechanisms, you would see randomly introduced transmission errors in the body of a document.  But today the more likely problem is truncation: missing the last few bytes of a file transfer.  This could happen for a variety of reasons, from logic errors in application-hosted file transfer support to user-induced errors from removing a USB memory stick with uncommitted data still in the file buffer.  (I remember once debugging a program that would lose the last byte of a file whenever the file was an exact multiple of 1024 bytes.)  These types of errors can be particularly pernicious with some file formats.  For example, the old Lotus WordPro file format stored the table of contents for the document container at the end of the file.  This was great for incremental updating, but particularly bad for truncation errors.

For this experiment I modeled truncation errors by generating a series of copies of a reference document, each copy truncating an additional byte from the end of the document.
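The errors were introduced with a small Java program (linked at the end of this post).  In Python, the truncation step would amount to something like this sketch (file names are illustrative):

    from pathlib import Path

    def make_truncated_copies(src, max_bytes=32):
        """Write copies of src, each missing the last 1..max_bytes bytes."""
        data = Path(src).read_bytes()
        for n in range(1, max_bytes + 1):
            Path(f"{src}.trunc{n:02d}").write_bytes(data[:-n])

    make_truncated_copies("reference.docx")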

The other class of errors ("storage errors", as Sinofsky calls them) can come from a variety of hardware-level failures, including degeneration of the physical storage medium or mechanical errors in the storage device.  The unit of physical storage, and thus of physical damage, is the sector.  For most storage media the size of a sector is 512 bytes.  I modeled storage errors by creating a series of copies of a reference document, and for each one selecting a random location within that document and introducing a 512-byte run of random bytes.
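And the sector-damage step, again as an illustrative Python sketch rather than the original Java:

    import os
    import random
    from pathlib import Path

    SECTOR = 512

    def corrupt_sector(src, dst):
        """Overwrite one randomly placed 512-byte run with random bytes,
        simulating a damaged storage sector."""
        data = bytearray(Path(src).read_bytes())
        start = random.randrange(0, max(1, len(data) - SECTOR))
        data[start:start + SECTOR] = os.urandom(SECTOR)
        Path(dst).write_bytes(bytes(data))

    for i in range(30):
        corrupt_sector("reference.docx", f"reference.sector{i:02d}.docx")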

The reference document I used for these tests was Microsoft's whitepaper, The Microsoft Office Open XML Formats.  This is a 16-page document with a title page and logo, a table of contents, a running text footer, and a text box.

Test Execution

I tested Microsoft Word 2003, Word 2007 and OpenOffice.org 3.2.   I attempted to load each test document into each editor.  Since corrupt documents have the potential to introduce application instability, I exited the editor between each test.

Each test outcome was recorded as one of:

  • Silent Recovery:  The application gave no error or warning message.  The document loaded, with partial localized corruption, but most of the data was recoverable.
  • Prompted Recovery: The application gave an error or warning message offering to recover the data.  The document loaded, with partial localized corruption, but most of the data was recoverable.
  • Recovery Failed: The application gave an error or warning message offering to recover the data, but no data was able to be recovered.
  • Failure to load: The application gave an error message and refused to load the document, or crashed or hung while attempting to load it.

The first two outcomes were scored as successes, and the last two were scored as failures.

Results: Simulated File Truncation

In this series of tests I took each reference document (in DOC, DOCX and ODT formats) and created 32 truncated files, corresponding to truncations of 1 to 32 bytes.  The results were the same regardless of the number of bytes truncated, as shown in the following table:

[Table: truncation test outcomes by file format and application]

Results: Simulated Sector Damage

In these tests I created 30 copies of each reference document and introduced into each a 512-byte run of random bytes at a random location, with the following summary results:

[Table: sector-damage test outcomes by file format and application]

Discussion

First, what do the results say about Microsoft's claim that the OOXML format "improves…data recovery…beyond what's possible with Office 2003 binary files"?  A look at the above two tables brings this claim into question.  With truncation errors, all three word processors scored 100% recovery using the legacy binary DOC format.  With OOXML the same result was achieved only with Office 2007; both Office 2003 and OpenOffice 3.2 failed to open any of the truncated OOXML documents.  With the simulated sector-level errors, all three tested applications did far better recovering data from legacy DOC binary files than from OOXML files.  For example, Microsoft Word 2007 recovered 83% of the DOC files but only 47% of the OOXML files.  OpenOffice 3.2 recovered 90% of the DOC files, but only 37% of the OOXML files.

In no case, of almost 200 tested documents, did we see the data recovery of OOXML files exceed that of the legacy binary formats.  This makes sense if you consider it from an information-theoretic perspective.  The ZIP compression in OOXML, while it shrinks the document, also makes the byte stream denser in terms of information encoding.  The number of physical bits per information bit is smaller in the ZIP than in the uncompressed DOC file.  (In the limit of perfect compression, this ratio would be 1-to-1.)  Because of this, a physical error of 1 bit introduces more than 1 bit of error in the information content of the document.  In other words, a compressed document, all else being equal, will be less robust, not more robust, to "network and storage failures".  It is therefore extraordinary that Microsoft so frequently claims that OOXML is both smaller and more robust than the binary formats, without providing details of how they managed to optimize these two opposing qualities.
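You can see this fragility directly.  As a small illustrative sketch: flip a single byte in the middle of a zlib-compressed stream and the decompressor gives up, while the same flip in plain text would damage exactly one character:

    import zlib

    text = b"The quick brown fox jumps over the lazy dog. " * 200
    packed = bytearray(zlib.compress(text))
    packed[len(packed) // 2] ^= 0xFF      # corrupt one byte mid-stream

    try:
        zlib.decompress(bytes(packed))
        print("recovered (unlikely)")
    except zlib.error as err:
        print("decompression failed:", err)   # everything downstream is lost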

Although no similar claims have been made regarding ODF documents, I tested them as well.  Since ODF documents are also ZIP-compressed, we would expect them likewise to be less robust to physical errors than DOC, for the reasons discussed above.  This was confirmed in the tests.  However, ODF documents exhibited a higher recovery rate than OOXML, in both OpenOffice 3.2 (60% versus 37%) and Word 2007 (60% versus 47%).  If all else had been equal, we would have expected ODF documents to have lower recovery rates than OOXML.  Why?  Because the ODF documents were on average 18% smaller than the corresponding OOXML documents, so the fixed 512-byte sector errors had a proportionately larger impact on ODF documents.

The above is explainable if we consider the general problem of random errors in markup.  There are two opposing tendencies here.  On the one hand, the greater the ratio of character data to markup, the more likely it is that any introduced error will be benign to the integrity of the document, since it will most likely occur within a block of text.  At the extreme, a plain text file, with no markup whatsoever, can handle any degree of error introduction with only proportionate data corruption.  On the other hand, one can argue that the more encoded structure there is in the document, the easier it is to surgically remove only the damaged parts of the file.  However, we must acknowledge that physical errors, the "network and storage failures" that we looked at in these tests, do not respect document structure.  Certainly the results of these tests call into question the wisdom of claiming that the complexity of the document model makes it more robust.  When things go wrong, simplicity often wins.

Finally, I should observe that application differences, as well as file format differences, play a role in determining success in recovering damaged files.  With DOC files, OpenOffice.org 3.2 was able to read more files than either version of Microsoft Word.  This confirms some of the anecdotes I've heard that OpenOffice will read files that Word will not.  With OOXML files, however, Word 2007 did best, though OpenOffice fared better than Word 2003.  With ODF files, Word and OpenOffice scored the same.

Further work

Obviously, document format robustness is a complex subject.  These tests strongly suggest that there are real differences in how robust document formats are with respect to corruption, and the observed differences appear to contradict claims made in Microsoft's OOXML promotional materials.  It would require more tests to demonstrate the significance and magnitude of those differences.

With more test cases, one could also determine exactly which portions of a file are the most vulnerable.  For example, one could make a heat map visualization to illustrate this.  Are there any particular areas of a document where even a 1-byte error can cause total failures?  It appears that a single-byte truncation error on OOXML documents will cause a total failure in Office 2003, but not in Office 2007.  Are there any 1-byte errors that cause failure in both editors?

We also need to remember that neither OOXML nor ODF is a pure XML format.  Both involve a ZIP container file with multiple XML files and associated resources inside.  So document corruption may consist of damage to the directory or compression structures of the ZIP container as well as errors introduced into the contained XML and other resources.  The directory of the ZIP's contents is stored at the end of the file, so the truncation errors are damaging the directory.  However, this information is redundant, since each undamaged ZIP entry can be recovered by processing the archive sequentially.  So I would expect a near-perfect recovery rate for the modest truncations exercised in these tests.  But with OOXML files in Office 2003 and OpenOffice 3.2, even a truncation of a single byte prevented the document from loading.  This should be relatively easy to fix.
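The fix is conceptually straightforward: ignore the damaged central directory and scan for the local file headers instead.  Here is an illustrative Python sketch of the idea (my own simplification; it skips complications such as encrypted or streamed entries):

    import struct
    import zlib

    def salvage_entries(data):
        """Recover ZIP entries by scanning for local file header signatures
        instead of trusting the central directory stored at the (possibly
        truncated) end of the file."""
        pos = 0
        while (pos := data.find(b"PK\x03\x04", pos)) != -1:
            method, = struct.unpack_from("<H", data, pos + 8)   # 8 == deflate
            csize, = struct.unpack_from("<I", data, pos + 18)   # compressed size
            nlen, xlen = struct.unpack_from("<HH", data, pos + 26)
            name = data[pos + 30:pos + 30 + nlen].decode("utf-8", "replace")
            start = pos + 30 + nlen + xlen
            raw = data[start:start + csize]
            try:
                body = zlib.decompress(raw, -15) if method == 8 else raw
                yield name, body
            except zlib.error:
                pass                                            # entry damaged
            pos = start + max(csize, 1)

    for name, body in salvage_entries(open("damaged.docx", "rb").read()):
        print(name, len(body))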

Also, the large number of tests with the "Silent Recovery" outcome is a concern.  Although the problem in general is solved with digital signatures, there should be some lightweight way, perhaps checking CRCs at the ZIP entry level, to detect and warn users when a file has been damaged.  If this is not done, the user could inadvertently continue working in and resave the damaged file, or otherwise propagate the errors, when an early warning would have given them the opportunity to download the file again, or to seek another, hopefully undamaged, copy of the document.  By silently recovering and loading the file, the application leaves the user unaware of their risky situation.
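The check itself is cheap.  In Python, for example, the standard library can already report the first entry whose CRC fails (the file name here is illustrative):

    import zipfile

    def first_damaged_entry(path):
        """Return the name of the first ZIP entry whose CRC check fails,
        or None if every entry checks out."""
        with zipfile.ZipFile(path) as zf:
            return zf.testzip()

    bad = first_damaged_entry("report.odt")
    if bad is not None:
        print(f"warning: entry '{bad}' is damaged; fetch a fresh copy")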

Files and detailed results

If you are interested in repeating or extending these tests, here are the test files (including reference files) in DOC, DOCX and ODT formats.  You can also download a ZIP of the Java source code I used to introduce the document errors.  And you can also download the ODF spreadsheet containing the detailed results.

WARNING: The above ZIP files contain corrupted documents.  Loading them could potentially cause system instability and crash your word processor or operating system (if you are running Windows).  You probably don’t want to be playing with them at the same time you are editing other critical documents.

Updates

2010-02-15: I ran an additional 100 tests of DOC and DOCX in Office 2007.  Combined with the previous 30, this gives the DOC files a recovery rate of 92%, compared to only 45% for DOCX.  With that we have significant results at the 99% confidence level.
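For anyone who wants to check the arithmetic, a Fisher exact test on the approximate counts (recovered versus failed, reconstructed from the percentages above at 130 trials per format) is one way to do it:

    from scipy.stats import fisher_exact

    doc = [120, 10]    # ~92% of 130 DOC tests recovered
    docx = [58, 72]    # ~45% of 130 DOCX tests recovered
    _, p = fisher_exact([doc, docx])
    print(f"p = {p:.1e}")   # far below the 0.01 threshold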

Given that, can anyone see a basis for Microsoft’s claims?  Or is this more subtle?  Maybe they really meant to say that it is easier to recover from errors in an OOXML file, while ignoring the more significant fact that it is also far easier to corrupt an OOXML file.  If so, the greater susceptibility to corruption seems to have outpaced any purported enhanced ability of Office 2007 to recover from these errors.

It is like a car with bad brakes claiming that it has better airbags.  No thanks.  I'll pass.


Filed Under: ODF, OOXML

Asking the right questions about Office 2010’s OOXML support

2009/11/17 By Rob 19 Comments

There is more OOXML controversy in the news, this time in Denmark. I don't claim to understand all the nuances of the accusations, since I don't read Danish, and Google Translate makes it sound at times like a discussion about loaves of rye bread or something. But the gist of it, as far as I can surmise from this account, is whether Office 2010 will "support the complete ISO-approved version of OOXML".  Microsoft's spokesperson says it will.  Mogens Kühn Pedersen, chair of the Danish Standards Committee, says it will not.

This is the kind of dispute where you can go around in circles for days and not reach agreement. The problem is that they are arguing over words, not facts, and they do not agree on the meaning of the words. Words like "support", "complete" and "conform" are used in different ways, with different meanings and intents.

Let's try to escape the equivocation and instead try to establish the underlying facts. I can't promise that this will clarify the situation any. In fact, I suspect we'll end up even more confused about what exactly Office 2010 actually supports. But replacing a false certainty with an honest uncertainty is progress of a kind. It gives us something we can build on.

First, we need to acknowledge that OOXML entered ISO as one standard and was transformed, via the BRM and ISO ballot, into four standards: ISO/IEC 29500 Parts 1, 2, 3 and 4. Within these parts are several different conformance targets and conformance classes. In particular, these four standards encompass two different and incompatible schemas for many of their features: "Strict" and "Transitional". What Microsoft submitted in the Fast Track is essentially the "Transitional" schema. What was created by the BRM was the "Strict" schema. This is where Microsoft made most of its "concessions" in order to turn "No" votes into "Yes" votes. So things like support for spreadsheet dates before the year 1900, the elimination of VML graphics, etc., are all in the "Strict" schema. All the legacy "DoItLikeWord95" garbage was in "Transitional" only. Several NBs voted to approve OOXML because of the assertion that "Transitional" would not be written in documents produced by future versions of MS Office. The promise was that it was…well…transitional, for moving legacy binary documents into XML. Few people want to support two different document standards (both ODF and OOXML) in the first place. But to require support for two different and incompatible versions of OOXML: that is simply intolerable.

In any case, because of these two conformance classes, anyone who claims that their product supports "OOXML" in an unqualified sense, without stating which conformance targets or conformance classes they support, is not stating anything of substance. It is like trying to buy an electrical plug adapter by just saying "I need electricity". Merely saying "conformance to OOXML" means nothing. You need to state the conformance targets and classes that you support. Remember, the conformance language of OOXML is so loose that even a shell command like "cat foo.docx > /dev/null" would qualify as a conformant application. I assume that Office 2010 supports at least that.

Of course, the alleged assertion that Office 2010 supports OOXML “completely” is a bit more problematic. What exactly does this mean? Does this mean that Office 2010 supports all conformance classes and targets of all four parts of OOXML? Including being a Strict consumer? A Strict producer? That would be a good thing, IMHO, if it were true. But that is not what ISO/IEC JTC1 SC34/WG4 was recently told in Seattle, where they were told that Office would not write out Strict documents until Office 16. That would put it out to the middle of the next decade, assuming the typical 3-year Office release schedule.

So I'll lay out my assertions (with the caveat that Office 2010 is not yet complete or shipped) as:

  • Office 2010 will conform to the Transitional consumer and producer classes defined in the OOXML standards. Any bugs that are found in the shipped version of Office 2010 will be "fixed" by retroactively changing the standards to match what Office actually does, as is currently being done by the Microsoft-packed SC34/WG4 committee with similar bugs found in Office 2007's OOXML support.
  • Office 2010 will not have conforming support for OOXML Strict producer or consumer classes.
  • Office 2010 will write dozens of non-interoperable, proprietary extensions into their OOXML documents, extensions which are not defined by the OOXML standards and which have not been reviewed or standardized by any standards committee and which will not be fully interoperable with other OOXML editors, or even with previous versions of MS Office.

So instead of arguing over the meaning of “support” and “complete” I suggest some alternate questions for Microsoft, to give them the opportunity to clarify exactly what kind of support for OOXML will be coming in Office 2010:

  1. Exactly what ISO/IEC 29500:2008 conformance classes and targets will Office 2010 conform to?
  2. Is this contingent on first changing the conformance requirements of the published ISO/IEC 29500:2008 standards to match what Office 2010 actually supports? Or is there a commitment to support the published standards as they were approved by JTC1 national bodies? In other words, is Microsoft committed to conforming to the standards, or are we back to changing the standards to "conform" to Microsoft?
  3. Will Microsoft Office 2010 write out only markup that is fully described in the OOXML standards? Or will it write out proprietary markup extensions that are not fully defined in the standards? In other words, will Office 2010 be “strictly conformant” with the ISO/IEC 29500:2008 standards?

The problem you run into here is that there are really two different OOXML standards: the new and improved OOXML Strict conformance class, the one that was “sold” to ISO NBs, the one that garnered the approval votes, and then the old ugly one, the “haunted” specification, the Transitional conformance class, supported only by Microsoft Office. Anyone considering adopting OOXML should have perfect clarity as to which one they are adopting, especially since these are two very different standards, both formally and logically. Just as it is problematic to speak about OOXML support in a product without stating which conformance classes and targets are supported, it is equally a defect of any adoption policy to be loose in what version of OOXML is being proposed for adoption.

IMHO, if you must state a requirement for OOXML (along with ODF), at least specify it clearly, and state a requirement for “strict conformance” (meaning no extensions) of the Strict conformance classes of ISO/IEC 29500:2008. To do otherwise is to essentially specify a requirement for the use of Microsoft Office and Microsoft Office alone.


Filed Under: OOXML



Copyright © 2006-2023 Rob Weir · Site Policies

 
