An Antic Disposition

OOXML

Amusing but Confusing

2007/01/20 By Rob 12 Comments

I’ve always been annoyed by Microsoft’s choice for a name in their “Office Open XML”. It isn’t the wishful use of the word “open” that bothered me. It was that the name just doesn’t roll off the tongue easily. It always seems to get stuck someplace and comes out wrong. You need to think harder to say “Office Open XML” and have it come out right.

“Open” is an adjective, and in English adjectives are usually placed before nouns, not in the middle of a noun phrase. We say, a “black guard dog”, not a “guard black dog”. When you fight language, language usually ends up winning. So it is not surprising that what comes out is “Open Office XML” by mistake.

I’m obviously not the only one with this problem. A quick Google for “Microsoft Open Office XML”, or “Ecma Open Office XML”, phrases that should get zero hits, reveals instead an embarrassment of riches. Everyone gets this wrong.

ZDNet’s David Berlind:

Yesterday, when Novell announced that one of the first fruits to be born out of its newly minted legal relationship with Microsoft would be a plug-in to OpenOffice.org that would allow the open source based office suite to open or save documents in Microsoft’s Open Office XML (OO-XML) file format, I had a tough time parsing through the text of the company’s press release.

RedMonk’s Stephen O’Grady, with an article titled “Microsoft Open Office XML Formats / Open Document Format Follow Up”.

CRN: Reseller Channel News with a headline, “Ecma says Yeah to Microsoft Open Office XML“.

Computer Business Review:

Corel Corp, developer of the WordPerfect suite, announced last week that it will support both ODF and Microsoft’s Open Office XML format.

XMLMind, a tool designed to work with OOXML gets it wrong:

Thanks to new XMLmind FO Converter v4, it is now possible to convert XML documents to Open Office XML (.docx) the native format of MS-Word 2007.

BusinessWeek proof-readers missed this error:

…Microsoft is working hard to defeat it and promote its own XML-based file format–called Microsoft Open Office XML. This will be the default file format in Office 2007, due out late this year.

Even Microsoft Press Releases make this error:

‘Through the XXX Alliance, we are working closely with Microsoft to increase data access across our instrument systems and data analysis software tools using Ecma Open Office XML,’ said XXX, president of XXX.

Even Microsoft’s blog profile for a member of their own Corporate Standards Team, an OOXML expert, gets it wrong:

Dave is a member of Microsoft’s Corporate Standards policy team. He is involved with all of Microsoft’s global standards around server & tools which includes everything from XML to WS-*, from W3C to Oasis and ISO, all Office standards including Open Office XML, and all vertical industry standards from the enterprise markets to Microsoft Dynamics products

This guy works on Office Open XML and he doesn’t even get it right!?

Microsoft’s own OOXML overview page on the file formats can’t get it right:

By installing a simple update, users of Microsoft Office 2000, Microsoft Office XP, and Office 2003 Editions can open, edit, and save documents in one of the Ecma Open Office XML File Formats.

Ditto for Microsoft’s FAQ page on the file formats:

The Ecma Open Office XML Formats will offer some key improvements over the binary file formats in use today within Word, Excel, and PowerPoint. Because these new file formats are compressed, the resulting document sizes will be much smaller, somewhere between 50 and 75 percent smaller in some cases.


A recent article by Microsoft’s Platform Strategy Manager in Australia got it wrong in the title: Streamlining your documents with Open Office XML.

And to top it all off, Bill Gates himself gets it wrong, then corrects himself, as seen in Molly Holzschlag’s transcript from a recent blogger outreach event she attended at Microsoft headquarters in Redmond:

But every year for 13, 14 years now we’ve not just followed and implemented standards, we’ve contributed. This WS stuff, . . . we contributed more Web standards than anyone! We have our smartest people who go and work on that stuff . . . we just did the OpenOffice . . . our office XML formats we contributed to them . . . we’ve got XML at the core of all our products.

(Thanks to Yoon Kit from Open Malaysia, who has also been taking a closer look at the names used inside OOXML, for pointing out that quote.)

I’m not meaning to embarrass anyone with the above quotes. Those who have heard me speak on Office Open XML know that I struggle to get that name out every time, and do not always succeed. Like I said before, if you fight language, you will lose.

So the Ecma standard clearly has a name which causes confusion with the name of an existing application, “Open Office”, which happens to also be the most prominent implementation of OpenDocument Format, the ISO standard for office documents. OpenOffice.org is a registered trademark (check the Tess database for the actual registration) and has been used in the trade since 2001 for describing an application used for database management, spreadsheet, word processor and presentation graphics.

I am not a lawyer, but from reading a BitLaw writeup on trademark infringement, it appears that the thing to prove is “likelihood of confusion”, and the factors the courts would look at include evidence of actual confusion by consumers and similarity of the marketing channels for the two products.

In any case, to have an ISO standard that, by its aberrant use of the English language, almost compels users to transform it into “Open Office XML” will only confuse users. This is not just my prediction. It is my observation, backed up by many specific examples of how this confusion is happening even now. I invite you to comment on other examples you may know of.

Early last year, another Microsoft/Ecma specification was submitted to JTC1 for approval under Fast Track rules. It was Microsoft’s C++/CLI specification. During the 30-day contradiction review period, national bodies raised objections based on the confusing name Microsoft picked for their standard, and the practical problems this caused. GrokLaw had good coverage of this.

A summary of the UK’s contradiction argument is:

In response to document ISO/IEC JTC1 N8037, the UK objects to Fast Track Ballot ECMA-372 1st Edition C++/CLI Language Specification, on the grounds that there is a contradiction with an existing JTC1 standard. ISO/IEC 14882:2003 is the standard for the C++ programming language. Adopting a second standard under the proposed name of C++/CLI will cause unnecessary and harmful confusion in the marketplace.

We consider that C++/CLI is a new language with idioms and usage distinct from C++. Confusion between C++ and C++/CLI is already occurring and is damaging to both vendors and consumers.

A new language needs a new name. We therefore request that Ecma withdraw this document from fast-track voting and if they must re-submit it, do so under a name which will not conflict with Standard C++.

Germany had similar objections:

We propose that the document is input into SC22 as a regular New Work Item Proposal and assigned to WG21 for further processing.

On a technical level, there are some rather different approaches between C++ and C++/CLI which can easily cause considerable confusion when both languages are considered to be “C++” or add unnecessary overhead when trying to write C++ code usable with C++ and C++/CLI.

I suggest a similar objection should be raised with regard to Ecma Office Open XML. Its name causes confusion with an existing registered trademark. Ecma should rename their standard to something less likely to cause confusion.

Any suggestions for a new name?


Updated on 25 June 2007 to add some additional recent examples of this continuing confusion.

Filed Under: ODF, OOXML

The Vast Blue-Wing Conspiracy

2007/01/20 By Rob 4 Comments

Microsoft’s Brian Jones and Doug Mahugh have put all the pieces together and are expressing their suspicions that all of the troubles OOXML is facing are caused by IBM.

Yikes, we’ve been found out!

The truth can now be told. We have a nine-floor complex beneath Devil’s Tower in Wyoming, Dick Cheney’s home state. We employ three hundred Oompa-Loompas, ostensibly here on student visas, to read through the 6,000 page OOXML specification. They then input their concerns into a massively parallel computer, based on the old Deep Blue chess computer that beat Garry Kasparov. The computer takes the objections, formats them into English, inserting random literary quotes from The Modern Library of the World’s Best Books, and then posts them in blogs and press articles. The computer can express these objections in the form of sonnets, haikus, or even as crude limericks. Every year on January 14th (Thomas J. Watson’s Birthday) at 3:14am the Oompa-Loompas come to the surface, smear their bodies with blue paint, dance around a bonfire, howl at the moon and entreat the gods to vanquish their foes, mainly Microsoft, who canceled their favorite application, Microsoft Bob. Rob Weir doesn’t really exist. He is just a subroutine. As they say, “On the internet, nobody knows you are a subroutine processing data input by Oompa-Loompas working for IBM underground in Wyoming”.

I guess that’s one theory.

But from what I’ve seen of the world, when you think everyone is out to get you, it is usually one of three things:

  1. You are mentally ill
  2. You are doing something stupid and people are trying to help you
  3. You are in a movie

I’d suggest #2 is the most likely explanation. But a 4th possibility, one I had not thought of, is hinted at in the latest Dr. Dobb’s, in an article by Michael Swaine entitled “Microsoft Loves Linux: What’s With That?”. The article focuses on the recent Microsoft-Novell deal, but there is an interesting observation that applies to the format discussions as well:

Then there’s the PR angle. In Microsoft’s case, PR includes trying to look virtuous to the EU courts. Look, Microsoft can say, at how we play nice with competing platforms like Novell’s SUSE. Here’s a tin-foil-hat theory: Microsoft can’t compete against a movement, Ballmer has acknowledged. It can definitely compete against a company. So isn’t it likely that this question has come up at Microsoft: Can’t we somehow turn this Linux movement into a company that we can compete with?

Can the same be said about file formats? It is hard for Microsoft to beat a movement, so it attempts to turn this into a battle against a single company.

Let’s look at the facts:

ODF is not controlled or promoted by a single company. ODF is developed in OASIS with a Technical Committee (TC) that includes members from a number of vendors, including Adobe, Novell, Intel, Sun and IBM. The TC also includes unaffiliated individual members, representatives from various open source projects, as well as members from the OpenDocument Foundation and other non-profit organizations.

The Foundation in particular has brought a huge amount of talent and resources to the development of ODF. Traditionally, standards were developed exclusively by large corporations, and individuals and smaller players were marginalized. But the world is different today. The Foundation has shown that with a bit of organizational skill, individual volunteers can band together and have a voice and technical contribution on par with long-established corporations. They should be given much credit for this.

On the promotion side ODF is promoted by groups including the ODF Adoption TC, the Open Document Format Alliance, the OpenDocument Fellowship and the previously mentioned OpenDocument Foundation. The Adoption TC manages the ODF portal on XML.org and is currently working on various journal articles, whitepapers and responding to CfP’s for various conferences and symposia this year. I’ve lost count of how many companies are members of the ODF Alliance. I stopped counting when it went over 300. If you are not on their mailing list, then you should be. The Fellowship has also done amazing work promoting ODF and developer tools related to ODF.

So let’s put to bed the conspiracy theories that this is all just IBM out to get Microsoft. ODF is far more than one company. IBM does not own ODF or control ODF or control the groups that promote ODF. Those who say otherwise discredit the efforts of the many volunteers who have worked so hard to develop the ODF standard and implement it in so many applications.

Filed Under: ODF, OOXML

A Foolish Inconsistency

2007/01/18 By Rob 8 Comments

Ralph Waldo Emerson’s memorable words from his 1841 essay, “Self-Reliance”:

A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall. Speak what you think now in hard words, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said to-day. ‘Ah, so you shall be sure to be misunderstood.’ Is it so bad, then, to be misunderstood? Pythagoras was misunderstood, and Socrates, and Jesus, and Luther, and Copernicus, and Galileo, and Newton, and every pure and wise spirit that ever took flesh. To be great is to be misunderstood.

These are fine words for a philosopher, but based on those statements I have my doubts as to whether Emerson would have made a good engineer or businessman.

Where systems are designed for multiple parties to collaborate we must have consistency driven by shared standards.

The first shared standards date back thousands of years and supported mankind’s earliest commercial ventures:

  • Uniform weights and measures, so you knew you were getting what you paid for
  • Official coinage of specified weight and purity, so you knew what was being paid
  • A working language for recording treaties and trade agreements

This was such an obviously good thing that standards existed already at even the earliest limits of our written historical record. In the Flood stories, in both the Old Testament and the Gilgamesh, the authors thought it appropriate to give the exact dimensions of the vessels. When God spoke to Noah, and Enki spoke to Atrahasis, He spoke in the language of the standards of the day.

As civilization progressed, standards took an increasingly larger role. As the railroad, the steamship and the telegraph shrank the size of nations and oceans, the speed of communications and commerce increased, leading to such diverse standards as railroad gauges, time zones and international postage. Moving into the information age, the increased speed and variety of communication led to standardized network protocols, media formats and character encodings.

Generally, standards are necessary whenever two or more parties communicate or exchange goods or services.

A look at the US/Chinese Standards Portal, a joint effort between ANSI and SAC, shows the breadth of standards that specify the properties of materials and products we all use every day. Their tag line is, “The international language of commerce is standards”. I concur.

But progress has been uneven. Although I can send an email message anywhere in the world, make a phone call anywhere in the world, send a letter anywhere in the world, and expect that it will be received and read exactly as I intended, formatted documents, spreadsheets and presentations have lacked this level of interoperability. One person uses Word, another person uses WordPerfect, another person uses AbiWord or OpenOffice or WordPro. Older documents might still be in WordStar, XYWrite or Manuscript format. We tried conversions, importing and exporting to various formats for interchange, like RTF and CSV. It worked sometimes, but not always and certainly not well.

How did we get to such chaos in the area of document formats?

It is notable that these applications were designed and their formats defined before widespread commercial use of the Internet. The business user of a word processor circa 1994 shared documents via hard copies, or electronically with only users on their LAN. The facilities for electronic document sharing between business partners, between a company and their customers, or a government and its citizens were not widespread. So company A might use WordPerfect, and company B might use WordStar, but since they didn’t exchange documents, or only did so via hard copy, there was no file format problem.

With the popularization of the World Wide Web and increased connectivity of businesses to the Internet, another jump forward in the rate of communications took place, comparable to the railroad or the telegraph. The world was now a very small place indeed. This led to a parallel acceleration of the rate of commerce, as new opportunities arose for supply chain integration, advertising, education, online exchanges, outsourcing, and the new business models that are invented every day.

Today, the document you create can instantly be transported around the world. You may not know who reads your document, what operating system they are running or what applications they are using. They may be running Ubuntu on a laptop on the beach, or a Symbian-enabled mobile phone in rush hour traffic, or even using a screen reader or other assistive technology to render the document according to their needs. We no longer exclusively buy, sell or support the person in the bricks and mortar office down the street. Commerce is global, it is instant, and it is based on standards.

This is where OpenDocument Format (ODF) comes in. After 15 years of chaos in office document formats, it was time for a standard. The rate of communications and commerce demands it. More importantly, customers demand it.

The complaints I hear about the prior state of affairs revolve around these issues:

  • I want to own my data.
  • I do not want access to my data controlled by a single commercial entity.
  • I do not want to require that people go out and purchase a particular application in order to read my documents.
  • I want my documents to be in a format that has long-term stability and understandability
  • I want my documents to be in a format that lends itself to processing by a range of tools, both commercial and free.
  • I want my documents to be in a format that everyone can understand.
  • I want to break out of the cycle of having to constantly upgrade my software every time my vendor decides to change formats on me

ODF had its roots in the OpenOffice.org project, was refined in an OASIS Technical Committee and then reviewed and approved by ISO last May. It took almost three years to edit, review and approve the specification, but the results are worth it. Today every major word processor either now implements ODF or has announced plans to do so.

But not everyone is happy with progress. This has always been true. The last Pony Express rider likely cursed at the mere mention of the telegraph. The last DECnet engineer likely mumbled, “Why would anyone want a TCP/IP?” as he packed his belongings and cleaned out his office. And in the realm of document formats, Microsoft is kicking and screaming to try to delay the inevitable widespread adoption of ODF as a document format for everyone.

Why is Microsoft so upset?

The answer is, they enjoy a monopoly in office applications and they know that if users could easily move away from Microsoft Office while preserving access to their documents, then users would leave by the millions. The Fear, Uncertainty and Doubt (FUD) around file formats and fidelity and compatibility is the way Microsoft ensures their lock-in.

Let’s review some history of Microsoft and their office file formats, to get a better sense of how this game is played.

Let’s go back to the early days, the mid 1990’s, when Microsoft did not have such market dominance, back when they had competition in the word processor and spreadsheet market. At that time Microsoft actually documented their file formats. Sure, the specification was incomplete, but it was an honest attempt. You could buy the Excel format in book form from Microsoft Press, or get an electronic version of the Excel and Word formats on an MSDN CD. At one point it was a free download from the MSDN web site.

But around 1999 something happened. The license on the file format specification changed. Where before you could do anything you wanted with the formats, now the specification carried the explicit restriction (my emphasis):

[Y]ou may use documentation identified in the MSDN Library portion of the SOFTWARE PRODUCT as the file format specification for Microsoft Word, Microsoft Excel, Microsoft Access, and/or Microsoft PowerPoint (“File Format Documentation”) solely in connection with your development of software product(s) that operate in conjunction with Windows or Windows NT that are not general purpose word processing, spreadsheet, or database management software products or an integrated work or product suite whose components include one or more general purpose word processing, spreadsheet, or database management software products.

So, file format documentation that was once freely available was restricted to applications that ran on Windows and which did not compete with Microsoft Office.

Soon after this file format information was removed from MSDN altogether. It was only available under a licensing program that had even further restrictions:

This program entitles qualified software developers to license the Microsoft .doc, .xls, or .ppt file format documentation for use in the development of commercial software products and solutions that support the .doc, .xls, or .ppt file formats from Microsoft and to complement Microsoft Office

(How should we parse this? What does it mean to “complement” Microsoft Office? I think in ordinary use, an application that competes against Office would not be considered complementary.)

So what happened between 1995 and 2004 to cause Microsoft to wipe out every bit of publicly-available documentation on their file formats? It seems to me that the main change in that time frame was that they wiped out the competition. The earlier availability of the file format documentation seems to have been intended to encourage developers and partners. In those days, the Excel team was good about documenting their file format, and about importing and exporting competing formats like 1-2-3.

Joel Spolsky, talking about what was required for Excel to reach its “tipping point” in adoption, explains it this way:

The mature approach to strategy is not to try to force things on potential customers. If somebody isn’t even your customer yet, trying to lock them in just isn’t a good idea. When you have 100% market share, come talk to me about lock-in. Until then, if you try to lock them in now, it’s too early, and if any customer catches you in the act, you’ll just wind up locking them out. Nobody wants to switch to a product that is going to eliminate their freedom in the future.

But we see that, as their monopoly was achieved, Microsoft throttled the availability of the Office file format specifications until they were no longer available to potential competitors. The lock-in has been achieved; the door slams shut.

This shows the strategic value of file formats to Microsoft and the steps they have been willing to take in order to keep users locked onto the Windows/Office platform.

So now, today, Microsoft is pushing their Office Open XML standard, “old wine in new wine skins”, not so much a new format as a new ploy. What should enrage every thoughtful person is that they are using compatibility with the legacy binary formats as the main selling point of the OOXML format. Think about it. Compatibility with the binary format that they withdrew from the public seven years ago when they cemented their monopoly, is now being touted as their unique advantage. Said differently, Microsoft is selling OOXML as the solution to an interoperability problem that they themselves created and carefully orchestrated.

I’m obviously not a fan, as regular readers of this page already know.

So what prevents Microsoft from doing the same thing again? How do we know that the next version of Office will use a format that is an open standard? Office 2007 has already extended OOXML in undocumented ways to support things like macros and DRM. Although they cannot withdraw the OOXML specification from Ecma, they can surely just ignore it, not update it, and continue to extend their format in undocumented ways. Since the success of ODF is the only reason they are pushing OOXML, it would be in true character for them to deemphasize standard OOXML as soon as ODF is wiped out, and turn it back into an in-house proprietary format, only disclosed to those who agree not to compete with them.

The time is right for a single document standard and that standard is clearly ODF. The opportunity is here for ISO/IEC JTC1 to send a resounding message in favor of interoperability and consistency and to reject OOXML as contradicting the existing ISO ODF standard. I don’t have a lot to say here about the various technical/legal contradiction arguments behind this. (This post is too long already.) If you want the details, they are covered in depth on GrokLaw and ConsortiumInfo. In particular I’d draw your attention to the two Wiki pages (here and here) mentioned in the GrokLaw piece where, if you are so inclined, you can help research and explain the technical reasons why OOXML should be rejected. I certainly plan on contributing to that effort.

I believe we can win this one. The forces of vendor lock-in and secret proprietary interfaces and formats are vulnerable. They have overplayed their cards and are pushing a specification which will only cause them embarrassment once its contents are better known. They are losing market share as well as mind share. They are the past. One last big shove and we should be able to topple the tower. All together now….Push!

Filed Under: OOXML

Calling Captain Kirk

2007/01/16 By Rob 9 Comments

I suppose I was the odd child in my neighborhood. While the other boys were playing with light sabers and phasers, I wanted only one thing from the future: the Universal Translator, the ultimate piece of linguistic technology which would immediately translate from all alien tongues. Captain Kirk had one, and one day, I vowed, I would have one as well. Who wouldn’t want one? It certainly beats spending hours memorizing vocabulary and conjugations and declensions.

Flash forward now to the 21st Century, the present. We have Babelfish and Google, and they do a fair job at text translation, but the Universal Translator is still science fiction.

Or is it?

The Ecma Office Open XML (OOXML) specification seems to presuppose the existence of a Universal Translator of sorts. Take a look at section 11.3.1 “Alternative Format Import Part” (Page 38):

An alternative format import part allows content specified in an alternate format (HTML, MHTML, RTF, earlier versions of WordprocessingML, or plain text) to be embedded directly in a WordprocessingML document in order to allow that content to be migrated to the WordprocessingML format.

According to the schema, these alternate formats may be the main content of the document, or specifically applied to comments, endnotes, footer, footnotes or headers.

Let’s parse the original more closely, starting by defining some terms:

  • The term “part” in OOXML refers to the individual items (XML documents, images, scripts, other binary blobs, etc.) contained in the OOXML Zip file, which they call a “package”. So a package is made up of one or more parts.
  • HTML should be self-evident. But does this also include the HTML-like output from earlier versions of Word, which wasn’t always well-formed?
  • MHTML is what you get when you save a “complete web page” within Internet Explorer. It is a MIME-encoded version of the HTML page plus the embedded images. MHTML is listed as having a status of “Proposed Standard” in the IETF, but it appears to have been held at that state since 1999. (Does anyone know why it never advanced to the Standard status?)
  • RTF – Rich Text Format is a proprietary document format occasionally updated by Microsoft. As one wag quipped, “RTF is defined as whatever Microsoft Word exports when it exports to RTF”.
  • WordProcessingML – I’ve seen this term used to refer to the XML format of Word 2003 as well as Word 2007. Presumably the 2003 version is intended here?
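To make the package/part terminology concrete, here is a minimal Python sketch that treats a package as an ordinary Zip file and lists its parts. The part names below (including `word/legacy1.html`) are made up for illustration and are not taken from the specification; the only thing the sketch relies on is that a package is a Zip archive.

```python
import io
import zipfile


def build_demo_package() -> bytes:
    """Build a tiny in-memory Zip that mimics the shape of an OOXML package.

    The part names are illustrative assumptions, not taken from the spec.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/legacy1.html", "<p>legacy content</p>")
    return buffer.getvalue()


def list_parts(package_bytes: bytes) -> list[str]:
    """Return the sorted names of every part stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(package.namelist())


parts = list_parts(build_demo_package())
for name in parts:
    print(name)
```

Pointing `zipfile.ZipFile` at a real .docx file in the same way should give a similar listing, with the main document as just one part among several.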

As you can see, we have several problems here from a specification standpoint.

First, no versions are specified for HTML, MHTML, RTF or WordProcessingML. Are we supposed to support all versions of these? Only some? Does this include WordProcessingML from beta versions of Office 2007 as well?

Second, the specification provides no normative references for MHTML, RTF or “earlier versions of WordProcessingML”.

Third, this is a closed list of formats that seems biased toward Microsoft’s legacy formats. Why not XHTML? Why not DocBook? Why not TeX or troff? Why not ODF? Is there a legitimate reason to restrict the set of supported formats in this way?

Fourth, “plain text” is not a phrase I like to see in a file format specification, since it is undefined. No encoding is mentioned. What is meant here? ASCII, Latin-1, UTF-8, UTF-16, EBCDIC? Some of the above? All of the above? What encodings are included under the name “plain text”?
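The encoding problem is easy to demonstrate. The following Python sketch decodes the same five bytes under several candidate encodings; the sample word and the choice of `cp037` as a stand-in for “EBCDIC” are my own assumptions, picked only to illustrate the ambiguity.

```python
from typing import Optional

# Candidate readings of "plain text"; cp037 is one of several EBCDIC code pages.
CANDIDATE_ENCODINGS = ["ascii", "latin-1", "utf-8", "utf-16", "cp037"]


def decode_every_way(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> dict[str, Optional[str]]:
    """Decode the same bytes under each encoding; None marks a decoding failure."""
    results: dict[str, Optional[str]] = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The bytes a UTF-8 producer would write for a single short word.
raw = "café".encode("utf-8")
readings = decode_every_way(raw)
for name, text in readings.items():
    print(f"{name:8} -> {text!r}")
```

Only one of the readings recovers the original word. A consumer handed these bytes with nothing but the label “plain text” has no way of knowing which reading the producer intended.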

Reading further we have:

A WordprocessingML consumer shall treat the contents of such legacy text files as if they were formatted using equivalent WordprocessingML, and if that consumer is also a WordprocessingML producer, it shall emit the legacy text in WordprocessingML format.

Three words should raise an eyebrow. The first is the use of the word “equivalent” and the other two are the instances of the word “shall”. “Shall” is spec talk for a requirement, something a conformant application must do. According to Annex H of ISO Directives Part 2, “Rules for the Structure and Drafting of International Standards”, the word “shall” is used “to indicate requirements strictly to be followed in order to conform to the document and from which no deviation is permitted.”

So, compliant consumers are required to take input from a variety of formats and convert them into the “equivalent” WordProcessingML. Putting aside the question as to what version or versions of HTML are intended, there is nothing here that defines the mapping between any version of HTML and WordProcessingML. So the conversion is application-defined. Considering that this is indicated to be a required feature of a conformant application, I find the lack of specificity here disturbing. How can there ever be interoperable processing of OOXML documents if this is not defined?

Reading the OOXML specification a little further down:

This Standard does not specify how one might create a WordprocessingML package that contains Alternative Format Import relationships and altChunk elements.

However, a conforming producer shall not create a WordprocessingML package that contains Alternative Format Import relationships and elements.

“Shall not” is another one of the special specification words. So, essentially, we’re not allowed, in a conforming application, to create a document with Alternative Format Import Parts, but if we read a document that has one, then we are required to process it, transforming it into equivalent WordProcessingML.
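The asymmetry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace URIs are the ones I believe the published ECMA-376 text uses and may not match the draft quoted here, the sample markup is hand-written, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the draft under discussion may use different ones.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written sample: one ordinary paragraph plus one altChunk reference.
SAMPLE = f"""<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}">
  <w:body>
    <w:p/>
    <w:altChunk r:id="rId5"/>
  </w:body>
</w:document>"""

CLEAN = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p/></w:body></w:document>"""


def find_alt_chunks(document_xml: str) -> list[str]:
    """Return the relationship ids of every altChunk element in the document."""
    root = ET.fromstring(document_xml)
    return [el.get(f"{{{R_NS}}}id", "") for el in root.iter(f"{{{W_NS}}}altChunk")]


def producer_may_write(document_xml: str) -> bool:
    """A conforming producer 'shall not' emit altChunk, so any hit means no."""
    return not find_alt_chunks(document_xml)


alt_chunk_ids = find_alt_chunks(SAMPLE)
print("consumer must convert:", alt_chunk_ids)
print("producer may write SAMPLE:", producer_may_write(SAMPLE))
print("producer may write CLEAN:", producer_may_write(CLEAN))
```

Finding the chunks is the easy part. The conversion a consumer must then perform on each referenced part is exactly what the specification leaves undefined.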

Further, we get this informative note:

Note: The Alternative Format Import machinery provides a one time conversion facility. A producer could have an extension that allows it to generate a package containing these relationships and elements, yet when run in conforming mode, does not do so.

Putting on my tinfoil hat for a moment, I find this all rather fishy. The OOXML specification, at 6,000+ pages, has now just sucked in the complexity of one or more versions of HTML, MHTML, RTF and WordProcessingML. It requires that a conformant application understand these formats, but forbids a conformant application from producing them.

This is another example of how you never know what you’re getting when you get an OOXML file. To support OOXML is not to support a single format, or even a single family of formats. To fully support OOXML requires that you support OOXML plus a motley hodgepodge of various other formats, deprecated, abandoned and proprietary. The cost of compatibility with billions of legacy Microsoft documents is that you must support their legacy of years of false starts and restarts in the file format arena.

When you get an OOXML document, you don’t know what is inside. It might use the deprecated VML specification for vector graphics, or it might use DrawingML. It might use the line spacing defined in WordProcessingML, or it might have undefined legacy compatibility overrides for Word 95. It might have all of its content in XML, or it might have it mostly in RTF, HTML, MHTML, or “plain text”. Or it may have any mix of the above. Even the most basic application that reads OOXML will also need to be conversant in RTF, HTML and MHTML.
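One way to find out whether a given file carries such embedded content is to look for altChunk elements in the main document part and resolve them through the part's relationships. The following is only a rough sketch: the function name `find_alt_chunks` is made up for illustration, and it assumes the main part lives at `word/document.xml`, which is conventional but not something the package format guarantees.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in OOXML packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Assumption: conventional part names; a package may place them elsewhere.
DOC_PART = "word/document.xml"
RELS_PART = "word/_rels/document.xml.rels"


def find_alt_chunks(docx):
    """Return a list of (relationship id, target) for each altChunk found.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        doc = ET.fromstring(pkg.read(DOC_PART))
        has_rels = RELS_PART in pkg.namelist()
        rels = ET.fromstring(pkg.read(RELS_PART)) if has_rels else None

    # Map relationship ids to the parts they point at.
    targets = {}
    if rels is not None:
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
            targets[rel.get("Id")] = rel.get("Target")

    found = []
    for chunk in doc.iter(f"{{{W_NS}}}altChunk"):
        rid = chunk.get(f"{{{R_NS}}}id")
        found.append((rid, targets.get(rid)))
    return found
```

A non-empty result means the reader would have to handle whatever format each target turns out to be (HTML, RTF, MHTML and so on) before it could claim to have read the document.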

Captain Kirk, where are you? I need a Universal Translator!

Filed Under: OOXML

Guillaume Portes Redux

2007/01/14 By Rob 12 Comments

My post of 10 days ago, How to hire Guillaume Portes, received quite a bit of attention, with over 50,000 page views, links from 25 blogs, and around 300 comments left by visitors to this blog, Slashdot and the Joel on Software discussion group. I’d like to thank all that took the time to read and to comment.

It is good to continually tell the story and make the case. Having two standard file formats for office documents would be a bad thing for commerce, for end users and for the industry. With two formats, end users will be confused and costs will be higher for those who sell and buy software that works with documents. This will essentially cause a frictional drag on the document processing market. Sure, there will be those who will benefit from the chaos, just as there are those who benefit from the friction of currency exchanges. But over the years we’ve learned the value of things like uniform commercial codes, currency unions and uniform trade regulations.

I’ve heard no one complain about having lost their freedom of choosing the Mark over the Lira over the Franc. We simply use the Euro and then concentrate on what we are buying or selling, not on the currency. In a similar way we should agree on a single document format and then concentrate on application features and user needs and what we are trying to communicate, and stop worrying about file formats. When done well, file formats are invisible. They are not seen by end users, are not discussed by the press, and not thought about by (most) engineers. The fact that I’m writing about OOXML at all and not about my wine making exploits is an aberration caused by the failure of the dominant market player to provide an open document standard that allows users to own their documents.

But I digress…

Now that I’ve finished reading all of the comments, I’d like to review with you some of the better ones, pro and con, along with my commentary.

Let’s start.

I don’t know how many of you noticed: The fictional name “Guillaume Portes” is actually a literal translation of “Bill Gates” in French.

If you noticed this as well, give yourself 3 extra points. Many of my posts have a secret joke, and I hope these will bring a smile to those who find them.

Here’s a comment questioning whether there is a problem with OOXML:

I haven’t looked at the spec, so I don’t know how good or bad it is. But the examples he cites don’t strike me as such a problem. They’re all just to maintain backward compatibility with documents from old versions of Word and other apps. You would be free to ignore them if you don’t need that compatibility. I’m not sure how else they could have done it.

A similar view was expressed by another reader:

I don’t know if it has been stated here, but you do know that supporting these the compatibility options is not required for OpenXML compliance? Developers are free to leave these out of they want.

By that same argument developers can also leave out text alignment, images and tables, since these features are not required for compliance either. In fact, everything in OOXML is optional. If you read the compliance definition in the OOXML specification, it comes down to this statement in Section 2.5:

Application conformance is purely syntactic…A conforming consumer shall not reject any conforming documents of the document type expected by that application. A conforming producer shall be able to produce conforming documents.

Given this definition of conformance, a fully conformant OOXML application can be as simple as:

cp foo.docx bar.docx (Linux) or copy foo.docx bar.docx (Windows)

In the end, regardless of whether a feature is optional, or even deprecated, if that feature occurs in real OOXML documents, then an OOXML application that aspires to be used and have viability, either commercially or as open source, will need to support it. It is that simple.

There is only one OOXML specification and to an end user all OOXML files are equivalent and interchangeable. The user who receives a document via email, from a government web site, from a colleague, friend, teacher, etc., doesn’t know whether the document was created in Word, created in OpenOffice, created from scratch in Office 2007, saved from Office 2000 with the Compatibility Pack or whether the document was originally authored in WordPerfect and made it into OOXML format only after migration over years via various Office upgrades. It is a DOCX file and users will expect that applications that claim OOXML support will work with their DOCX file, period. Anything else is a support nightmare. That is the entire point of a standard — interoperability — so we must judge OOXML by how well it facilitates that function.

Here is another comment with a view expressed by others as well:

Someone needs to tell every developer of word processing and page layout software on the planet to abandon the ‘must look the same’ obsession described by the above. Why worry about making content in application B look like content in Application A? I create books out of Word files submitted by several people. The last thing I want is all the inconsistent formatting from each of them to control a book’s look.

Named styles is the answer. If a paragraph is body text, call it that. If it’s an inset quote, call it a quote. If a term is in italics, label it as italicized style not Times Italic 12 point. But don’t get all hung up in the distinctions between Times Roman and Times New Roman. The purpose of XML is to define what something is. Not what someone thought it ought to look like on Tuesday three weeks ago.

My personal views are very much in alignment with these sentiments. I think WYSIWYG has done more bad than good over the years, and that strict separation of content, layout and styles should be maintained. However, I also know that my personal views are not universally held, and that the word processor has evolved over the years to be a flexible, multi-paradigm tool that can support both structured document editing and looser, ad-hoc editing by users who just need to grind out a memo. A document format for a modern word processor must support both uses.

I’m glad someone brought up the core question:

Considering the requirement that the standard allow for compatibility with existing documents, what would you suggest?

Silently altering documents that are converted into OpenXML?

Disallow automatic conversions whenever a compatibility flag would have otherwise been needed?

Several readers suggested the same approach to a solution. Here are two examples:

There is no need to include features from 16 year old (or any age) applications in a new standard. If you want to convert, you convert. If WP6 linespacing is 0.8 of Word2007 linespacing, you write linespacing="0.8" in your converted document. You DON’T write useWP6linespacing linespacing="1"

That is just plain silly. That is making a specification unnecessary large for instances that are rarely used by the general public.

As said: if you want to convert, than use a conversion tool. Do not use a modern specification to hold all legacy features.

And:

Let the plugins do the dirty work of native in-memory-binary representations to XML and back conversions.

Keep the XML file format clean, open, unencumbered, application independent, cross platform, universally transformative and exchangeable, portable and timeless.

I think this is the key point, and I’m gratified that so many readers picked up on it. There is no good reason to have these compatibility flags at all. Instead of having several undefined compatibility flags for legacy line spacing options, we should have a flexible line-spacing model in OOXML and when loading legacy binary documents, convert them as necessary into the line spacing model of OOXML. If the text model in OOXML is sufficiently expressive, this can be done with no loss of fidelity. (And no, a flag that says merely “do it like Word 95” is not an example of expressiveness).
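To make the idea concrete, here is a minimal sketch of conversion at load time, where a legacy behaviour is folded into one explicit value instead of being carried forward as a flag. Everything in it is hypothetical: the flag name and the 0.8 factor are borrowed from the reader's made-up example above, and `convert_line_spacing` is an invented name, not part of any real converter.

```python
# Hypothetical table: legacy flag -> factor relative to the target spacing model.
# The 0.8 value is the reader's illustrative figure, not a measured one.
LEGACY_SPACING_FACTORS = {
    "useWP6linespacing": 0.8,
}


def convert_line_spacing(declared_spacing, legacy_flags):
    """Return one explicit line-spacing value with legacy behaviour folded in.

    Raises ValueError for a flag with no known conversion, so that a
    document is never silently altered.
    """
    spacing = declared_spacing
    for flag in legacy_flags:
        if flag not in LEGACY_SPACING_FACTORS:
            raise ValueError(f"no conversion known for legacy flag {flag!r}")
        spacing *= LEGACY_SPACING_FACTORS[flag]
    # Rounding keeps floating-point noise out of the written value.
    return round(spacing, 4)
```

The converted document then records only the resulting number. A consumer needs to understand the line-spacing model and nothing about the application that originally produced the file.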

This is what I mean by “generalize and simplify”. A simplified specification is not necessarily less expressive or less capable. A specification is simplified when it supports internal reuse and accomplishes its task with minimal means.

However, if the text model of OOXML is not flexible enough to support even legacy versions of Word, then what hope will the rest of the industry have in adopting it as a format? How will Novell manage with getting OpenOffice to use it, or Corel to get WordPerfect to use it? What about Lotus WordPro? Will Ecma add special compatibility flags to the OOXML specification to account for the quirks of every word processor with legacy documents? What would OOXML look like if we all loaded it up with such legacy flags? Is this the precedent we want to set?

Why should OOXML have special flags for WordPerfect 6.0 (1996) but not have special flags for WordPerfect 12.0 (2004) or the new X3 version (2006)? Is this purely because Microsoft considered WordPerfect to be a competitor back in 1996 but now no longer cares? Is this the way to go about designing an ISO standard?

I believe that having compatibility flags in the specification for all word processors in use today is not a practical solution, and that having such flags only for Word and ancient versions of competing products is an approach that benefits only Microsoft.

One last point, since this post is already too long. Microsoft’s Brian Jones is claiming that ODF has a similar issue, in that OpenOffice writes out a number of application-specific settings when it saves a document. This is a good illustration of an important distinction. The items that OpenOffice writes out (you can see an example in Brian’s post) are vendor-defined, document-level application settings. There are now and will continue to be multiple implementations of ODF and it is legitimate that they have application-defined features. These are stored as name/value pairs in a separate XML file in the ODF archive.

I can think of no argument against that. Obviously no interoperability is expected for these vendor-specific features, which cover application settings such as window sizes, zoom factors and print settings. In any case, ODF merely provides a place for applications to store these settings. To blame ODF for any vendor misuse of this feature is like blaming the W3C and HTML for non-standard extensions in Internet Explorer.
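For readers who want to look at these name/value pairs themselves, a short sketch follows. It assumes the settings live in a part named `settings.xml` at the root of the ODF archive and are stored as `config-item` elements in the ODF config namespace. The function name `read_odf_settings` is invented for illustration, and the nested item sets are deliberately flattened into a single dictionary, so a name that appears in two sets keeps only its last value.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF configuration elements.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"


def read_odf_settings(odf):
    """Return {name: text value} for every config-item in settings.xml.

    `odf` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odf) as pkg:
        root = ET.fromstring(pkg.read("settings.xml"))

    settings = {}
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        name = item.get(f"{{{CONFIG_NS}}}name")
        settings[name] = item.text
    return settings
```

What comes back is application housekeeping. A second implementation can ignore every entry and still render the same document content.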

OOXML, on the other hand, does not seem to have given much thought to what would be needed in a format that has multiple supporting applications. Only a single application (MS Office) has been explicitly considered, and support for that one application, and its predecessor versions, has been hard-coded into the OOXML schema.

Filed Under: OOXML


Copyright © 2006-2026 Rob Weir · Site Policies