
An Antic Disposition


Archives for 2007

The Myth Of OOXML Adoption

2007/11/30 By Rob 17 Comments

“Politics aside, there are 400 million users of the Office Open format, and we basically just recognized reality.” That was the retired Secretary General of Ecma, Jan van den Beld, explaining why it is so important to standardize OOXML.

Anyone else want to recognize reality? Maybe I can help.

Two questions to consider: 1) What is the actual state of OOXML adoption? and 2) What influence should market adoption of a technology have on its standardization?

On the first question, we should note that the 400 million users figure quoted by vdBeld in no way concerns OOXML. That figure is merely Microsoft’s estimate of the total number of Microsoft Office users, of all versions, worldwide. Only a small percentage of them are using OOXML.

Let’s see if we can estimate the number.

How are Office 2007 sales? One (leaked) estimate (in September) was 70 million. But a follow-up statement makes it clear this is total Office licenses sold, of all versions. This is probably on the high end; it does not indicate installations, or even real end-user sales, since Microsoft typically reports sales into the channel. So that number must be reduced by some factor to account for real installations.

What percentage of Office users are running Office 2007? Joe Wilcox quotes Gartner, saying “Our Symposium survey showed Office at greater than 10 percent installed base…”

And not every Office 2007 user will use the default OOXML formats. I’ve heard that corporate installations often choose to change their configuration to default to Compatibility Mode, so that Office 2007 saves in the legacy binary formats, for the increased interoperability this offers.

How does this net out? Something more than 40 million and less than 70 million seems the right neighborhood.
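If you want to run the back-of-envelope yourself, here it is as a few lines of Python. The inputs are only the figures quoted above; the discount for channel sales and Compatibility Mode is left as a caveat, since we have no hard number for it:

    # Rough bounds on Office 2007 users, from the figures quoted above.
    total_office_users = 400_000_000   # Microsoft's estimate, all versions, worldwide
    licenses_sold = 70_000_000         # leaked September figure, all versions, into channel
    installed_share = 0.10             # Gartner: "greater than 10 percent installed base"

    lower_bound = total_office_users * installed_share   # about 40 million
    upper_bound = licenses_sold                          # a ceiling: channel sales > installs

    print(f"Office 2007 users: {lower_bound:,.0f} to {upper_bound:,.0f}")
    # Users actually saving OOXML are fewer still, since some installations
    # default to Compatibility Mode and save the legacy binary formats.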

Let’s look for some more data points.

Take the example of OpenOffice, which has seen over 100 million downloads, not including copies already included with Linux distributions. So I believe there are far more OpenOffice users than Office 2007 users. Of course, not all OpenOffice users save in ODF format. Some will change the defaults to use the legacy Microsoft binary formats.

Let’s take a look at an updated version of a chart I made back in May, with data now current through 11/27/2007.

The data here shows the number of documents reported by Google over time for ODF and OOXML documents. Hollow circles are ODF data points; solid circles are OOXML data points. (Yes, I need to figure out how to do scatterplot legends in R.) The X-axis does not show the date. That would not be fair, since ODF had a significant head start in standardization and adoption. So in order to have a fair comparison, both formats are plotted against the number of “days since standardization”: May 1st, 2005 for ODF and December 7th, 2006 for OOXML, the days the formats were approved by OASIS and Ecma respectively.
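For anyone who wants to reproduce the shape of this chart, here is a sketch in Python with matplotlib (rather than the R I actually used); the data points below are placeholders, not the real Google counts:

    # Sketch of the chart layout with placeholder data, not the actual counts.
    import matplotlib.pyplot as plt

    odf_days = [100, 300, 500, 700, 900]
    odf_docs = [20_000, 50_000, 90_000, 130_000, 160_000]
    ooxml_days = [50, 150, 250, 355]
    ooxml_docs = [600, 1_100, 1_500, 1_900]

    fig, ax = plt.subplots()
    # Hollow circles for ODF, solid circles for OOXML, as in the original chart.
    ax.scatter(odf_days, odf_docs, facecolors="none", edgecolors="black", label="ODF")
    ax.scatter(ooxml_days, ooxml_docs, color="black", label="OOXML")
    ax.set_xlabel("Days since standardization")
    ax.set_ylabel("Documents indexed by Google")
    ax.legend()   # the legend I still owe the R version of this chart
    plt.show()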

Next week is the one year anniversary of Ecma’s approval of OOXML as an Ecma Standard. The news is not good. There are fewer than 2,000 OOXML documents on the entire internet (as indexed by Google at least) and the trend is flat.

What about ODF? Almost 160,000 and growing strongly.

Now we shouldn’t be so careless as to say that there are only 2,000 OOXML documents in existence, or for that matter only 160,000 ODF documents. Not all documents are posted on the web. In fact, most of them are sitting on hard drives, in mail files, behind corporate firewalls, etc. The documents that Google sees are only a sampling of real-world documents. But this is true of both ODF and OOXML. My hard drive is loaded with ODF documents that are not included in the above sampling. But however you spin it, the minuscule number of OOXML documents and their pathetic growth rate should be a cause of concern and distress for Microsoft.

Where are all the OOXML documents? What governments have adopted OOXML? What agencies? What major companies? If there were an adoption bigger than a Cub Scout pack, we would have heard it trumpeted all over the headlines. Listen. Do you hear anything? No. The silence speaks volumes.

But for sake of argument, what if the numbers were different? What if there were millions of documents on the web in OOXML format? Would that have any relevance to the JTC1 standardization process? The answer is a clear “No”. Market share, or even market domination, is not a criterion. In the US NB, INCITS, we are required to make our decision based on “objective technical factors”. Making a decision to favor a proposed standard because of the proposer’s market share would bring antitrust risks.

Consider this: In JTC1 we vote. One country one vote. We do not vote based on a nation’s GDP. Jamaica and Japan are equal in ISO. We have engineers review the standards. We do not bring in accountants to review financial statements and verify inventories. If we want to make decisions based on market share then we should scrap JTC1 altogether and hand standardization over to revenue department authorities to administer.

But that would then perpetuate a technological neo-colonialism where the developed world controls the patents, the capital and the standards, and the rest of the world licenses, pays and obeys. There’s the rub. Where standards are open, consensually developed in a transparent process and made available to all to freely implement, there we lower barriers to implementation, level the playing field and allow all nations of the world to compete based on their native genius. But where standards are bought we end up with bad standards and a worse world for it.


Filed Under: OOXML

PDF, The Waste Land, and Monica’s Blue Dress

2007/11/21 By Rob 8 Comments

Adobe’s PDF Architect, James King, has recently started an “Inside PDF” blog which is well worth subscribing to. I’d especially draw your attention to his post “Submission of PDF to ISO”, which has much useful information on the process they are going through in ISO, a process that is slightly different from that used by ODF or OOXML in JTC1. (Note in particular that ISO Fast Track is not exactly the same as JTC1 Fast Track.)

In a more recent post, Archiving Documents, James wonders aloud why anyone would use ODF or OOXML for archiving, compared to PDF or PDF/A, saying, “After all, archiving means preserving things, and usually you want to preserve the total look of a document. PDF/A does that.”

I recommend reading the Archiving Documents post in full, and then return here for an alternate point of view.


We use the word “archive” quite loosely, covering a wide range of activities by that name, and in doing so risk blurring distinct activities into one over-generalization. Before you are told that format X or format Y is best for archiving, it is fair to ask what is meant by “archiving”, who does the archiving, for what purpose, and under what constraints.

In some cases what must be preserved, and for how long, is spelled out for you in detail, by statute, regulation or court order. Or a company, in anticipation of such requests, may require preservation as part of a corporate-wide records retention policy for certain categories of employees or documents.

An example of the range of materials that may be included can be seen in this preservation order:

“Documents, data, and tangible things” is to be interpreted broadly to include writings; records; files; correspondence; reports; memoranda; calendars; diaries; minutes; electronic messages; voicemail; E-mail; telephone message records or logs; computer and network activity logs; hard drives; backup data; removable computer storage media such as tapes, disks, and cards; printouts; document image files; Web pages; databases; spreadsheets; software; books; ledgers; journals; orders; invoices; bills; vouchers; checks; statements; worksheets; summaries; compilations; computations; charts; diagrams; graphic presentations; drawings; films; charts; digital or chemical process photographs; video; phonographic tape; or digital recordings or transcripts thereof; drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata, is also included in this definition.
–Pueblo of Laguna v. U.S., 60 Fed. Cl. 133 (2004).

I would pay particular attention to the part at the end, “…drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata”.

Similarly, consider government and academic archives that preserve documents for the long term. The archivist tries to anticipate what questions future researchers will have, and then tries to preserve the document in such a way that it can best answer those questions.

A PDF version of a document answers a single question, and answers it quite well: “What did this document look like when printed?” But this is not the only question that one might have of a document. Some other questions that might be asked include:

  1. What was the nature of the collaboration that led to this document? How many people worked on it? Who contributed what?
  2. How did the document evolve from revision to revision?
  3. In the case of a spreadsheet, what was the underlying model and assumptions? In other words, what are the formulas behind the cells?
  4. In the case of a presentation, how did the document interact with embedded media such as audio, animation, video?
  5. How was technology used to create this document? In what way did the technology help or impede the author’s expression? (Note that researchers in the future may be as interested in the technology behind the document as the contents of the document itself.)

The PDF answers one question — what does the document look like — but doesn’t help with the other questions. And these other, richer questions may be the ones that most interest historians.

Let’s take an analogous case. T.S. Eliot’s 1922 poem The Waste Land is a landmark of 20th century literature. Not only is it important from an artistic and critical perspective, but it is also important from a technology perspective — it is perhaps the first major poem to have been composed at the typewriter. What was published was, like a PDF, what the author intended, what he wanted the world to see. That is all the world knew until around 1970, after the poet’s death, when the rest of the story emerged in the form of typewritten draft versions of the poem, with handwritten comments by Ezra Pound.

These drafts provided pages and pages of marked up text that showed the nature and degree of the collaboration between Eliot and Pound far more than had been previously known. This is what researchers want to read. The final publication is great, but the working copy tells us so much more about the process. History is so much more than asking “What?”.  It continues by asking “How?” and eventually asking “Why?” — this is where the real insight occurs, going beyond the mere collection of facts and moving on to interpretation. PDF answers the “What?” question admirably. I’m glad we have PDF as a tool for this purpose. But we need to make sure that when archiving documents we allow future researchers to ask and receive answers to the other questions as well.

Flash forward to the technology of today. We are not all writing great poetry, but we are collaborating on authoring and reviewing and commenting on documents. But instead of doing it via handwritten notes, we’re doing it via review & comment features of our word processors. Although the final resulting document may be easily exportable as a PDF document, that is really just a snapshot of what the document looks like today. It loses the record of the collaboration. I don’t think that is what we want to archive, or at least not exclusively. If you archive PDF, then you’ve lost the collaborative record.

Another example: take a spreadsheet. You have cells with formulas, and these formulas calculate results which are then displayed. When you make a PDF version of the spreadsheet you have a record of what it “looked like”, but this isn’t the same as “what it is”. You cannot look at the formulas in the PDF. They don’t exist. Future researchers may want to check your spreadsheet’s assumptions, the underlying model. There may also be the question of whether your spreadsheet had errors, whether from a mis-copied formula or from an underlying bug in the application. If you archive exclusively as PDF, no one will ever be able to answer these questions.
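To make that concrete, here is a minimal sketch in Python. An .ods spreadsheet is a zip package whose content.xml records each formula as a table:formula attribute, so the underlying model can be inspected directly; a PDF export carries none of this. (The filename is a placeholder.)

    # List the formulas inside an ODF spreadsheet package.
    import zipfile
    import xml.etree.ElementTree as ET

    TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

    with zipfile.ZipFile("model.ods") as ods:            # placeholder filename
        content = ET.fromstring(ods.read("content.xml"))

    for cell in content.iter(f"{{{TABLE}}}table-cell"):
        formula = cell.get(f"{{{TABLE}}}formula")
        if formula:
            print(formula)                               # e.g. of:=SUM([.A1:.A3])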

One more example, going back to 1998 and the Clinton/Lewinsky scandal. Kenneth Starr’s report on the case was written in WordPerfect format, distributed to the House of Representatives, whose staff then converted it to HTML form and released it on the web. But due to a glitch in the HTML translation process, footnotes that had been marked as deleted in the WordPerfect file reappeared in the HTML version. So we ended up with an official published Starr Report, as well as an unofficial HTML version which had additional footnotes.

Imagine you are an archivist responsible for the Starr Report. What do you do? Which version(s) do you preserve? Is your job to record the official version, as-published? Or is your job to preserve the record for future researchers? Depending on your job description, this might have a clear-cut answer. But if I were a future historian, I would sure hope that someone someplace had the foresight to archive the original WordPerfect version. It answers more questions than the published version does.

So, to sum it up: What you archive determines what questions you can later ask of a document. If you archive as PDF, you have a high-fidelity version of what the final document looked like. This can answer many, but not all, questions. But for the fullest flexibility in what information you can later extract from the document, you really have no choice but to archive the document in its original authoring format.

An intriguing idea is whether we can have it both ways. Suppose you are in an ODF editor and you have a “Save for archiving…” option that would save your ODF document as normal, but also generate a PDF version of it and store it in the zip archive along with ODF’s XML streams. Then digitally sign the archive along with a time stamp to make it tamper-proof. You would need to define some additional access conventions, but you could end up with a single document that could be loaded in an ODF editor (in read-only mode) to allow examination of the details of spreadsheet formulas, etc., as well as loaded in a PDF reader to show exactly how it was formatted.
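The packaging half of that idea takes surprisingly little machinery. Here is a rough sketch in Python, with placeholder filenames, leaving aside the manifest entry, the digital signature and the timestamp that a real implementation would need:

    # Copy the ODF package and tuck a PDF rendition in beside the XML streams.
    # A real "Save for archiving..." would also register the new stream in
    # META-INF/manifest.xml, then sign and timestamp the whole package.
    import shutil
    import zipfile

    shutil.copy("report.odt", "report-archive.odt")      # placeholder filenames
    with zipfile.ZipFile("report-archive.odt", "a") as pkg:
        pkg.write("report.pdf", arcname="rendition.pdf")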


Filed Under: ODF

Document Format FUD: A Guide for the Perplexed

2007/11/18 By Rob 8 Comments

I’ve decided to put together a list of misconceptions that I hear, generally on the topic of document formats. I’ll try to update this list to keep it current, with the most recent entries at the top. Readers are invited to submit the FUD they observe as comments, and I’ll include it where I can.

This inaugural edition is dedicated to the fallout from the recent supernova we know as the OpenDocument Foundation, that in one final act of self-immolation swelled from obscurity to overwhelming brilliance, but then slowly faded away, ever fainter and more erratic, little more than hot gas, the dimming embers no longer sustainable.


Q: Now that the originator and primary supporter of OpenDocument Format has ended its support for ODF, does this mean the end for the ODF standard? (18 Nov 2007)

A: This question is based on a mistaken premise, namely that the OpenDocument Foundation was the originator or steward of the ODF standard. It was neither.

The ODF standard is owned by the OASIS standards consortium, with over 600 member organizations and individual members. The committee in OASIS that does the technical work of maintaining the ODF standard is called the OpenDocument TC. It has 15 organizational members as well as 7 individual members. Until recently the OpenDocument Foundation was a member of the ODF TC, one voice among many.

The adoption of the ODF standard is promoted by several organizations, most prominently the ODF Alliance (with over 400 organizational members in 52 countries), the OpenDocument Fellowship (around 100 individual members) and the OpenDoc Society (a new group with a Northern European focus, with around 50 organizational members). To put this in perspective, the OpenDocument Foundation, before it changed its mission and dissolved, had only 3 members.

When you consider the range of ODF adoption, especially in Europe and Asia, the strong continuing work on ODF 1.2 in OASIS, and the strong corporate, government and organizational participation demonstrated at the global ODF User Workshop recently held in Berlin, a disproportionate amount of noise is being made over the hysterics of the disintegrating 3-person OpenDocument Foundation.

A number of analysts/journalists/bloggers didn’t check their facts, fell into the trap, and ascribed a far greater importance to the actions of the Foundation than it merits. Curiously, these articles all quoted the same Microsoft Director of Corporate Standards. I hope this correlation does not prove to be a persistent contrary indicator for accuracy in future file format stories.

Luckily for us, David Berlind over at ZDNet has penetrated the confusion and gets it right:

…the future of the OpenDocument Foundation has nothing to do with the future of the OpenDocument Format. In other words, any indication by anybody that the OpenDocument Format has been vacated by its supporters is pure FUD.

11/27/2007 Update: Berlind did further research and interviews on this topic and followed up with a podcast and a new blog post, OpenDocument Format Community steadfast despite theatrics of now impotent ‘Foundation’.


Q: The Open Document Foundation has a document, a “Universal Interoperability Framework” that on its title page says “Submitted to the OASIS Office Technical Committee by The OpenDocument Foundation October 16, 2007”. What is the status of this proposal in the ODF TC? (18 Nov 2007)

A: No such document has been submitted to the OASIS TC, on this date or any other date. OASIS policy states that “Contributions, as defined in the OASIS IPR Policy, shall be made by sending to the TC’s general email list either the contribution, or a notice that the contribution has been delivered to the TC’s document repository”. A look at the ODF TC’s list archive for October shows that there was no such contribution.


Q: The Foundation claims that the W3C’s CDF format has better interoperability with MS Office than ODF has. Is this true? (18 Nov 2007)

A: The Foundation’s claims have not been demonstrated, or even competently argued at a technical level that would allow expert evaluation. I cannot fully critique what is essentially vaporware. However, those who know CDF better than I do have commented on the mismatch between CDF and office documents, for example the recent interview with the W3C’s Chris Lilley in Andy Updegrove’s blog.


Q: So, does IBM then oppose CDF in favor of ODF? (18 Nov 2007)

A: No. IBM supports the development of both ODF and CDF and has a leadership role in both working groups. These are two good standards for two different things.

The W3C, over the years, has produced a number of reusable, modular core standards for things like vector graphics (SVG), mathematical notation (MathML), forms (XForms), etc. To use a cooking analogy, these are like ingredients that can be combined to make a dish. ODF has taken a number of W3C standards and combined them to make a format for expressing conventional office documents, the familiar word processor, spreadsheet and presentation documents. ODF is an OASIS and ISO standard.

But just as eggs, butter and flour form the base of many recipes, the core W3C standards can be assembled in different ways for different purposes. This is a good thing.

CDF is not so much a final dish as an intermediate step, like a roux (flour + butter) in sauce-making. You don’t use a roux directly, but build upon it, e.g., add milk to make a béchamel, add cheese for a cheese sauce, etc. CDF itself is not directly consumable. You need to add a WICD profile, something like WICD Mobile 1.0, before you have something a user agent can process.



Filed Under: ODF Tagged With: CDF, ODF, Open Document Foundation

ODF enters the Semantic Web

2007/10/12 By Rob 11 Comments

Metadata is “data about data”. Meta from the Greek, μετά, meaning with or after.  I suppose if you wanted to sound grand you could pronounce it hyper-correctly with the stress on the second syllable, met-ah’. I’ve heard some incorrectly pronounce it meet’-ah, perhaps a false analogy with βῆτα = beta. But you never hear anyone pronounce μέγα = mega as mee-guh, do you?

Metadata is not new. It has been around for centuries. In some cases metadata applies to the overall document, while in other cases it applies to only a portion of the content. Examples of the first case include titles of books, footnotes, ISBN numbers, LOC or Dewey Decimal categorizations, keywords, etc. The various forms of scribal marginalia, whether scholia or glosses in the margins of a manuscript, or personal annotations of the owner of a document, are historic examples of the second kind of metadata.

Marginal notes are frequently used today in business forms. A printed form represents, often imperfectly, a snapshot in time of an organization’s view of its own process. Maybe the process was approximated or the form was imperfectly designed. Maybe it quickly became outdated; somehow reality seems to outgrow the strictures of the form’s blanks and checkboxes. So what do you, as a customer, do? You write notes in the margins or other places between form fields and hope that there is a human in the loop to read your words.

In any case, of all documents, forms (originally called “formulary documents”) have the most structured representation of data. Enter your social security number into the nine little boxes provided. Enter your date of birth here: month first, then day, then two-digit year. Last name first, first name last. Everything is nice and simple, and provided your reality matches what the form designer envisioned, your data will be easy to consume, whether by another person or, after data entry, by various online processes. Or maybe the form data was entered online originally? Even better.

But what about all the other documents in the world, the ones that are not formally structured as forms? What sense can we make of them? Can you write a program to detect a social security number in a free-form document, or a date, or a zip code? Perhaps with pattern matching, you can find out some simple things. That is the essence of Microsoft’s Smart Tags. (And we had much of this in Lotus Agenda a decade earlier.) But this only works for the most trivial cases. It only takes you so far.
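To see both the power and the limits, here is the sort of pattern matching involved, sketched in Python with a few illustrative U.S. formats. It finds rigid tokens easily, but it cannot tell you anything about what the numbers mean:

    # Smart-Tag-style detection: fine for rigid tokens, blind to meaning.
    import re

    PATTERNS = {
        "ssn":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "zip":  re.compile(r"\b\d{5}(?:-\d{4})?\b"),
        "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    }

    text = "Mailed 11/27/2007 to 02134; applicant SSN 123-45-6789."
    for label, pattern in PATTERNS.items():
        print(label, pattern.findall(text))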

What if I wanted to mark up an academic paper, a work-in-progress, to indicate which quotations have been verified and which ones remain to be verified? Or what if I want to annotate statements in recorded testimony according to which statements contradict and which corroborate another witness’s statements? This goes far beyond pattern matching. I need a way to encode my knowledge, my view of the subject, my insights, into the document.

We have data in a document — “Words, words, words”, as Hamlet tells Polonius. But for those who work with thoughts, the present constraints of encoding our knowledge as rudimentary linear strings of characters are severe. In general, text is multi-layered and hyper-linked in strange and marvelous ways. Your father’s word processor and word processor file format are inadequate to the task. The concept of a document as a single store of data that lives in a single place, entire, self-contained and complete, is nearing an end. A document is a stream, a thread in space and time, connected to other documents, containing other documents, contained in other documents, in multiple layers of meaning and in multiple dimensions. What we call a traditional document is really just a snapshot in time and space, a projection into a print-ready format of what documents will soon become.

The applications of metadata to business documents are legion. Wherever you have data, you also have the questions of:

  1. Who entered the data?
  2. Where did the data come from?
  3. Who verified the data?
  4. Who approved the data? Legal? HR? Business?
  5. Where is this data destined?
  6. How old is the data? When does it expire?
  7. How trustworthy is this data?
  8. Who must we cite as an authority for this data?
  9. Who owns this data?
  10. Who has permissions to see this data?
  11. Who can set policy for this data?
  12. Who else can edit this data?
  13. How does this data connect with my business? Is it a part number? The name of a customer or the name of an employee?

And so on.

Open Document Format (ODF) 1.2 takes a step into the world of structured metadata with an RDF metadata framework. If that sounds Greek to you, then let’s say that a metadata framework enables application developers to create applications that do the above things. A framework doesn’t tell you how you must say “This image is provided under a Creative Commons Share-Alike license”, but provides the means for application developers to express concepts like “licensed-under” and “Creative Commons Share-Alike”, as well as a formal structure for expressing subject-predicate-object relationships, where the subject can be any of around 50 ODF document elements, such as paragraphs, footnotes, images, tables, etc.
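To give a flavor of what a subject-predicate-object statement looks like in practice, here is a sketch using Python’s rdflib. The predicate vocabulary and the element URI are invented for illustration; they are not taken from the ODF 1.2 draft:

    # One RDF triple: "image1 is licensed under CC BY-SA". Illustrative URIs only.
    from rdflib import Graph, Namespace, URIRef

    EX = Namespace("http://example.org/terms/")          # hypothetical vocabulary
    g = Graph()
    image = URIRef("report.odt#image1")                  # stands in for a document element
    cc_by_sa = URIRef("http://creativecommons.org/licenses/by-sa/3.0/")
    g.add((image, EX.licensedUnder, cc_by_sa))
    print(g.serialize(format="turtle"))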

To read more, here are some places to start:

For general background on the “semantic web”, a good intro is the 2001 Scientific American article “The Semantic Web” by Tim Berners-Lee et al.

For a bit more on RDF, the Wikipedia page is pretty good.

Svante Schubert at Sun, also on the ODF Metadata Subcommittee, has a recent blog post worth reading: “New Extensible Metadata Support With ODF 1.2”.

Bruce D’Arcus, of the Metadata Subcommittee and co-lead of the OpenOffice.org Bibliographic Project, also contributes his thoughts on the new ODF 1.2 metadata.

If you want to delve into the particulars of ODF 1.2’s new metadata support, you can read the latest draft of the proposed changes to the specification [ODF] and the examples [ODF] document. Of course, any feedback on ODF drafts and published standards is welcome on the ODF TC’s comment mailing list.

For a gentle introduction to metadata, ODF, where we are coming from and where we are going, I offer this interview [MP3] with Patrick Durusau, Chair of the ODF Metadata Subcommittee, which I recorded back in July.


Filed Under: ODF

Cracks in the Foundation

2007/10/07 By Rob 14 Comments

You must admire their tenacity: Gary Edwards, Sam Hiser, and Paul E. Merrell (aka “Marbux”). The mythology of Silicon Valley is filled with stories of three guys and a garage founding great enterprises. And here we have three guys who, through blogs, interviews, and constant attendance at conferences, have become some of the most-heard voices on ODF. Maybe it is partly due to the power of the name? “The OpenDocument Foundation” sounds so official. Although it has no official role in the ODF standard, this name opens doors. The ODF Alliance, the ODF Fellowship, the OASIS ODF TC, the ODF Adoption TC (and many other groups without “ODF” in their name) have done far more to promote and improve ODF, yet the OpenDocument Foundation, Inc. seems to score the panel invites. Not bad for three guys without a garage.

However, in recent months the OpenDocument Foundation has found itself more and more isolated, outside the mainstream debate. How far they have fallen can be seen in the fact that Microsoft has gone from ridiculing their conspiracy theories to using them to support its arguments. At the same time the Foundation’s membership has dwindled to the point where only a small number remain. Former members have disassociated themselves from the Foundation as it turned increasingly to strident rhetoric. Whereas in the early days the Foundation had a large membership that participated fully in the OASIS TCs, now their “contributions” consist mainly of heckling and haranguing the other members. Finally, the Foundation has recently announced its intent to abandon constructive work within OASIS, to actively lobby against adoption of ODF 1.2 in ISO, and to push for an alternative format, CDF, based on XHTML, CSS 3.0 and RDF. This is an odd stance for a non-profit whose charter was:

The OpenDocument Foundation, Inc. is a 501c(3) non profit chartered to work in the public interest to support, promote and develop the OASIS OpenDocument File Format affectionately known as “ODf”.

So it is against this backdrop that I read with interest in Linux Today the latest correspondence from the Foundation. You can read it yourself, or take the following 8 points from me as a condensed summary of their main points:

  1. “The commercialization of interoperability remains a key driver in both big vendor deals and big vendor consortia. FOSS is left on the outside looking in.”
  2. “The conversion to XML [document formats] must be nondisruptive,” meaning it fits into existing business processes, which are increasingly dominated by Microsoft middleware. This implies a requirement for high-fidelity, lossless round-trip conversions.
  3. The alternative is “rip and replace” and that is too costly and disruptive.
  4. Microsoft is moving toward a “grand convergence” of their services, desktop, device and servers, with OOXML at the core. “MS-OOXML is the primary transport, the document/data container of interop-integration preference.”
  5. ODF was not designed as a response to these problems.
  6. Microsoft/Sun/Novell are working “to limit ODF interoperability and usefulness” because of some patent deals. (Sorry I can’t summarize this one better — I just don’t understand it.)
  7. IBM/Oracle/Google are working to “limit ODF interop” because “they want a total ripout and replace of MS Office.”
  8. The Open Document Foundation is in “the middle area of trying to perfect the conversion to XML”.

Let me take these points one-by-one:

  1. The OpenDocument Foundation seems to try to clothe themselves in the mantle of the open source community and pontificate on how the big bad vendors treat interoperability. But are they speaking as a non-profit or as a vendor? Take their DaVinci plugin, for example. Where is the source code? Why isn’t this open source? Are we to follow the Foundation’s claim of 100% interoperability, based on blind faith, without seeing some proof in the form of working code? I’ve been working on document conversions and document file formats of one kind or another for almost 20 years. I’ve never seen 100% fidelity conversions of anything but trivial formats. Extraordinary claims require extraordinary evidence. But we have nothing here, just white papers.
  2. I would not claim a priori that all customers require lossless, 100% fidelity conversions. Remember, we do not see 100% fidelity even when upgrading from Office 2003 to Office 2007, but this appears to be adequate. What is required is that the total return from changing document formats exceeds any other profitable use of capital available to the enterprise. In other words, to a business this is an investment, and will be judged as an investment. Very few businesses will take a dogmatic, ideologically pure view of this. Ask yourself, would you accept a 1% loss in fidelity if I gave you a billion dollars? Yes, of course you would. There are no purists in business who will remain in business. We’re just haggling over what price/fidelity combination is needed to make a prudent investment. However, there is a notable exception to this rule, and that is where access to open document formats is mandated as a public right, not as a business investment. Think of the last 20 years or so of enabling public buildings with ramps for the disabled, bathrooms to accommodate wheelchairs, braille lettering in elevators. This was done by legislation and regulation, as a matter of public policy, to ensure that all of the public has access to public facilities. There was no requirement that an access ramp post a net profit. Similarly, today we see some movements to ODF based on open-access principles.
  3. This is what we call the “fallacy of the excluded middle”: you are either with us or against us. It is false to suggest that the only two approaches to interoperability are to either blindly follow the OpenDocument Foundation’s mysterious DaVinci plugin, or to ignore interoperability altogether and advocate rip and replace. There are today two other ODF plugins available, one from Microsoft and one from Sun. This is real, running code, open source even in the case of the first plugin. So why should we be taking exclusive direction from the Foundation on how we achieve interoperability? Oh right, they are claiming that their program achieves 100% round-trip fidelity. Extraordinary claims…
  4. Gary is in the ballpark when he suspects that Microsoft is seeking some sort of “grand convergence” around protocols and formats. However, I disagree with his impression that OOXML sits at the center of this. In my opinion, OOXML is a rushed, transitional format, intended purely to disrupt ODF adoption. Just as the Office 2000, Office XP, and Office 2003 markup formats were abandoned by Microsoft, I predict that OOXML will soon be cast aside. The problem is that OOXML is such a poorly-engineered format that not even Microsoft wants to build upon it. If I had to divine the future of Microsoft’s file formats, I’d look more in the XAML/XPS/Silverlight space. I believe that future MS Office document formats will look more like that than like OOXML.
  5. I find this observation amusing. ODF, which started its standards track late in 2002, was not designed to be 100% compatible with Office 2007. Mercy me, how did we manage to drop the ball on this one?! Remember, in 2002 there was no publicly available specification for Microsoft document formats. There was no Open Specification Promise or Covenant Not to Sue. So not only was 100% compatibility technically impossible, attempting it via reverse engineering was precarious from a legal standpoint. In my opinion, it still is, even in 2007. In any case I’m staunchly opposed to evolving any open standard purely for the benefit of a single vendor. Microsoft Internet Explorer is the dominant web browser. Should we then require that HTML only evolve in ways that improve interoperability with Internet Explorer? I don’t think so. Why should document formats be different?
  6. This comment manages to avoid confronting a heap of contrary facts. Microsoft supports the open source ODF Translator project on SourceForge. Sun has made their own ODF Plugin 1.1 for MS Office available for download. And Novell, along with helping the Microsoft effort, has integrated that translator into their version of OpenOffice and has also started work on more powerful, next-generation support for OOXML. So these three companies are seeking to “limit ODF interoperability and usefulness”? If so, they sure have a clever way of disguising their intent. To the ordinary bystander, writing conversion and translation code to allow documents to be shared between OpenOffice and MS Office would be seen as a pro-interoperability statement. But thanks to the OpenDocument Foundation’s in-depth sleuthing, we now know that the opposite is true. Not! Although I have serious doubts as to the long-term technical feasibility of some of these translation endeavors, they do have the advantage of showing real, running code working with real, running applications. They may not claim 100% fidelity, but this is first-generation work and will undoubtedly improve. And they have an important advantage over the Foundation’s DaVinci Plugin in that these other efforts demonstrably exist. Given a choice, I’ll take an open source version of a partial-fidelity converter, with a reasonable architecture, over one that claims 100% fidelity but that I can’t see or touch.
  7. The claim is that IBM/Google/Oracle also want to “limit ODF interop” because (according to Gary) we want rip & replace. Strange, but just a few weeks ago I led an ODF Interoperability Camp in Barcelona, on behalf of the OASIS ODF Adoption TC, where we had a good selection of ODF vendors, open source projects and customers working to improve interoperability, including Sun, Novell, Google and IBM. The OpenDocument Foundation is a member of the OASIS ODF Adoption TC. So did they help in the organizing of the event? Did they participate? No, nothing, nada. Evidently it is easier to complain about interoperability than to do something about it. And again there is this fallacy of the excluded middle. You must either accept the magical DaVinci Plugin, or you are for rip & replace. There are no other alternatives considered. I’d remind the OpenDocument Foundation that interoperability was not invented yesterday, and that there are many technical approaches that can be applied to foster it. Open standards are one way, but there are others that can be applied as well, including conformance testing, test suites, plug-fests, profiles, shared code, reference implementations, etc. We should apply experience and engineering judgment to select the appropriate solution for the problem, and not fall into the trap of believing that there is only a single path to interoperability, and that this path just happens to be based on the Foundation’s product.
  8. Although it sure would be nice to portray yourself as the little guy, watching out for the customer, while the big bad vendors tromp all over the flowers, the fact is that the big vendors are actively working on interoperability, with at least three major solutions available today, as well as a major initiative around interoperability in the ODF Adoption TC. In particular, IBM (with SmartSuite) and Sun (with StarOffice) have 15 or so years of experience each in working on document interoperability with MS Office. This isn’t rocket science, but neither is it easy. You can either stand on the sidelines and make pronouncements about how the world is out to prevent interoperability, or you can roll up your sleeves and help get the work done. I know which one I’ll be doing. What about you?

If the Foundation’s approach were technically feasible, they would just go out and do it. You don’t let a breakthrough technical innovation wait on a standards committee to act. You just go out and do it and then standardize it later, once you’ve proven it works. If the Foundation really thinks that they can achieve 100% interoperability with MS Office with just 5 simple changes to ODF, then why the heck don’t they just do it? Don’t wait for the formality of the ODF TC’s approval. They should go ahead, as if the standard already had their 5 fixes, and show the world how they have achieved 100% interoperability with MS Office. If they are right, they will all become multi-millionaires in a very short period of time.


Filed Under: Interoperability, ODF



Copyright © 2006-2023 Rob Weir · Site Policies