An Antic Disposition

Rob

Genesis 11:5-9

2006/11/14 By Rob 5 Comments

This, fresh from Office Watch: “Office 2007 compatibility pack disappoints”.

Update 11/15: Some readers have written with more information. This may be an issue between the pre-1.5-final-draft version of OOXML and the final RTM Compatibility Pack. Evidently there were some late changes to the OOXML specification, including a change in namespace URIs. So the problems seem to be between documents created in the beta version of Office 2007 (I am not sure whether this applies to all betas, including the Technical Refresh) and the RTM version of Office. Confusing, to say the least. It looks like the referenced article is being updated with additional details.

Update 11/7: The cited article has been updated again. This seems to be an issue related to what patch level you are running. If you have all of the updates applied to Windows/Office, the Compatibility Pack works as advertised.

Since there are a number of convertor initiatives under development, it is probably worth backing up and taking a survey of where we stand today:

ODF = Open Document Format, an XML-based document format used in products like IBM Workplace, the next version of Lotus Notes, OpenOffice.org, KOffice, AbiWord, GNUmeric, etc. ODF is an ISO standard and is maintained at OASIS.

OOXML = Office Open XML, an XML-based format which will be used in Microsoft Office 2007 when it is released in January. OOXML is currently a draft specification in Ecma, though it will certainly be adopted as an Ecma standard in December.

The Legacy Formats = the proprietary binary formats that Microsoft used before Office 2007, the familiar DOC, XLS and PPT files.

So, what can be converted to what, using what, and does it really work?

If you upgrade to Office 2007 when it comes out, you will be able to read and write both the OOXML and the Legacy formats. Both are supported out-of-the-box.

If you want to stay on an older version of Office, and need to exchange documents with someone using the new OOXML formats, then you need Microsoft’s Compatibility Pack. As the above article points out, getting this to work in practice requires first ensuring that your patch level is current.

What about ODF? If you are on Microsoft Office, then there are two initiatives underway to bring ODF support to Office. One is the odf-convertor project on SourceForge, which is supported by Microsoft (and now by Novell as well). Their initial deliverable will be the “ODF Add-in For Microsoft Word”. I didn’t have all that much luck with an earlier “alpha” version of the Add-in, but I’ve heard it is much improved. However, in the near term it only supports reading ODF text documents. There is no support for writing, and no support for presentations or spreadsheets. These other features are slated to be delivered in future phases of the project. The Open Document Foundation is also developing a convertor, which they call the “ODF Plugin”. Sam Hiser will be presenting on it at XML 2006 in Boston, so hopefully we’ll learn more about it then.

If you are running OpenOffice.org, then you already have excellent integrated conversion support between ODF and the Legacy Office formats. But if you need to exchange documents with someone using Office 2007 and its default OOXML formats then you are out of luck for now. However, please note that the recent Novell/Microsoft agreement included a statement (if I’m reading this correctly) that Novell would help add OOXML support to OpenOffice.org. So this support should eventually make it into OpenOffice.org.

So, based on what really works today, I’d offer this recommendation: If you must upgrade to Office 2007, then set the default file formats to the Legacy binary formats. Until the OOXML convertors mature and all Office users have migrated off the beta and have compatible OOXML versions, saving as OOXML will only cause chaos for those you exchange documents with.

Filed Under: ODF, Office, OOXML

Two simple questions

2006/11/06 By Rob 20 Comments

Some pertinent quotes from Microsoft’s Brian Jones, thematic quotes made over a sustained period of time:

  • “The Open XML formats were designed to be 100% backward compatible with the existing set of Office binary formats, and that was really a goal that we can’t compromise on.”
  • “It needs to be 100% full fidelity”
  • “[F]rom our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible”
  • “We need to make sure that the format is documented 100% and there are no barrier to interoperability”
  • “This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features”

Get the idea?

Now these quotes were all made before OOXML was completed. I understand engineering and deadlines and such, and that things don’t always get done as planned. But I would like to know, now that we have the 1.5 OOXML “final draft” and Office 2007 has been released to manufacturing, is it indeed 100% backwards compatible?

Two simple questions. I’m hoping Microsoft or Ecma can give a straightforward and unequivocal answer:

1) Is the Office Open XML specification (1.5 “final draft”) 100% compatible with all legacy Microsoft Office documents, meaning that a 3rd party, using solely information in this specification (and publicly available open standards), can create a utility on a non-Windows platform, say Linux, to convert any legacy Office document into OOXML without loss of data, function or appearance?

2) Does the OOXML specification (1.5 “final draft”) document the format sufficiently for someone to create a 100% compatible editor (spreadsheet, word processor, presentation) implementation on a non-Windows platform, say Linux? By 100% compatible I mean that it can load, interpret and display all OOXML documents without loss of data, function or appearance.

I note that everything we’ve heard up to now merely says that OOXML was designed to be 100% compatible. But I’d like to hear whether it in fact succeeded at doing these things. That’s the important question, right? We can talk intent all we want, but the results are what counts.

I believe that the criterion should be whether a 3rd party can create a conversion tool and editor based on the documented format. The fact that Office itself may do a conversion is not proof of anything. Microsoft could submit a specification both incomplete and erroneous but still do a good conversion job in Office based on private information. The proof of sufficiency for the specification only comes with independent 3rd party implementations.

These are simple questions. I’m hoping for a simple answer.

Filed Under: OOXML

Unlocking the Wordhord

2006/11/01 By Rob Leave a Comment

I have a backlog of shorter items that I’ve accumulated in recent weeks that I’d like to share with you. I hope you find something here interesting.

First, congratulations to OpenOffice.org and KOffice, who both recently announced new releases. In my mind the notable features include an improved extensions framework in OpenOffice.org 2.0.4, and leading MathML conformance scores and command-line (UI-less) scripting in KOffice 1.6. Combined with the recent release of Firefox 2.0, it feels like Christmas has come early this year!

I get the feeling that there are more good things to come. Eike Rathke blogs about order of magnitude performance improvements in load time for large spreadsheets, a fix targeted for OpenOffice.org 2.1.

There is some emerging technology at Adobe, a project codenamed “Mars”, which appears to be a reformulation of PDF based on open standards such as SVG, PNG, JPEG, JPEG 2000, OpenType, XPath and XML, all sitting in a Zip container file. There is a voice in my head saying, “This is important”. For example, could we have a single container file that included both ODF editable content as well as Mars/PDF for high-fidelity presentation? That way you can hand a document to someone and they can either view/edit it in a full heavy-weight editor, or get a fast high-fidelity read-only rendering. Both modes of use from the same file. To make this and other cool things happen, Mars and ODF will want to sync up on things like packaging, manifests and metadata. Adobe, call me ;-)
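To make the single-container idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `build_hybrid_package` and the entry name `rendering/document.pdf` are hypothetical, Mars's actual package layout is not described here, and the result is not a complete ODF package (a real one also needs a manifest, which is exactly the kind of thing the two formats would have to agree on). The one detail borrowed from ODF packaging is that the `mimetype` entry comes first and is stored uncompressed.

```python
import io
import zipfile

ODF_TEXT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_hybrid_package(content_xml: bytes, rendering: bytes) -> bytes:
    """Bundle editable content and a read-only rendering into one Zip file.

    Hypothetical layout for illustration only; not a valid ODF or Mars package.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # ODF packaging convention: `mimetype` is the first entry, uncompressed.
        package.writestr("mimetype", ODF_TEXT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        # Editable content for a full editor.
        package.writestr("content.xml", content_xml,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Assumed (hypothetical) location for the high-fidelity rendering.
        package.writestr("rendering/document.pdf", rendering,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(package_bytes: bytes) -> list:
    """Return the entry names in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()
```

An editor would open `content.xml`, while a lightweight viewer could go straight to the rendering entry without parsing the editable content at all.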

Two new ODF whitepapers to note. J. David Eisenberg looks at ODF and XForms and how they work together in OpenOffice.org, using a wrestling club application form as an example. Of course, source code is included. “Opportunities for innovation with OpenDocument Format XML” is the title of a new IBM whitepaper also just posted.

A couple of weeks ago I participated in a roundtable discussion on ODF at the Berkman Center at Harvard Law School, held by the TransAtlantic Consumer Dialogue forum. You’ve probably already read James Love’s post on it on The Huffington Post. If not, take a look. Since I tend to spend my days with two kinds of people, the technical and the very technical, it was good to get out and hear a different perspective on the issues.

A familiar face at the Berkman Center was Sam Hiser, who has a new post, at once both visceral and witty, called “Pretending Interoperability”.

Finally, in order to increase the signal-to-noise ratio in this blog, I’ve instituted a new comment policy. Those comments which are outside of the prescribed bounds will not be published.

Filed Under: ODF, Open Source Tagged With: Berkman Center, Eike Rathke, J. David Eisenberg, James Love, OpenOffice, Sam Hiser

Ass-backwards Compatibility

2006/10/27 By Rob Leave a Comment

I refer you to Ben Langhinrichs at Genii Software with his post on “Self deprecating standards”.

After you’ve read that, I hope you’ll return here for some additional thoughts.


From what Ben writes, it seems that one source of OOXML’s length and complexity is that it has grandfathered in all sorts of special cases and exceptions for older versions of Office. So an implementor not only must worry about how Office 2007 does footnote placement, but also needs to worry about how Word 6 and Word 95 did it differently. There are many other examples in his post. (Ben, you should take a look at the VML section: that entire markup is deprecated in the spec.)

We call this “crud”, the accumulated debris from old implementations, old platforms, old formats, old mistakes. Don’t get me wrong. As a forensic exercise in documenting the byzantine complexity of this arcane format the draft OOXML specification is a considerable achievement. The 6,000 pages testify to the industry of its authors as well as the thanklessness of their task.

But we need to keep in mind that there is a difference between a specification and a standard. A specification tells the plain facts of what a particular technology does without comment as to whether it is good or bad. I could drop a box of toothpicks on my desk and write up a detailed specification on how they landed. An open standard, on the other hand, goes beyond mere specification, and promotes a preferred way of achieving cross-vendor and cross-application interoperability. Ideally a standard says, “This specification is good, we thought of the wider needs of the market, including other vendors and the consumer, and if we all implement this technically elegant specification then we all win.”

However one thinks of the OOXML specification, it lacks the quality, the consideration and the perspective of an open standard. It was doomed from day 1, when its charter limited TC45 to not making any changes that would be incompatible with Microsoft’s existing proprietary binary formats. Think of it this way — The binary formats were certainly never designed to be a standard, right? They were developed in-house at Microsoft, kept from public scrutiny for 15 years, never brought before a standards organization, licensed on terms that prevented competition with Office, etc. Now, 15 years later they take the accumulated crud from 15 years of a proprietary binary format, convert it into XML and expect to call that an “open standard”?

From an architectural and design perspective this is all ass-backwards. You can build complexity on top of simplicity, but you can’t easily build simplicity on top of complexity. The recommended approach when designing a standard-aspiring document format is to survey the range of functionality in existing and anticipated implementations, map out the intersection of this complexity and then generalize it. The generalization of complexity often leads to simplification. A good example is the art page borders, where the OOXML specification enumerates hundreds of specific images that can be used in page borders, an approach which is at once limiting and complex. The generalization of having the document include the clipart would both simplify the specification and increase its functionality. Instead of taking this approach, Microsoft has squandered their opportunity to improve document-centric computing and interoperability and has produced 6,000 pages of indigestible complexity, an approach that cuts off alternate implementations and props up their monopoly.

It is time to move on.

Filed Under: OOXML

The Chernobyl Design Pattern

2006/10/26 By Rob 14 Comments

In 1994, the world learned that the Intel Pentium chip had a bug. In certain cases it gave the wrong answer when calculating floating-point division. These cases were rare, only 1 in 9 billion divisions, and typically only resulted in errors past the 8th decimal place.

What did Intel do about this? Well, there was denial at first, and then dismissal of the problem as being trivial and unimportant. But eventually they saw the light and offered a no-questions-asked replacement policy for defective processors. No doubt this was expensive for Intel, but this preserved their good name and reputation.

It could have been different. For example, they could have simply kept the bug. They could have preserved that bug in future versions of the Pentium for backwards compatibility, arguing that there was some software out there that may have worked around the original defect, and for Intel to fix the bug now would only break the software that worked around the bug. This is a dangerous line of reasoning. What bug can’t be excused by that argument?

Intel could have further decided to turn their bug into a standard, and get it blessed by a standards development organization and maybe even ISO. “It’s not a bug, it’s a standard”.

But Intel is not Microsoft, so they don’t have quite the audacity to turn a bug into a standard, which is what Microsoft is attempting to do by declaring in Office Open XML (OOXML) that the year 1900 should be treated as a leap year, in contradiction of the Gregorian Calendar, which has been in use for more than 400 years. (Years divisible by 100 are leap years only if they are also divisible by 400.)
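For reference, the Gregorian rule fits in one line of code. The sketch below is in Python; the function name `is_gregorian_leap_year` is my own, and it is cross-checked against the standard library's `calendar.isleap`.

```python
import calendar


def is_gregorian_leap_year(year: int) -> bool:
    """Divisible by 4, except century years, which must be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The hand-written rule agrees with the standard library.
for year in (1896, 1900, 1904, 2000, 2100):
    assert is_gregorian_leap_year(year) == calendar.isleap(year)

print(is_gregorian_leap_year(1900))  # False: divisible by 100 but not by 400
print(is_gregorian_leap_year(2000))  # True: divisible by 400
```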

By mandating the perpetuation of this bug, we are asking for trouble. Date libraries in modern programming languages like C, C++, Java, Python and Ruby all calculate dates correctly according to the Gregorian Calendar. So any interpretation of dates in OOXML files in these languages will be off by one day unless the author of the software adds a workaround to account for Excel’s bug. Certainly some will make the “correction” properly, at their own expense. But many will not, perhaps because they did not see it deep within the 6,000 page specification.
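Here is a rough sketch of what that workaround looks like in Python. It assumes Excel's 1900 date system, in which serial number 1 is January 1, 1900 and serial number 60 is the nonexistent February 29, 1900; the function names are hypothetical and this is not code from any particular library.

```python
from datetime import date, timedelta

# Assumption: 1900 date system. Serial 1 is 1900-01-01 and serial 60 is
# the phantom 1900-02-29 that the Gregorian Calendar does not contain.
FIRST_DAY = date(1900, 1, 1)
PHANTOM_LEAP_DAY_SERIAL = 60


def naive_serial_to_date(serial: int) -> date:
    """What a correct date library gives you with no workaround."""
    return FIRST_DAY + timedelta(days=serial - 1)


def serial_to_date(serial: int) -> date:
    """Convert a serial number, compensating for the phantom leap day."""
    if serial < 1:
        raise ValueError("serial numbers start at 1")
    if serial == PHANTOM_LEAP_DAY_SERIAL:
        raise ValueError("serial 60 stands for 1900-02-29, which never existed")
    if serial > PHANTOM_LEAP_DAY_SERIAL:
        serial -= 1  # skip over the day that never happened
    return FIRST_DAY + timedelta(days=serial - 1)


print(naive_serial_to_date(61))  # 1900-03-02, one day late
print(serial_to_date(61))        # 1900-03-01
print(serial_to_date(25569))     # 1970-01-01
```

Every serial number after 59 comes out one day late in the naive version, which is to say nearly every date anyone actually stores in a spreadsheet.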

There is something I call the “Chernobyl Design Pattern”, where you take your worst bug, the ugliest part of your code, the part that is so bad, so radioactive that no one can touch it without getting killed, and you make it private and inaccessible, and then put a new interface around it, essentially entomb it in concrete so that no one can get close to it. In other words, if you can’t fix it, at least contain the damage, prevent it from spreading.

Microsoft has taken another approach here. Instead of containment, they are propagating the bug even further. We need to think beyond Excel and think as well of other applications that work with OOXML data, and other applications that work with those apps and so on, the entire network of data dependencies. The mere existence of this bug in a standard will lead to buggy implementations, poor interoperability, and general chaos around dates. The fallout of this bug should have been contained within the source code of Excel. For this to leak out, into a specification, then a standard and then into other implementations, contradicting both the civil calendar and every other tool that deals with dates, will pollute the entire ecosystem.

This is bad news. Just say no.

Filed Under: OOXML, Popular Posts

Copyright © 2006-2026 Rob Weir · Site Policies