
An Antic Disposition


OOXML

The Formats of Excel 2007

2007/01/08 By Rob 29 Comments

I’ve installed the new Office 2007. This isn’t my preferred platform. In fact I find I’m not using heavy-weight editors of any variety much. For every page I compose in a dedicated word processor I author perhaps 50 pages in emails, blogs or wikis. However, since I do have a license for Office 2007, and I am curious, I decided to take it for a spin. If you want to be a film critic, you’ve got to see the movies…

Here is a quick survey of what I saw in Excel 2007, concentrating on the file format support, my particular area of interest.

First, let’s look at the “Save As” dialog. As you can see from this screen capture, we have some new options:

The Default

The first choice saves in the default format. This is configurable under “Excel Options”, but by default this saves in the new Office Open XML (OOXML) format, with an “xlsx” file extension.

With Macros

The “Excel Macro-Enabled Workbook” option saves with an “xlsm” extension. It is OOXML plus proprietary Microsoft extensions. These extensions, in the form of a binary blob called vbaProject.bin, represent the source code of the macros. This part of the format is not described in the OOXML specification. It does not appear to be a compiled version of the macro: I could reload the document in Excel and restore the original text of my macro, including whitespace and comments. So the source code appears to be stored, but in an opaque format that defied my attempts at deciphering it.

(What’s so hard about storing a macro, guys? It’s frickin’ text. How could you screw it up?)

This has some interesting consequences. It is effectively a container for source code that not only requires Office to run it, but requires Office to even read it. So you could have your intellectual property in the form of extensive macros that you have written, and if Microsoft one day decides that your copy of Office is not “genuine” you could effectively be locked out of your own source code.
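For anyone who wants to check whether a given workbook carries this blob, here is a minimal Python sketch. It assumes only what is described above: the workbook is a Zip package and the macro part is a file named vbaProject.bin. The function names and the in-memory demo package are hypothetical stand-ins, not real Excel output.

```python
import io
import zipfile


def find_macro_parts(workbook):
    """Return the names of package parts whose file name is vbaProject.bin.

    `workbook` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(workbook) as package:
        return [
            name for name in package.namelist()
            if name.rsplit("/", 1)[-1] == "vbaProject.bin"
        ]


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/workbook.xml", "<workbook/>")
        package.writestr("xl/vbaProject.bin", b"\x00\x01opaque")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(find_macro_parts(build_demo_package()))
```

This only locates the part. It does nothing to decode it, which is exactly the problem.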

New Style Binary

The “Excel Binary Workbook” option caught me by surprise. This is not one of the legacy binary formats. This is not the new OOXML. This is a new binary format, with an “xlsb” extension. Like OOXML it has a Zip container file (the so-called Open Packaging Conventions container file format), but the payload consists (aside from a manifest) entirely of binary files.

I can’t tell if they are some proprietary binary mapping of the OOXML XML, or whether this is an entirely new binary format unrelated to the XML format. In any case this format is entirely undocumented and is unreadable to anyone but Microsoft.
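A rough way to see the split between XML and binary parts is to open the Zip container and sniff the first bytes of each part. The sketch below is an assumption-laden illustration, not a format parser: it treats anything that starts with “<” (after an optional UTF-8 byte order mark and whitespace) as XML and everything else as binary, so a UTF-16 XML part would be misreported. The part names in the demo are hypothetical.

```python
import io
import zipfile


def looks_like_xml(head):
    """Crude sniff: does the data start with '<' after a UTF-8 BOM/whitespace?"""
    return head.lstrip(b"\xef\xbb\xbf").lstrip()[:1] == b"<"


def classify_parts(workbook):
    """Map each part name in the Zip container to 'xml' or 'binary'."""
    result = {}
    with zipfile.ZipFile(workbook) as package:
        for name in package.namelist():
            with package.open(name) as part:
                head = part.read(64)
            result[name] = "xml" if looks_like_xml(head) else "binary"
    return result


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", '<?xml version="1.0"?><Types/>')
        package.writestr("xl/workbook.bin", b"\x83\x01\x00\x80\x01")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, kind in classify_parts(build_demo_package()).items():
        print(kind, name)
```

Pointing classify_parts at a real “xlsb” file should show the pattern described above: an XML manifest and a payload of binary parts.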

It is also interesting that Microsoft is positioning this format as the preferred one for performance and interoperability. The online help for Excel 2007 says:

In addition to the new XML-based file formats, Office Excel 2007 also introduces a binary version of the segmented compressed file format for large or complex workbooks. This file format, the Office Excel 2007 Binary (or BIFF12) file format (.xls), can be used for optimal performance and backward compatibility.

Old Style Binary

The Excel 97-2003 option provides the legacy binary “xls” formats, the familiar BIFF format from earlier versions of Office.

Find add-ins

This takes you to a page where you can download the “Microsoft Save as PDF or XPS” Add-in. Note that you are prompted to download an Add-in that provides support for both PDF and XPS. But if you hunt around a bit you can find another page where you can download just one format or the other, which is what I did, installing just the PDF support. This added a new option, “PDF” to the Save As dialog.

Other Formats

This brings up a dialog where you can choose from the previously mentioned formats as well as the several legacy export formats, including:

  • XML Data
  • Web Page
  • Text
  • Unicode Text
  • XML Spreadsheet 2003
  • Excel 5.0/95 Workbook
  • CSV
  • Formatted Text
  • DIF
  • SYLK

Summary

My overall impression was soured a bit by the large number of crashes I experienced. Indeed Excel crashed on exit on almost every session. This was dozens of crashes over the course of an afternoon. This will need to be fixed before I would trust it with my data.

Another curiosity was a legacy binary document that gave the following error message whenever I tried to save it to the new OOXML format:

I did not get this message when I saved it back to the binary format. So evidently I’m losing something when moving to OOXML, whatever “Line Print settings” are. So much for the claims of 100% backwards compatibility…

My examination also put to rest any lingering hope I had that Microsoft had fundamentally changed its position on proprietary file formats and had decided to follow the path of openness. The new proprietary binary format and the undocumented way that macros are encoded rule that out.


1/22/07, a quick update: Microsoft’s Doug Mahugh helped track down and fix the crash problem I had earlier reported when exiting Excel. This is a bug in the “Send to Bluetooth” COM Add-in that Excel was loading at startup. After disabling that Add-in, I’m no longer crashing.

Filed Under: Microsoft, OOXML

How to hire Guillaume Portes

2007/01/03 By Rob 65 Comments

You want to hire a new programmer and you have the perfect candidate in mind, your old college roommate, Guillaume Portes. Unfortunately you can’t just go out and offer him the job. That would get you in trouble with your corporate HR policies which require that you first create a job description, advertise the position, interview and rate candidates and choose the most qualified person. So much paperwork! But you really want Guillaume and only Guillaume.

So what can you do?

The solution is simple. Create a job description that is written specifically to your friend’s background and skills. The more specific and longer you make the job description, the fewer candidates will be eligible. Ideally you would write a job description that no one else in the world could possibly match. Don’t describe the job requirements. Describe the person you want. That’s the trick.

So you end up with something like this:

  • 5 years experience with Java, J2EE and web development, PHP, XSLT
  • Fluency in French and Corsican
  • Experience with the Llama farming industry
  • Mole on left shoulder
  • Sister named Bridgette

Although this technique may be familiar, in practice it is usually not taken to this extreme. Corporate policies, employment law and common sense usually prevent one from making entirely irrational hiring decisions or discriminating against other applicants for things unrelated to the legitimate requirements of the job.

But evidently in the realm of standards there are no practical limits to the application of this technique. It is quite possible to write a standard that allows only a single implementation. By focusing entirely on the capabilities of a single application and documenting it in infuriatingly useless detail, you can easily create a “Standard of One”.

Of course, this begs the question of what is essential and what is not. This really needs to be determined by domain analysis, requirements gathering and consensus building. Let’s just say that anyone who says that a single existing implementation is all one needs to look at is missing the point. The art of specification is to generalize and simplify. Generalizing allows you to do more with less, meeting more needs with fewer constraints.

Let’s take a simplified example. You are writing a specification for a file format for a very simple drawing program, ShapeMaster 2007. It can draw circles and squares, and they can have solid or dotted lines. That’s all it does. Let’s consider two different ways of specifying a file format.

In the first case, we’ll simply dump out what ShapeMaster does in the most literal way possible. Since it allows only two possible shapes and only two possible line styles, and we’re not considering any other use, the file format will look like this:

<document>
<shape iscircle="true" isdotted="false"/>
<shape iscircle="false" isdotted="true"/>
</document>

Although this format is very specific and very accurate, it lacks generality, extensibility and flexibility. Although it may be useful for ShapeMaster 2007, it will hardly be useful for anyone else, unless they merely want to create data for ShapeMaster 2007. It is not a portable, cross-application, open format. It is a narrowly-defined, single application format. It may be in XML. It may even be reviewed by a standards committee. But it is by its nature, closed and inflexible.

How could this have been done in a way which works for ShapeMaster 2007 but also is more flexible, extensible and considerate of the needs of different applications? One possibility is to generalize and simplify:

<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
</document>

Rather than hard-code the specific behavior of ShapeMaster, generalize it. Make the required specific behavior be a special case of something more general. In this way we solve the requirements of ShapeMaster 2007, but also accommodate the needs of other applications, such as OpenShape, ShapePerfect and others. For example, it can easily accommodate additional shapes and line styles:

<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
<shape type="triangle" lineStyle="dashed"/>
</document>
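To make the difference concrete, here is a small Python sketch using the two sample documents above. The function names are mine and hypothetical. The point it illustrates is that the literal format maps cleanly into the general one, while a reader for the general format passes a new shape such as a triangle through without any change to the format itself.

```python
import xml.etree.ElementTree as ET

# The literal, ShapeMaster-only format from the first example.
LITERAL = """<document>
<shape iscircle="true" isdotted="false"/>
<shape iscircle="false" isdotted="true"/>
</document>"""


def literal_to_general(xml_text):
    """Rewrite the boolean-attribute format as the type/lineStyle format."""
    source = ET.fromstring(xml_text)
    target = ET.Element("document")
    for shape in source.findall("shape"):
        ET.SubElement(target, "shape", {
            "type": "circle" if shape.get("iscircle") == "true" else "square",
            "lineStyle": "dotted" if shape.get("isdotted") == "true" else "solid",
        })
    return target


def read_general(document):
    """Return (type, lineStyle) pairs; unknown values pass through untouched."""
    return [(s.get("type"), s.get("lineStyle")) for s in document.findall("shape")]


if __name__ == "__main__":
    print(read_general(literal_to_general(LITERAL)))
    extended = ET.fromstring(
        '<document><shape type="triangle" lineStyle="dashed"/></document>'
    )
    print(read_general(extended))
```

There is no way to write the reverse conversion for the triangle: two booleans have no room for a third shape.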

This is a running criticism I have of Microsoft’s Office Open XML (OOXML). It has been narrowly crafted to accommodate a single vendor’s applications. Its extreme length (over 6,000 pages) stems from it having detailed every wart of MS Office in an inextensible, inflexible manner. This is not a specification; this is a DNA sequence.

The ShapeMaster example given above is very similar to how OOXML handles “Art Page Borders” in a tedious, inflexible way, where a more general solution would have been both more flexible and far easier to specify and implement. I’ve written on this in more detail elsewhere.

Here are some other examples of where the OOXML “Standard” has bloated its specification with features that no one but Microsoft will be able to interpret:

2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 95) when determining the spacing between full-width East Asian characters in a document’s content.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

(This example and the following examples brought to my attention by this post from Ben at Genii.)

What should we make of that? Not only must an interoperable OOXML application support Word 12’s style of spacing, but it must also support a different way of doing it in Word 95. And by the way, Microsoft is not going to tell you how it was done in Word 95, even though they are the only ones in a position to do so.

Similarly, we have:

2.15.3.26 footnoteLayoutLikeWW8 (Emulate Word 6.x/95/97 Footnote Placement)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 6.x/95/97) when determining the placement of the contents of footnotes relative to the page on which the footnote reference occurs. This emulation typically involves some and/or all of the footnote being inappropriately placed on the page following the footnote reference.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

Again, in order to support OOXML fully, and provide support for all those legacy documents, we need to divine the behavior of exactly how Word 6.x “inappropriately” placed footnotes. The “Standard” is no help in telling us how to do this. In fact it recommends that we don’t even try. However, Microsoft continues to claim that the benefit of OOXML and the reason why it deserves ISO approval is that it is the only format that is 100% backwards compatible with the billions of legacy documents. But how can this be true if the specification merely enumerates compatibility attributes like this without defining them? Does the specification really specify what it claims to specify?

The fact that this and other legacy features are dismissed in the specification as “deprecated” is no defense. If a document contains this element, what is a consuming application to do? If you ignore it, the document will not be formatted correctly. It is that simple. Deprecated doesn’t mean “not important” or “ignorable”. It just means that new documents authored in Office 2007 will not have it. But billions of legacy documents, when converted to OOXML format, may very well have them. How well will a competing word processor do in the market if it cannot handle these legacy tags?

So I’d argue that these legacy tags are some of the most important ones in the specification. But they remain undefined, and by this ruse Microsoft has arranged things so that their lock on legacy documents extends to even when those legacy documents are converted to OOXML. We are ruled by the dead hand of the past.

Let’s go back even further in time to Word 5.0:

2.15.3.32 mwSmallCaps (Emulate Word 5.x for the Macintosh Small Caps Formatting)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 5.x for the Macintosh) when determining the resulting formatting when the smallCaps element (§2.3.2.31) is applied to runs of text within this WordprocessingML document. This emulation typically results in small caps which are smaller than typical small caps at most font sizes.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

You’ll need to take my word for it that “This emulation typically results in small caps which are smaller than typical small caps at most font sizes” falls well short of the level of specificity and determinism that is typical of ISO specifications.

Further:

2.15.3.51 suppressTopSpacingWP (Emulate WordPerfect 5.x Line Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (WordPerfect 5.x) when determining the resulting spacing between lines in a paragraph using the spacing element (§2.3.1.33). This emulation typically results in line spacing which is reduced from its normal size.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

So not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year-old version of Microsoft Word, it must also do the same thing with a 16-year-old version of WordPerfect. Good luck.

My tolerance for cutting and pasting examples goes only so far, so suffice it for me to merely list some other examples of this pattern:

  • lineWrapLikeWord6 (Emulate Word 6.0 Line Wrapping for East Asian Text)
  • mwSmallCaps (Emulate Word 5.x for Macintosh Small Caps Formatting)
  • shapeLayoutLikeWW8 (Emulate Word 97 Text Wrapping Around Floating Objects)
  • truncateFontHeightsLikeWP6 (Emulate WordPerfect 6.x Font Height Calculation)
  • useWord2002TableStyleRules (Emulate Word 2002 Table Style Rules)
  • useWord97LineBreakRules (Emulate Word 97 East Asian Line Breaking)
  • wpJustification (Emulate WordPerfect 6.x Paragraph Justification)
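About the most a competing implementation can do with these elements is detect them and warn the user. The Python sketch below does just that. The element names come from the examples quoted in this post; the function names and the sample fragment (including its namespace and nesting) are my own assumptions for illustration. The code ignores namespaces entirely and matches on local names only.

```python
import xml.etree.ElementTree as ET

# Element names taken from the examples quoted in this post.
LEGACY_COMPAT_ELEMENTS = {
    "autoSpaceLikeWord95",
    "footnoteLayoutLikeWW8",
    "mwSmallCaps",
    "suppressTopSpacingWP",
    "lineWrapLikeWord6",
    "shapeLayoutLikeWW8",
    "truncateFontHeightsLikeWP6",
    "useWord2002TableStyleRules",
    "useWord97LineBreakRules",
    "wpJustification",
}

# Hypothetical settings fragment; namespace and nesting are assumptions.
SAMPLE = """<w:settings
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:compat>
    <w:footnoteLayoutLikeWW8/>
    <w:mwSmallCaps/>
  </w:compat>
</w:settings>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_legacy_compat(xml_text):
    """Return the sorted legacy-emulation element names present in the XML."""
    root = ET.fromstring(xml_text)
    present = {local_name(element.tag) for element in root.iter()}
    return sorted(present & LEGACY_COMPAT_ELEMENTS)


if __name__ == "__main__":
    for name in find_legacy_compat(SAMPLE):
        print("cannot faithfully render:", name)
```

Detection is the easy part. Nothing in the specification tells the application what to do next.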

This is the way to craft a job description so you hire only the person you earmarked in advance. With requirements like the above, no others need apply.

As I’ve stated before, if this were just a Microsoft specification that they put up on MSDN for their customers to use, this would be par for the course, and not worth my attention. But this is different. Microsoft has started calling this a Standard, and has submitted this format to ISO for approval as an International Standard. It must be judged by those greater expectations.


Update:

1/14/2007 — This post was featured on Slashdot on 1/4/07 where you can go for additional comments and debate. I’ve summarized the comments and provided some additional analysis here.

2/16/2007 — fixed some typos, tightened up some of the phrases.

Filed Under: OOXML, Popular Posts

A notable achievement

2006/12/09 By Rob 8 Comments

I believe congratulations are in order to Microsoft and Ecma’s TC45 for what appears to be a new world record for creating a standard. Their recently-approved Office Open XML (OOXML) standard weighed in at 6,546 pages yet took only 357 days to be reviewed, edited and approved, making it not only the largest markup specification, but possibly also the fastest to complete its standardization cycle.

To put the magnitude of this accomplishment into perspective, I looked at a variety of other successful standards from various standards bodies, such as:

  • OASIS OpenDocument Format (ODF)
  • OASIS Darwin Information Typing Architecture (DITA)
  • W3C Extensible Stylesheet Language (XSL)
  • W3C XHTML
  • W3C Scalable Vector Graphics (SVG)
  • W3C Simple Object Access Protocol (SOAP)
  • IETF MIME
  • Ecma C#
  • Ecma C++/CLI

In all cases I looked at how long the specification took to be standardized, from when the initial draft was made available (whether developed within the technical committee, or submitted by a vendor at committee formation) to the time when the standard was approved. So we’re looking at the complete editing/review/approval time, not including the time to author the initial draft. I also looked at the length of the resulting standard.

(Click on the above chart for a larger view)

As you can see, there is a noticeable trend with previous standards, where longer specifications took longer to edit, review and approve than shorter ones. This was the received wisdom, that standardization was a slow process, and this deliberate pace was necessary not only to achieve technical excellence, but also to socialize the specification and build industry consensus.

Also, previous specifications seemed to top out at around 1,000 pages. Larger than that and they tended to be broken into individual sub-standards which were reviewed and approved individually.

The general practice, as shown in this data, has been for standards to take from 0.1 – 1.2 pages per day, for a complete review/edit/approval cycle. Even other Microsoft specifications in Ecma have fit within these parameters, such as C# (1.2 pages/day) and C++/CLI (0.7 pages/day).

Thus the remarkable achievement of Microsoft and Ecma TC45, who not only managed to create a standard an order of magnitude larger than any other markup standard I’ve seen, but at the same time managed to complete the review/edit/approve cycle faster than any other markup standard I’ve seen. They have achieved an unprecedented review/edit/approval rate of 18.3 pages/day, 20 times faster than industry practice, a record which will likely stand unchallenged for the ages.

I think we would all like to know how they did it. High-altitude training? Performance enhancing drugs? Time travel? A pact with the Devil? I believe you will all share with me an earnest plea for them to share the secret of their productivity and efficiency with the world and especially with ISO, who will surely need similar performance enhancements in order for them to review this behemoth of a standard within their “Fast Track” process.

I am optimistic that, once the secret of OOXML’s achievement gets out, the way we make standards will never be the same again.


Change Log

1/26/07 — corrected two typographical errors pointed out by a reader

Filed Under: OOXML, Standards

The worm in the apple

2006/12/05 By Rob 3 Comments

Via CrunchGear, MacWorld UK, and APC Magazine: Mac Office users seem to have no way of reading the new OOXML files which Office 2007 for Windows writes by default. APC quotes a Microsoft Mac Business Unit spokesperson as saying, “Unfortunately it is still too early for us to say when the converters will be available”.

Whoops.

As a public service I note two alternatives: the Mac port of OpenOffice.org and NeoOffice.


9 December 2006 Update: Interesting analysis from Andrew Shebanow over at Shebanation: how adding OOXML support to the Mac is likely a 150 person-year effort. And Mary Jo Foley’s Unblinking Eye points out that the problem is not just with the Mac support. Windows Mobile 5.0 will lack OOXML support until mid-2007.

Is it just me, or does this seem like something less than a coordinated roll-out? The clean, hassle-free way of doing this, with the least suffering for users and admins, would have been like this: Ship Office 2007 with OOXML support, but not as the default. Then over the next year get the rest of the Office ecosystem working with OOXML: the Mac, Mobile, SharePoint, Excel Live, etc. Get all of the support out there, but don’t force it on people yet as a default. When all the pieces are ready then, via a service pack or version upgrade, change the defaults. Everything goes smoothly from there.

The fact that they didn’t follow this roll-out model suggests that someone at Microsoft really, really, really wanted to get OOXML out fast, even if it wasn’t pretty.

Luckily admins do have the ability to perform a more orderly roll-out in their organizations if they wish. The default format for Office applications can be changed via a registry entry. For example, for Excel the registry entry is:

HKCU\Software\Microsoft\Office\12.0\Excel\Options\DefaultFormat

By default it isn’t there, but you can create an entry of type REG_DWORD and assign it the value of 56 (38 hexadecimal). Once you’ve made that change, Excel documents will be saved in the legacy binary formats by default. Similar registry settings for Word and PowerPoint are:

HKCU\Software\Microsoft\Office\12.0\Word\Options\DefaultFormat

create REG_SZ with value of “Doc”

and

HKCU\Software\Microsoft\Office\12.0\PowerPoint\Options\DefaultFormat

create REG_DWORD with value of 0

It should be trivial for someone with a Windows compiler to create a simple application to accomplish this same task. Ideally it would also allow the default to be changed to any other format of the admin’s choice, including turning it back to OOXML if/when admins desire to deploy that way, or changing it to ODF when a good Plugin is available.
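As a starting point, here is a minimal Python sketch that generates a .reg file with the three settings listed above, which an admin could then review and import. It is a sketch under assumptions, not a tested deployment tool: I am assuming that “DefaultFormat” is a value under each application’s Options key, and the function names are hypothetical. The keys, types and values are the ones given in this post.

```python
# (application, value type, value) exactly as listed in the post.
DEFAULTS = (
    ("Excel", "REG_DWORD", 56),
    ("Word", "REG_SZ", "Doc"),
    ("PowerPoint", "REG_DWORD", 0),
)


def format_value(kind, value):
    """Render one value in .reg file syntax."""
    if kind == "REG_DWORD":
        return "dword:%08x" % value
    if kind == "REG_SZ":
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    raise ValueError("unsupported registry type: %s" % kind)


def build_reg_file(defaults=DEFAULTS):
    """Return the text of a .reg file that sets DefaultFormat for each app."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for app, kind, value in defaults:
        # Assumption: DefaultFormat is a value under the Options key.
        lines.append(
            "[HKEY_CURRENT_USER\\Software\\Microsoft\\Office\\12.0\\%s\\Options]" % app
        )
        lines.append('"DefaultFormat"=%s' % format_value(kind, value))
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Changing the default to some other format would just mean passing a different table of values to build_reg_file.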

10 December 2006 Update: My attention has been drawn to an earlier post from a lead in Microsoft’s Mac Business Unit, where the removal of support for Visual Basic macros is discussed. Damn, that’s cold. Ever get the feeling you’ve been marked for extermination?

17 May 2007 Update: From News.com “Microsoft delays Office convertors for Mac” and some great follow-up analysis by Andrew Shebanow over at Shebanation.

30 May 2007 Update: More analysis and commentary on this ongoing issue from Joe Wilcox over at Microsoft Watch:

Meanwhile, Microsoft makes big noise about interoperability. What kind of example does Microsoft set when the formats for its Mac and Windows Office suites aren’t interoperable? Irreconcilable is the position of increased Microsoft-and-other platform interoperability and the decreased interoperability between Office file formats across two platforms.

Filed Under: OOXML

Genesis 11:5-9

2006/11/14 By Rob 5 Comments

This, fresh from Office Watch: “Office 2007 compatibility pack disappoints”.

Update 11/15: Some readers have written with more information. This may be an issue between the pre-1.5-final-draft version of OOXML and the final RTM Compatibility Pack. Evidently there were some late changes to the OOXML specification, including a change in namespace URIs. So the problems seem to be between documents created in the beta version of Office 2007 (I’m not sure whether this affects all betas, including the Technical Refresh) and the RTM version of Office. Confusing, to say the least. It looks like the referenced article is being updated with additional details.

Update 11/17: The cited article was updated again. This seems to be an issue related to what patch level you are running. If you have all of the updates applied to Windows/Office, the Compatibility Pack works as advertised.

Since there are a number of convertor initiatives under development, it is probably worth backing up and taking a survey of where we stand today:

ODF = Open Document Format, an XML-based document format used in products like IBM Workplace, the next version of Lotus Notes, OpenOffice.org, KOffice, AbiWord, Gnumeric, etc. ODF is an ISO standard and is maintained at OASIS.

OOXML = Office Open XML, an XML-based format which will be used in Microsoft Office 2007 when it is released in January. OOXML is currently a draft specification in Ecma, though it will certainly be adopted as an Ecma standard in December.

The Legacy Formats = the proprietary binary formats that Microsoft used before Office 2007, the familiar DOC, XLS and PPT files.

So, what can be converted to what, using what, and does it really work?

If you upgrade to Office 2007 when it comes out, you will be able to read and write both the OOXML and the Legacy formats. Both are supported out-of-the-box.

If you want to stay on an older version of Office, and need to exchange documents with someone using the new OOXML formats, then you need Microsoft’s Compatibility Pack. As the above article points out, getting this to work in practice requires first ensuring that your patch level is current.

What about ODF? If you are on Microsoft Office, then there are two initiatives underway to bring ODF support to Office. One is the odf-convertor project on SourceForge, supported by Microsoft (and now by Novell as well). Their initial deliverable will be the “ODF Add-in For Microsoft Word”. I didn’t have all that much luck with an earlier “alpha” version of the Add-in, but I’ve heard it is much improved. However, in the near term it only supports reading ODF text documents. No support for writing, and no support for presentations or spreadsheets. These other features are slated to be delivered in future phases of the project. The Open Document Foundation is also developing a convertor, which they call the “ODF Plugin”. Sam Hiser will be presenting on it at XML 2006 in Boston, so hopefully we’ll learn more about it then.

If you are running OpenOffice.org, then you already have excellent integrated conversion support between ODF and the Legacy Office formats. But if you need to exchange documents with someone using Office 2007 and its default OOXML formats then you are out of luck for now. However, please note that the recent Novell/Microsoft agreement included a statement (if I’m reading this correctly) that Novell would help add OOXML support to OpenOffice.org. So this support should eventually make it into OpenOffice.org.

So, based on what really works today, I’d offer this recommendation: If you must upgrade to Office 2007, then set the default file formats to the Legacy binary formats. Until the OOXML convertors mature and all Office users have migrated off the beta and have compatible OOXML versions, you’ll only be causing chaos with those you exchange documents with if you save as OOXML.

Filed Under: ODF, Office, OOXML


Copyright © 2006-2026 Rob Weir · Site Policies