An Antic Disposition

Archives for 2007

Guillaume Portes Redux

2007/01/14 By Rob 12 Comments

My post of 10 days ago, How to hire Guillaume Portes, received quite a bit of attention, with over 50,000 page views, links from 25 blogs, and around 300 comments left by visitors to this blog, Slashdot and the Joel on Software discussion group. I’d like to thank all who took the time to read and to comment.

It is good to continually tell the story and make the case. Having two standard file formats for office documents would be a bad thing for commerce, for end users and for the industry. With two formats, end users will be confused and costs will be higher for those who sell and buy software that works with documents. This will essentially cause a frictional drag on the document processing market. Sure, there will be those who will benefit from the chaos, just as there are those who benefit from the friction of currency exchanges. But over the years we’ve learned the value of things like uniform commercial codes, currency unions and uniform trade regulations.

I’ve heard no one complain about having lost their freedom of choosing the Mark over the Lira over the Franc. We simply use the Euro and then concentrate on what we are buying or selling, not on the currency. In a similar way we should agree on a single document format and then concentrate on application features and user needs and what we are trying to communicate, and stop worrying about file formats. When done well, file formats are invisible. They are not seen by end users, are not discussed by the press, and not thought about by (most) engineers. The fact that I’m writing about OOXML at all and not about my wine making exploits is an aberration caused by the failure of the dominant market player to provide an open document standard that allows users to own their documents.

But I digress…

Now that I’ve finished reading all of the comments, I’d like to review with you some of the better ones, pro and con, along with my commentary.

Let’s start.

I don’t know how many of you noticed: The fictional name “Guillaume Portes” is actually a literal translation of “Bill Gates” in French.

If you noticed this as well, give yourself 3 extra points. Many of my posts have a secret joke, and I hope these will bring a smile to those who find them.

Here’s a comment questioning whether there is a problem with OOXML:

I haven’t looked at the spec, so I don’t know how good or bad it is. But the examples he cites don’t strike me as such a problem. They’re all just to maintain backward compatibility with documents from old versions of Word and other apps. You would be free to ignore them if you don’t need that compatibility. I’m not sure how else they could have done it.

A similar view was expressed by another reader:

I don’t know if it has been stated here, but you do know that supporting these the compatibility options is not required for OpenXML compliance? Developers are free to leave these out of they want.

By that same argument developers can also leave out text alignment, images and tables, since these features are not required for compliance either. In fact, everything in OOXML is optional. If you read the compliance definition in the OOXML specification, it comes down to this statement in Section 2.5:

Application conformance is purely syntactic…A conforming consumer shall not reject any conforming documents of the document type expected by that application. A conforming producer shall be able to produce conforming documents.

Given this definition of conformance, a fully conformant OOXML application can be as simple as:

cp foo.docx bar.docx (Linux) or copy foo.docx bar.docx (Windows)

In the end, regardless of whether a feature is optional, or even deprecated, if that feature occurs in real OOXML documents, then an OOXML application that aspires to be used and have viability, either commercially or as open source, will need to support it. It is that simple.

There is only one OOXML specification and to an end user all OOXML files are equivalent and interchangeable. The user who receives a document via email, from a government web site, from a colleague, friend, teacher, etc., doesn’t know whether the document was created in Word, created in OpenOffice, created from scratch in Office 2007, saved from Office 2000 with the Compatibility Pack or whether the document was originally authored in WordPerfect and made it into OOXML format only after migration over years via various Office upgrades. It is a DOCX file and users will expect that applications that claim OOXML support will work with their DOCX file, period. Anything else is a support nightmare. That is the entire point of a standard — interoperability — so we must judge OOXML by how well it facilitates that function.

Here is another comment with a view expressed by others as well:

Someone needs to tell every developer of word processing and page layout software on the planet to abandon the ‘must look the same’ obsession described by the above. Why worry about making content in application B look like content in Application A? I create books out of Word files submitted by several people. The last thing I want is all the inconsistent formatting from each of them to control a book’s look.

Named styles is the answer. If a paragraph is body text, call it that. If it’s an inset quote, call it a quote. If a term is in italics, label it as italicized style not Times Italic 12 point. But don’t get all hung up in the distinctions between Times Roman and Times New Roman. The purpose of XML is to define what something is. Not what someone thought it ought to look like on Tuesday three weeks ago.

My personal views are very much in alignment with these sentiments. I think WYSIWYG has done more bad than good over the years, and that strict separation of content, layout and styles should be maintained. However, I also know that my personal views are not universally held, and that the word processor has evolved over the years to be a flexible, multi-paradigm tool that can support both structured document editing and looser, ad-hoc editing by users who just need to grind out a memo. A document format for a modern word processor must support both uses.

I’m glad someone brought up the core question:

Considering the requirement that the standard allow for compatibility with existing documents, what would you suggest?

Silently altering documents that are converted into OpenXML?

Disallow automatic conversions whenever a compatibility flag would have otherwise been needed?

One solution approach was mentioned by several users, for which I give two examples:

There is no need to include features from 16 year old (or any age) applications in a new standard. If you want to convert, you convert. If WP6 linespacing is 0.8 of Word2007 linespacing, you write linespacing =”0.8″ in your converted document. You DON’T write useWP6linespacing linespacing =”1″

That is just plain silly. That is making a specification unnecessary large for instances that are rarely used by the general public.

As said: if you want to convert, than use a conversion tool. Do not use a modern specification to hold all legacy features.

And:

Let the plugins do the dirty work of native in-memory-binary representations to XML and back conversions.

Keep the XML file format clean, open, unencumbered, application independent, cross platform, universally transformative and exchangeable, portable and timeless.

I think this is the key point, and I’m gratified that so many readers picked up on it. There is no good reason to have these compatibility flags at all. Instead of having several undefined compatibility flags for legacy line spacing options, we should have a flexible line-spacing model in OOXML and when loading legacy binary documents, convert them as necessary into the line spacing model of OOXML. If the text model in OOXML is sufficiently expressive, this can be done with no loss of fidelity. (And no, a flag that says merely “do it like Word 95” is not an example of expressiveness).

This is what I mean by “generalize and simplify”. A simplified specification is not necessarily less expressive or less capable. A specification is simplified when it supports internal reuse and accomplishes its task with minimal means.
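To make the conversion approach concrete, here is a minimal Python sketch of folding a legacy flag into an explicit value at import time. Everything in it is hypothetical: the flag name `useWP6LineSpacing`, the property names, and the 0.8 ratio (borrowed from the reader's example above, not measured from WordPerfect) are illustrative only and come from neither the OOXML specification nor any real converter.

```python
# Hypothetical converter step: replace a "do it like application X" flag
# with an explicit value in the target format's own line-spacing model.

# Illustrative ratio only; taken from the reader's comment, not from any
# measurement of real WordPerfect output.
LEGACY_SPACING_RATIO = {
    "useWP6LineSpacing": 0.8,
}


def convert_paragraph(props: dict) -> dict:
    """Return paragraph properties with legacy flags folded into lineSpacing."""
    converted = {}
    spacing = float(props.get("lineSpacing", 1.0))
    for name, value in props.items():
        if name in LEGACY_SPACING_RATIO:
            if value:
                spacing *= LEGACY_SPACING_RATIO[name]
            # The legacy flag itself is never copied to the output.
        elif name != "lineSpacing":
            converted[name] = value
    converted["lineSpacing"] = round(spacing, 4)
    return converted


legacy = {"lineSpacing": 1.0, "useWP6LineSpacing": True, "align": "left"}
print(convert_paragraph(legacy))
```

The converted paragraph carries only a number that any application can interpret. Knowledge of the legacy behavior lives in the converter, not in the file format.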

However, if the text model of OOXML is not flexible enough to support even legacy versions of Word, then what hope will the rest of the industry have in adopting it as a format? How will Novell manage with getting OpenOffice to use it, or Corel to get WordPerfect to use it? What about Lotus WordPro? Will Ecma add special compatibility flags to the OOXML specification to account for the quirks of every word processor with legacy documents? What would OOXML look like if we all loaded it up with such legacy flags? Is this the precedent we want to set?

Why should OOXML have special flags for WordPerfect 6.0 (1996) but not have special flags for WordPerfect 12.0 (2004) or the new X3 version (2006)? Is this purely because Microsoft considered WordPerfect to be a competitor back in 1996 but now no longer cares? Is this the way to go about designing an ISO standard?

I believe that having compatibility flags in the specification for all word processors in use today is not a practical solution, and that having such flags only for Word and ancient versions of competing products is an approach that benefits only Microsoft.

One last point, since this post is already too long. Microsoft’s Brian Jones is claiming that ODF has a similar issue, in that OpenOffice writes out a number of application-specific settings when it saves a document. This is a good illustration of an important distinction. The items that OpenOffice writes out (you can see an example in Brian’s post) are vendor-defined, document-level application settings. There are now and will continue to be multiple implementations of ODF and it is legitimate that they have application-defined features. These are stored as name/value pairs in a separate XML file in the ODF archive.

I can think of no argument against that. Obviously no interoperability is expected for these vendor-specific features, which cover application settings such as window sizes, zoom factors and print settings. In any case, ODF merely provides a place for applications to store these settings. To blame ODF for any vendor misuse of this feature is like blaming the W3C and HTML for non-standard extensions in Internet Explorer.
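For readers who want to look at these settings themselves, here is a minimal Python sketch that reads the name/value pairs out of an ODF package. It assumes the settings live in a part named `settings.xml` as `config:config-item` elements in the OpenDocument config namespace. The sample package is built in memory, and its two settings are made-up stand-ins, not output from OpenOffice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as assumed from the OpenDocument specification.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Made-up sample content, standing in for a real application's settings.
SAMPLE_SETTINGS = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-settings xmlns:office="{OFFICE_NS}" xmlns:config="{CONFIG_NS}">
  <office:settings>
    <config:config-item-set config:name="view-settings">
      <config:config-item config:name="ZoomFactor" config:type="short">100</config:config-item>
      <config:config-item config:name="PrinterName" config:type="string">Example</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>
"""


def build_sample_odf() -> bytes:
    """Create a tiny in-memory Zip archive containing only settings.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("settings.xml", SAMPLE_SETTINGS)
    return buffer.getvalue()


def read_settings(package: bytes) -> dict:
    """Return every config item in settings.xml as a name -> text mapping."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read("settings.xml"))
    items = {}
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        items[item.get(f"{{{CONFIG_NS}}}name")] = item.text
    return items


print(read_settings(build_sample_odf()))
```

An application that does not recognize a given name can simply skip it, which is why these settings do not affect how the document content itself is interpreted.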

OOXML, on the other hand, does not seem to have given much thought to what would be needed in a format that has multiple supporting applications. Only a single application (MS Office) has been explicitly considered, and support for that one application, and its predecessor versions, has been hard-coded into the OOXML schema.

Filed Under: OOXML

Surviving the Slashdot Effect

2007/01/08 By Rob Leave a Comment

You wake up one morning, check your email and what do you see? Fifty comments in your moderation queue awaiting approval. Hmmm… You then check your server stats and see that 500 people have hit your blog in the last 5 minutes. Hmmm… You check the referrals and see that they are all coming from a familiar site.

Congratulations, you’re on Slashdot. But will your site survive the Slashdot Effect, or will it be a casualty of the day’s load? What do you do now? What can you do?

Here are some things to consider, based on my experience last Friday when I had a 50,000-page day.

  1. If you have it, turn off comment moderation for the day. Comments will come in faster than you can approve them and you likely already have a backlog.
  2. Download the server logs and see what your bandwidth use is for the day so far. While that is processing, check with your host to see what your bandwidth quota is. Have a credit card handy in case you need to quickly upgrade.
  3. A look at the comments on Slashdot indicated that some were experiencing slow response time. So I simplified the page and took out some extraneous images from the side bar. From the logs I had downloaded earlier it was clear that a good percentage of users were going to the blog’s home page after reading the post. So I simplified that page as well, having it show only the last 5 posts rather than the last 20. This improved response time as well as reduced the bandwidth requirements.
  4. I’m sure you’re familiar with Linus’s Law, as stated by Eric Raymond: “Given enough eyeballs all bugs are shallow”. This applies to prose as well as to code. So it is a good time to proofread your post and make sure spelling and usage are correct, that all the links work, that quotes and ideas are correctly attributed, etc. With 50,000 readers today, even the smallest error will be noticed by hundreds of them.
  5. Since your post will now be in front of many more people, including those who have not been following the topic closely, you might want to add some links to background information.
  6. Do a quick security audit. Do you have reasonably complex passwords for your server accounts? Have you changed them recently?
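The bandwidth check in step 2 can be roughed out in a few lines of Python. This is only a sketch, and it assumes your host writes logs in the Common Log Format (or the combined variant), where the status code and byte count follow the quoted request. The sample lines are invented. Adjust the pattern if your server logs differently.

```python
import re

# Matches the first '"request" status bytes' group in a Common Log Format line.
# A "-" in the bytes field means no body was sent (for example a 304 response).
LOG_TAIL = re.compile(r'"[^"]*" (\d{3}) (\d+|-)')


def total_bytes(lines) -> int:
    """Sum the response sizes recorded in an iterable of access-log lines."""
    total = 0
    for line in lines:
        match = LOG_TAIL.search(line)
        if match and match.group(2) != "-":
            total += int(match.group(2))
    return total


# Invented sample lines; in practice, pass an open log file instead.
sample = [
    '203.0.113.5 - - [05/Jan/2007:09:15:01 -0500] "GET /blog/ HTTP/1.1" 200 48213',
    '203.0.113.9 - - [05/Jan/2007:09:15:02 -0500] "GET /img/a.png HTTP/1.1" 304 -',
    '198.51.100.7 - - [05/Jan/2007:09:15:03 -0500] "GET /blog/ HTTP/1.1" 200 48213',
]
print(total_bytes(sample))  # 96426
```

Comparing that running total with your daily or monthly quota tells you quickly whether the credit card needs to come out.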

I’d be interested in hearing what advice others have for how to quickly up-armor your site when sudden load surges occur.

Filed Under: Blogging/Social Tagged With: Slashdot Effect

The Formats of Excel 2007

2007/01/08 By Rob 29 Comments

I’ve installed the new Office 2007. This isn’t my preferred platform. In fact I find I’m not using heavy-weight editors of any variety much. For every page I compose in a dedicated word processor I author perhaps 50 pages in emails, blogs or wikis. However, since I do have a license for Office 2007, and I am curious, I decided to take it for a spin. If you want to be a film critic, you’ve got to see the movies…

Here is a quick survey of what I saw in Excel 2007, concentrating on the file format support, my particular area of interest.

First, let’s look at the “Save As” dialog. As you can see from this screen capture, we have some new options:

The Default

The first choice saves in the default format. This is configurable under “Excel Options”, but by default this saves in the new Office Open XML (OOXML) format, with an “xlsx” file extension.

With Macros

The “Excel Macro-Enabled Workbook” option saves with an “xlsm” extension. It is OOXML plus proprietary Microsoft extensions. These extensions, in the form of a binary blob called vbaProject.bin, represent the source code of the macros. This part of the format is not described in the OOXML specification. It does not appear to be a compiled version of the macro. I could reload the document in Excel and restore the original text of my macro, including whitespace and comments. So source code appears to be stored, but in an opaque format that defied my attempts at deciphering it.

(What’s so hard about storing a macro, guys? It’s frickin’ text. How could you screw it up?)

This has some interesting consequences. It is effectively a container for source code that not only requires Office to run it, but requires Office to even read it. So you could have your intellectual property in the form of extensive macros that you have written, and if Microsoft one day decides that your copy of Office is not “genuine” you could effectively be locked out of your own source code.

New Style Binary

The “Excel Binary Workbook” option caught me by surprise. This is not one of the legacy binary formats. This is not the new OOXML. This is a new binary format, with an “xlsb” extension. Similar to OOXML it has a Zip container file (the so-called Open Packaging Conventions container file format), but the payload consists (aside from a manifest) entirely of binary files.

I can’t tell if they are some proprietary binary mapping of the OOXML XML, or whether this is an entirely new binary format unrelated to the XML format. In any case this format is entirely undocumented and is unreadable to anyone but Microsoft.
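Since these files are ordinary Zip archives, you can repeat this kind of inspection yourself. The Python sketch below lists the parts of a package and labels each one as XML or binary, using a crude heuristic (does the part start with `<`?). The sample package is built in memory, and its contents are placeholders. Only the part name vbaProject.bin comes from what I saw. The other names are assumed for illustration.

```python
import io
import zipfile


def classify_parts(package: bytes) -> dict:
    """Map each part name in a Zip-based package to 'xml' or 'binary'."""
    kinds = {}
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        for name in archive.namelist():
            # Crude heuristic: skip a UTF-8 byte-order mark and whitespace,
            # then check whether the part begins like an XML document.
            head = archive.read(name)[:64].lstrip(b"\xef\xbb\xbf \t\r\n")
            kinds[name] = "xml" if head.startswith(b"<") else "binary"
    return kinds


def build_sample_workbook() -> bytes:
    """Create a placeholder package; part names and bytes are illustrative."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", '<?xml version="1.0"?><Types/>')
        archive.writestr("xl/workbook.bin", b"\x00\x01\x02\x03")
        archive.writestr("xl/vbaProject.bin", b"\xd0\xcf\x11\xe0")
    return buffer.getvalue()


for part, kind in classify_parts(build_sample_workbook()).items():
    print(f"{kind:6}  {part}")
```

To examine a real file, read its bytes with `open(path, "rb").read()` and pass them to `classify_parts`.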

It is also interesting that Microsoft is positioning this format as the preferred one for performance and interoperability. The online help for Excel 2007 says:

In addition to the new XML-based file formats, Office Excel 2007 also introduces a binary version of the segmented compressed file format for large or complex workbooks. This file format, the Office Excel 2007 Binary (or BIFF12) file format (.xls), can be used for optimal performance and backward compatibility.

Old Style Binary

The Excel 97-2003 option provides the legacy binary “xls” formats, the familiar BIFF format from earlier versions of Office.

Find add-ins

This takes you to a page where you can download the “Microsoft Save as PDF or XPS” Add-in. Note that you are prompted to download an Add-in that provides support for both PDF and XPS. But if you hunt around a bit you can find another page where you can download just one format or the other, which is what I did, installing just the PDF support. This added a new option, “PDF” to the Save As dialog.

Other Formats

This brings up a dialog where you can choose from the previously mentioned formats as well as the several legacy export formats, including:

  • XML Data
  • Web Page
  • Text
  • Unicode Text
  • XML Spreadsheet 2003
  • Excel 5.0/95 Workbook
  • CSV
  • Formatted Text
  • DIF
  • SYLK

Summary

My overall impression was soured a bit by the large number of crashes I experienced. Indeed Excel crashed on exit on almost every session. This was dozens of crashes over the course of an afternoon. This will need to be fixed before I would trust it with my data.

Another curiosity was a legacy binary document that gave the following error message whenever I tried to save it to the new OOXML format:

I did not get this message when I saved it back to the binary format. So evidently I’m losing something when moving to OOXML, whatever “Line Print settings” are. So much for the claims of 100% backwards compatibility…

My examination also put to rest any lingering hope I had that Microsoft had fundamentally changed its position on proprietary file formats and had decided to follow the path of openness. The new proprietary binary format and the undocumented way that macros are encoded make that clear.


1/22/07, A quick update: Microsoft’s Doug Mahugh helped track down and fix the crash problem I had earlier reported when exiting Excel. This is a bug in the “Send to Bluetooth” COM Add-in that Excel was loading at startup. After disabling that Add-in, I’m no longer crashing.

Filed Under: Microsoft, OOXML

Broken Windows and the Ghost of Keynes

2007/01/03 By Rob 11 Comments

Bad ideas never die.

The latest relapse is:

For every dollar of Microsoft revenue from Windows Vista in 2007 in the U.S., the ecosystem beyond Microsoft will reap $18 in revenues. In 2007 this ecosystem should sell about $70 billion in products and services revolving around Windows Vista.

The source for this rosy forecast is the recent IDC whitepaper, The Economic Impact of Microsoft Windows in the United States.

They summarize this boon as:

The IDC research shows that the launch of Windows Vista will precipitate cascading economic benefits, from increased employment in the region to a stronger economic base for those 200,000 or so local firms that will be selling and servicing products that run on Windows Vista. Nearly two million IT professionals and industry employees will be working with Windows Vista in 2007.

These direct benefits —157,000 new jobs and $70 billion in revenues to companies in the US IT Industry — will help local economies grow, improve the labor force, and support the formation of new companies. The indirect benefits of using newer software will help boost productivity, increase competitiveness, and support local innovation.

In the history of economic thought, this is the Multiplier Effect, the belief that an increase in spending leads to more spending and even more spending, in a feedback loop that in the end grows the entire economy. This bootstrap theory was popularized by John Maynard Keynes and became influential in some circles as a way to reduce underutilization in the economy. In other words, if unemployment is high and industrial capacity is underused, then it is worthwhile to have the government make work for people or spend money. Work, any work, will get the money flowing again. This led to the various “alphabet agencies” of F.D.R.’s New Deal program.
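For reference, the textbook arithmetic behind the multiplier is a geometric series. Assume (this is the standard classroom simplification, not anything taken from the IDC paper) that each recipient re-spends a fixed fraction c of what they receive, with c between 0 and 1. An initial outlay of ΔG is then claimed to produce total spending of:

```latex
\Delta Y = \Delta G \,\bigl(1 + c + c^{2} + c^{3} + \cdots\bigr) = \frac{\Delta G}{1 - c}, \qquad 0 < c < 1
```

With c = 0.8, for instance, the claimed multiplier is 1/(1 − 0.8) = 5.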

Keynes, at his boldest, illustrated the magical properties of his multiplier effect like this:

If the Treasury were to fill old bottles with banknotes, bury them at suitable depths in disused coal mines which are then filled up to the surface with town rubbish, and leave it to private enterprise on well-tried principles of laissez-faire to dig the notes up again (the right to do so being obtained, of course by tendering for leases of the note-bearing territory), there need be no more unemployment and with the help of the repercussions, the real income of the community, and its capital wealth also, would probably become a good deal greater than it actually is.

— from The General Theory of Employment, Interest and Money

The most cogent criticism of the Magic Multiplier goes back more than 50 years to Henry Hazlitt’s Economics in One Lesson, where he tells the tale of “The fallacy of the broken window”. It goes something like this:

Imagine the town baker’s shop window is broken by an errant baseball throw. An unfortunate expense to the baker, one might say. But that is a narrow parochial view. Look instead at the benefit to the whole community. The window will cost $300 to replace. That money will go to the glazier who will then use his profits to buy a new sofa from the furniture store, who will then use his profits to buy a new bicycle for his child from the toy store, and so on. The money will continue to circulate in ever-widening circles, bringing joy to all. The original loss of $300 by the baker will more than be made up for by the aggregate increase in the amount of goods and services exchanged in the town. Instead of punishing the little boy who broke the window, he should be raised up and praised as a Universal Benefactor and Economic Sage of the First Order.

The problem with that argument is it fails to look at the poor baker and what he might have done with the $300 if his window had not broken. Maybe he would have bought a new suit with that money. The tailor then might have bought a new sofa with his profits, and so on. The interconnectedness of the economy was not precipitated by the broken window. It was always there. The only thing that changed by the broken window is that the baker has no new suit, and the glazier has his money. Since you can never see the suit that was never made, it is easy to forget that the benefits to the glazier did not come from nothing.

So back to the IDC report, and this forecast of $70 billion in Vista-related spending. The question to ask is, where is all this money coming from? And what might it have been used for if not spent on Vista-related purchases? Obviously this money was not created out of a vacuum. Is it coming from profits? From shareholders? From deferring other investments? Cutting back on training? Moving more jobs off-shore? Reducing quality? What companies and sectors of the economy are going to suffer for this shift in investment? What innovations will not occur because people are allocating resources to this upgrade?

In the end is $70 billion of new value really being produced? Or are we merely fixing broken Windows?

Filed Under: Economics

How to hire Guillaume Portes

2007/01/03 By Rob 65 Comments

You want to hire a new programmer and you have the perfect candidate in mind, your old college roommate, Guillaume Portes. Unfortunately you can’t just go out and offer him the job. That would get you in trouble with your corporate HR policies which require that you first create a job description, advertise the position, interview and rate candidates and choose the most qualified person. So much paperwork! But you really want Guillaume and only Guillaume.

So what can you do?

The solution is simple. Create a job description that is written specifically to your friend’s background and skills. The more specific and longer you make the job description, the fewer candidates will be eligible. Ideally you would write a job description that no one else in the world could possibly match. Don’t describe the job requirements. Describe the person you want. That’s the trick.

So you end up with something like this:

  • 5 years experience with Java, J2EE and web development, PHP, XSLT
  • Fluency in French and Corsican
  • Experience with the Llama farming industry
  • Mole on left shoulder
  • Sister named Bridgette

Although this technique may be familiar, in practice it is usually not taken to this extreme. Corporate policies, employment law and common sense usually prevent one from making entirely irrational hiring decisions or discriminating against other applicants for things unrelated to the legitimate requirements of the job.

But evidently in the realm of standards there are no practical limits to the application of this technique. It is quite possible to write a standard that allows only a single implementation. By focusing entirely on the capabilities of a single application and documenting it in infuriatingly useless detail, you can easily create a “Standard of One”.

Of course, this raises the question of what is essential and what is not. This really needs to be determined by domain analysis, requirements gathering and consensus building. Let’s just say that anyone who says that a single existing implementation is all one needs to look at is missing the point. The art of specification is to generalize and simplify. Generalizing allows you to do more with less, meeting more needs with fewer constraints.

Let’s take a simplified example. You are writing a specification for a file format for a very simple drawing program, ShapeMaster 2007. It can draw circles and squares, and they can have solid or dotted lines. That’s all it does. Let’s consider two different ways of specifying a file format.

In the first case, we’ll simply dump out what ShapeMaster does in the most literal way possible. Since it allows only two possible shapes and only two possible line styles, and we’re not considering any other use, the file format will look like this:

<document>
<shape iscircle="true" isdotted="false"/>
<shape iscircle="false" isdotted="true"/>
</document>

Although this format is very specific and very accurate, it lacks generality, extensibility and flexibility. Although it may be useful for ShapeMaster 2007, it will hardly be useful for anyone else, unless they merely want to create data for ShapeMaster 2007. It is not a portable, cross-application, open format. It is a narrowly-defined, single application format. It may be in XML. It may even be reviewed by a standards committee. But it is, by its nature, closed and inflexible.

How could this have been done in a way which works for ShapeMaster 2007 but also is more flexible, extensible and considerate of the needs of different applications? One possibility is to generalize and simplify:

<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
</document>

Rather than hard-code the specific behavior of ShapeMaster, generalize it. Make the required specific behavior be a special case of something more general. In this way we solve the requirements of ShapeMaster 2007, but also accommodate the needs of other applications, such as OpenShape, ShapePerfect and others. For example, it can easily accommodate additional shapes and line styles:

<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
<shape type="triangle" lineStyle="dashed"/>
</document>
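To see what the generalized format buys a consuming application, here is a minimal Python sketch of a reader for it. The reader is hypothetical: it stands in for an application that can draw only circles and squares with solid or dotted lines, and the function and variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<document>
<shape type="circle" lineStyle="solid"/>
<shape type="square" lineStyle="dotted"/>
<shape type="triangle" lineStyle="dashed"/>
</document>"""

# What this particular (hypothetical) application knows how to draw.
KNOWN_SHAPES = {"circle", "square"}
KNOWN_LINE_STYLES = {"solid", "dotted"}


def read_shapes(xml_text: str) -> list:
    """Return (type, lineStyle, supported) for every shape in the document."""
    shapes = []
    for node in ET.fromstring(xml_text).iter("shape"):
        kind = node.get("type")
        style = node.get("lineStyle", "solid")
        supported = kind in KNOWN_SHAPES and style in KNOWN_LINE_STYLES
        shapes.append((kind, style, supported))
    return shapes


for kind, style, supported in read_shapes(DOC):
    note = "draw" if supported else "unknown, show a placeholder"
    print(f"{kind:8} {style:7} -> {note}")
```

An older reader can at least name the triangle and tell the user what it could not draw. With the `iscircle`/`isdotted` booleans there is no way even to express a third shape, let alone detect one.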

This is a running criticism I have of Microsoft’s Office Open XML (OOXML). It has been narrowly crafted to accommodate a single vendor’s applications. Its extreme length (over 6,000 pages) stems from it having detailed every wart of MS Office in an inextensible, inflexible manner. This is not a specification; this is a DNA sequence.

The ShapeMaster example given above is very similar to how OOXML handles “Art Page Borders” in a tedious, inflexible way, where a more general solution would have been both more flexible and far easier to specify and implement. I’ve written on this in more detail elsewhere.

Here are some other examples of where the OOXML “Standard” has bloated its specification with features that no one but Microsoft will be able to interpret:

2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 95) when determining the spacing between full-width East Asian characters in a document’s content.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

(This example and the following examples were brought to my attention by this post from Ben at Genii.)

What should we make of that? Not only must an interoperable OOXML application support Word 12’s style of spacing, but it must also support a different way of doing it in Word 95. And by the way, Microsoft is not going to tell you how it was done in Word 95, even though they are the only ones in a position to do so.

Similarly, we have:

2.15.3.26 footnoteLayoutLikeWW8 (Emulate Word 6.x/95/97 Footnote Placement)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 6.x/95/97) when determining the placement of the contents of footnotes relative to the page on which the footnote reference occurs. This emulation typically involves some and/or all of the footnote being inappropriately placed on the page following the footnote reference.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

Again, in order to support OOXML fully, and provide support for all those legacy documents, we need to divine the behavior of exactly how Word 6.x “inappropriately” placed footnotes. The “Standard” is no help in telling us how to do this. In fact it recommends that we don’t even try. However, Microsoft continues to claim that the benefit of OOXML and the reason why it deserves ISO approval is that it is the only format that is 100% backwards compatible with the billions of legacy documents. But how can this be true if the specification merely enumerates compatibility attributes like this without defining them? Does the specification really specify what it claims to specify?

The fact that this and other legacy features are dismissed in the specification as “deprecated” is no defense. If a document contains this element, what is a consuming application to do? If you ignore it, the document will not be formatted correctly. It is that simple. Deprecated doesn’t mean “not important” or “ignorable”. It just means that new documents authored in Office 2007 will not have it. But billions of legacy documents, when converted to OOXML format, may very well have them. How well will a competing word processor do in the market if it cannot handle these legacy tags?

So I’d argue that these legacy tags are some of the most important ones in the specification. But they remain undefined, and by this ruse Microsoft has arranged things so that their lock on legacy documents persists even after those documents are converted to OOXML. We are ruled by the dead hand of the past.

Let’s go back even further in time to Word 5.0:

2.15.3.32 mwSmallCaps (Emulate Word 5.x for the Macintosh Small Caps Formatting)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 5.x for the Macintosh) when determining the resulting formatting when the smallCaps element (§2.3.2.31) is applied to runs of text within this WordprocessingML document. This emulation typically results in small caps which are smaller than typical small caps at most font sizes.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

You’ll need to take my word for it that “This emulation typically results in small caps which are smaller than typical small caps at most font sizes” falls well short of the level of specificity and determinism that is typical of ISO specifications.

Further:

2.15.3.51 suppressTopSpacingWP (Emulate WordPerfect 5.x Line Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (WordPerfect 5.x) when determining the resulting spacing between lines in a paragraph using the spacing element (§2.3.1.33). This emulation typically results in line spacing which is reduced from its normal size.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

So not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year-old version of Microsoft Word, it must also do the same thing with a 16-year-old version of WordPerfect. Good luck.

My tolerance for cutting and pasting examples goes only so far, so suffice it for me to merely list some other examples of this pattern:

  • lineWrapLikeWord6 (Emulate Word 6.0 Line Wrapping for East Asian Text)
  • mwSmallCaps (Emulate Word 5.x for Macintosh Small Caps Formatting)
  • shapeLayoutLikeWW8 (Emulate Word 97 Text Wrapping Around Floating Objects)
  • truncateFontHeightsLikeWP6 (Emulate WordPerfect 6.x Font Height Calculation)
  • useWord2002TableStyleRules (Emulate Word 2002 Table Style Rules)
  • useWord97LineBreakRules (Emulate Word 97 East Asian Line Breaking)
  • wpJustification (Emulate WordPerfect 6.x Paragraph Justification)

This is the way to craft a job description so you hire only the person you earmarked in advance. With requirements like the above, no others need apply.
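
To make the problem concrete, here is a minimal Python sketch of the only part a competing implementation can do from the specification alone: notice that one of these elements is present. It assumes the settings appear as children of a w:compat element in the document's settings part, and that the namespace URI is the one used in the published Ecma standard (drafts may differ). The names find_legacy_emulation_settings, W_NS and SAMPLE are hypothetical, chosen for illustration. The sketch does not inspect any on/off value on the elements, and it says nothing about how to reproduce the behavior, which is exactly the part the specification leaves out.

```python
import xml.etree.ElementTree as ET

# Assumption: WordprocessingML namespace as published by Ecma; drafts may differ.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Element names taken from the examples quoted in this post.
LEGACY_EMULATION_ELEMENTS = {
    "mwSmallCaps",
    "suppressTopSpacingWP",
    "lineWrapLikeWord6",
    "shapeLayoutLikeWW8",
    "truncateFontHeightsLikeWP6",
    "useWord2002TableStyleRules",
    "useWord97LineBreakRules",
    "wpJustification",
}


def find_legacy_emulation_settings(settings_xml):
    """Return the legacy emulation element names found under w:compat."""
    root = ET.fromstring(settings_xml)
    found = []
    for compat in root.iter("{%s}compat" % W_NS):
        for child in compat:
            # ElementTree tags look like "{namespace}localName".
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name in LEGACY_EMULATION_ELEMENTS:
                found.append(local_name)
    return found


# Hypothetical settings fragment, for illustration only.
SAMPLE = """<w:settings xmlns:w="%s">
  <w:compat>
    <w:mwSmallCaps/>
    <w:suppressTopSpacingWP/>
    <w:someOtherSetting/>
  </w:compat>
</w:settings>""" % W_NS

if __name__ == "__main__":
    for name in find_legacy_emulation_settings(SAMPLE):
        print("Document asks for undefined legacy behavior:", name)
```

Detection is a few lines. What to do once the element is found is the open question, and the guidance text offers only "duplicate the output of those applications".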

As I’ve stated before, if this were just a Microsoft specification that they put up on MSDN for their customers to use, this would be par for the course, and not worth my attention. But this is different. Microsoft has started calling this a Standard, and has submitted this format to ISO for approval as an International Standard. It must be judged by those greater expectations.


Update:

1/14/2007: This post was featured on Slashdot on 1/4/07 where you can go for additional comments and debate. I’ve summarized the comments and provided some additional analysis here.

2/16/2007: fixed some typos, tightened up some of the phrases.

Filed Under: OOXML, Popular Posts


Copyright © 2006-2023 Rob Weir · Site Policies