An Antic Disposition

Rob

A Game of Zendo

2006/07/18 By Rob 11 Comments

It is the type of response that was crafted to end all debate and justify all sins: “Backward compatibility with billions of documents produced over decades”. Variations of this occur everywhere. Rather than cite them all, a simple Google query will bring up a representative sample.

Let’s take a deeper look at this argument.

There is a game called Zendo, where a player, called the “Master”, forms in his mind a secret rule which governs the selection and arrangement of objects (often small colored blocks). Arrangements which conform to the secret rule are said to have “Buddha nature”. The other players take turns selecting and arranging their own blocks to conform to what they think the secret rule is, and the Master acknowledges success or failure. The winner is the one who first guesses the secret rule, which might be something like “an odd number of blocks, at least one of which must be red”.
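For the programmers in the audience, the example rule can be sketched as a small predicate. This is only an illustration: representing each block by its color name, and the function names `secret_rule` and `ask_master`, are my own assumptions, not part of the game's rules.

```python
from typing import Callable, Sequence

# Assumption for this sketch: each block is represented only by its color.
Arrangement = Sequence[str]


def secret_rule(blocks: Arrangement) -> bool:
    """The example rule: an odd number of blocks, at least one of them red."""
    return len(blocks) % 2 == 1 and "red" in blocks


def ask_master(rule: Callable[[Arrangement], bool], blocks: Arrangement) -> str:
    """Players only ever see the verdict, never the rule behind it."""
    if rule(blocks):
        return "has Buddha nature"
    return "does not have Buddha nature"
```

The players can call `ask_master` as often as they like, but they can only infer the rule from its verdicts.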

Microsoft is playing Zendo with the OOXML specification. The Master has formed a secret rule. He calls it “backwards compatibility with billions of office documents”. But since the file format documentation for the proprietary legacy binary formats has not been made public, the rule might as well have been called “Buddha nature”. It is just as opaque. We have no way of judging whether any specific requirement of OOXML is there to support backwards compatibility, or whether it is just there for the convenience of the Office development team. Or in fact whether it is there to raise barriers to non-Microsoft implementers. How could we know, since the solitary constraint on the creation of OOXML is dependent on information that isn’t public? Does Ecma TC45 itself even have access to the binary format specifications? How are they able to properly judge what is done in the name of compatibility? Do we all just take Microsoft’s word for it?

The key point (in my opinion) is that legacy compatibility may be a constraining factor, but it need not be the sole determining factor. There are many possible markups, perhaps an infinite number, which would be compatible with the legacy formats, meaning the legacy documents can be unambiguously transformed into the new XML format. The constraint should be that they are mappable, not that they must be identical. Among the set of such possible XML formats, some will be elegant, some sloppy, some bloated, some sparse, some easy for others to implement, some designed to minimize conversion work for just one vendor, etc. In other words, this can be done well, or it can be done poorly. The constraint of compatibility does not justify everything. Compatibility is one requirement, but it is not the only requirement.

An example may make things clear. Word has a feature called Art Page Borders. If you are like me, you’ve gone 15 years without seeing or using this feature. But it is there, under the Format/Borders and Shading menu, on the Page Border tab.

Art Borders dialog box in Microsoft Word

The markup needed to define these borders is covered in section 2.18.4 “ST_Border (Border Styles)” of the OOXML specification. Here we see descriptions and images of 200 or so Art Page Borders. The images are heavily weighted to Western European, even Anglo-American celebratory icons, things like gingerbread men for Christmas, pumpkins for Halloween, or images of Cupid for St. Valentine’s day, or globes which are neatly centered on the United States. I think it is a legitimate concern that a document format with such obvious cultural biases is moving forward toward an international standard.

Further, I am concerned that the specification includes what can only be considered a clipart collection. What legal rights does the implementer have to reproduce this clipart? Keep in mind that Microsoft’s “Covenant Not to Sue” covers patents, not copyrights. I haven’t seen anything that would grant implementers of OOXML the rights to reproduce this clipart in their application. Is the specification hard-coded to use clipart which we cannot copy?

All of these problems (spec bloat, cultural bias, non-extensibility, copyright concerns) can be solved by one simple mechanism. Instead of having ST_Border be a fixed enumerated set of values, have it include only a small number of trivial values like the basic line styles, and have everything else (all of the Art Borders) be stored as a separate image file in the document archive.

So, if you load a Word XP document that uses the “candyCorn” Page Border, then when you write it out to OOXML, you would include a single frame of that art in the zip file and have the XML document reference that image for the border, tiling as necessary. This solution has several advantages:

  1. It removes some bloat from the spec. There is no need to document hundreds of page border clip art images.
  2. It lowers the barrier to implementation. No one is required to implement hundreds of border styles. They are all generated on-the-fly based on images stored in the document.
  3. Copyright concerns are eliminated.
  4. It is an extensible approach. An implementation can include different or additional border styles according to its business and cultural requirements.
  5. It is compatible with legacy documents. Any existing Word binary or XML document can unambiguously be mapped into this scheme.
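To make the proposal concrete, here is a minimal sketch in Python of what such a package could look like. The part names (`document.xml`, `media/pageBorder1.png`) and the `pageBorder` element with its `legacyName`, `image` and `repeat` attributes are invented for illustration. They are not taken from the OOXML specification, and a real design would need more thought.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical part names, invented for this sketch (not from the OOXML spec).
DOCUMENT_PART = "document.xml"
BORDER_IMAGE_PART = "media/pageBorder1.png"


def build_package(border_name: str, image_bytes: bytes) -> bytes:
    """Write a zip package whose XML references one frame of border art."""
    root = ET.Element("document")
    # Hypothetical markup: the border is a reference to an image, not an enum value.
    ET.SubElement(
        root,
        "pageBorder",
        {"legacyName": border_name, "image": BORDER_IMAGE_PART, "repeat": "tile"},
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(DOCUMENT_PART, ET.tostring(root, encoding="unicode"))
        package.writestr(BORDER_IMAGE_PART, image_bytes)
    return buffer.getvalue()


def read_border(package_bytes: bytes) -> tuple[str, bytes]:
    """Resolve the border the way a consuming application might."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(DOCUMENT_PART))
        border = root.find("pageBorder")
        return border.get("legacyName"), package.read(border.get("image"))
```

A reader of such a package never needs to know what “candyCorn” looks like. It loads whatever image the document carries and tiles it, so the same code would handle a border that no version of Word ever shipped.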

Of course, this approach would require some minimal code changes in Microsoft Word to support this extensible mechanism. But remaining backwards compatible with the Microsoft Word product was never a stated constraint on OOXML. No one ever said that the goal of Ecma OOXML was to reduce the cost for Microsoft to implement it. It is all about the legacy documents, right?

So there it is, one example to illustrate a point that can be repeated over and over again. Among the potential universe of compatible XML formats for Office are those which are flexible, easy to use, easy to implement, as well as those which simply perpetuate the status quo and vendor lock-in.

Filed Under: OOXML Tagged With: Microsoft, OOXML, Zendo

Lost in Translation

2006/07/14 By Rob 3 Comments

In the last installment I looked at the way the ODF Add-in for Word 2007 integrates into the Word UI. Now let’s drill down into an actual conversion and see what fidelity we get.

I downloaded the code from SourceForge and installed it on a machine running Office 2007 beta 2. The Add-in requires the .NET 2.0 runtime, an additional 22MB download. The current version only supports reading ODF documents, not writing, and only handles the word processor ODF format.

Now for fidelity. Since you may not all have Office 2007 beta 2 installed, I’m going to show you the fidelity via PDF exports. In all cases I manually verified that the PDF output was identical to what I saw on the screen. Every error is real; nothing was introduced by the PDF export process.

First up is a document I call “the sampler”. It has a little bit of all the basic word processor formatting, fonts, alignment, nested tables, graphics, other character sets, headers/footers, images, captions, etc. It is not intended to be a particularly hard test of document conversion, but a basic test of core functionality.

So, here is the sampler, in the original ODF format, as well as the PDF rendering of it in OpenOffice 2.0.3, where it was originally created.

I then exported that file from OpenOffice to Word format. This demonstrates the quality of conversion users already get when running OpenOffice. Here it is in DOC, and as a PDF exported after loading the DOC file in Word 2007 beta 2.

Good, but not perfect. Some differences:

  • the bullet point size is larger in Word than in OpenOffice
  • the nested table is collapsed into the main table in Word
  • the table problem above causes the table to take up more vertical space, pushing the graphic onto a second page

Again, that is the OpenOffice –> Word conversion we all have available for free today in open source code. Since DOC is a proprietary binary format with inadequate publicly-available documentation, this level of fidelity is impressive. So moving from ISO ODF to Draft Office Open XML should be that much easier, especially since the target format is voluminously documented (4,000 pages and growing), and the writers of the translator are receiving technical assistance from Microsoft.

Let’s take a look. From within Word 2007 (beta 2) I use the ODF Add-in to load the sampler ODF file, and get something that looks like this PDF.

I won’t characterize it but to say it fared less well than I expected. Problems include:

  • headers/footers dropped (data loss)
  • bullet list indentation ignored
  • number list indentation ignored
  • table dimensions messed up
  • caption for the graphics sized and positioned incorrectly

Whether these are all bugs or merely functional limitations is an interesting question. There is a Functional Specification document available on SourceForge for the Add-in which lists these requirements:

2.1.1.1. Basic Formatting

Here is the list of formatting items that the Add-in and command line translator would keep intact. The first 10 in the list are must haves and the last 4 (number 11 to 14) are good to have items of formatting.

  1. Bold
  2. Italics
  3. Underline
  4. Bulleting
  5. Numbering
  6. Indentation
  7. Alignment (Left, Center, Right)
  8. Font size
  9. Font face
  10. Tabs
  11. Tables
  12. Font color
  13. Highlights
  14. Background colors

Tables are “nice to have”? I’d hope so! This does not give me the impression that full fidelity is in their plans. Forget about scripts and macros. They are not even planning on tables or font colors. I hope I am wrong or misinterpreting their plans here, but that is the requirements document they have posted.

Filed Under: Microsoft, ODF Tagged With: Add new tag, Word 2007

Traduttore, Traditore

2006/07/13 By Rob 7 Comments

Brian Jones in his blog entry of 11 July 2006, comments on their recently announced ODF Translator:

It’s directly exposed in the UI. We’re even going to make it really easy to initially discover the download. We already need to do this for XPS and PDF, so we’ll also do it for ODF. There will be a menu item directly on the file menu that takes to you a site where you can download different interoperability formats (like PDF, XPS, and now ODF).

Heck, if you wanted to be even more hardcore, the Office object model allows you to capture the save event. So if you wanted to you could make it so that anytime you hit save you always used the ODF format, just by capturing the save event and overriding it. I’m not expecting folks to do that, but it does show just how extensible Office really is.

One might ask, is it a “hardcore” view to want ODF to be the default format for documents saved in Office? Isn’t this exactly what Massachusetts ITD requested in their RFI?

What Jones does not say is that Word 2007 puts the ODF format at a disadvantage, making it harder than necessary to work with. Although end users are given a simple and direct UI for changing the default file format in Word 2007 to other file formats such as RTF, DOC or even ASCII text, ODF is not allowed as a default. Why should ODF users be forced to use “hardcore” programming to capture the “save event” to accomplish this same task?

Let’s take a look at the UI we’re given. Screen shots are based on Word 2007 Beta 2, and the ODF Add-In for Word 2007.

Launch Word, create a document and try to save it, using the File Save menu, or the age-old familiar short cut, Control-S. What do you get? See the following screen shot for the familiar File Save dialog. Although Microsoft formats like DOCX, DOC and XPS are available, as well as export formats like PDF, HTML and Plain Text, you will not find ODF listed.

One new twist is the “Tools” button added to the Save As dialog. Pressing that reveals new options including something called “Save Options” which looks like this:

Here we see how Microsoft treats the file formats it favors with first-class support. Word 2007 allows you to choose which file format will be the default format when you save a document. You can keep the default format (Draft Office Open XML) or choose the legacy binary DOC format, HTML, or older formats like RTF or even Plain Text. But you will not find the ISO OpenDocument Format on this list.

So the question to ask is why Microsoft integrates ODF in a way which treats it as a 2nd class citizen, treated less favorably than even Plain Text?

  • ODF cannot be made the default format
  • ODF documents cannot be round-tripped
  • ODF documents are not accessible via the familiar keyboard shortcuts for opening and saving files (Control-O and Control-S)
  • ODF documents pay a performance penalty for having to be indirectly converted via Draft Office Open XML rather than via native support

[ 7/2/6/2006 The integration discussion continues here]

Filed Under: Office Tagged With: ODF, ODF Add-in

50 years ago

2006/02/15 By Rob 1 Comment

I find it interesting to take a look back at the commemorative stamps issued in 1956. What do we choose to remember, and how do we remember it?

For example, 1956 was the 100th anniversary of the birth of Booker T. Washington, a great leader in education and civil rights, founder of the Tuskegee Institute in Alabama. He was first honored on a U.S. postage stamp in the Famous Americans series of 1940, with a head and shoulders portrait.

But in 1956, the centennial celebration, how was Booker T. Washington honored?

This stamp, issued April 5th, 1956 was designed by Charles R. Chickering, artist at the Bureau of Engraving and Printing. When I first saw this stamp, my reaction was immediate. Where is the portrait? 1956 was the centennial of the birth of the man, not the anniversary of a log cabin. I found this odd.

The other person honored in 1956 was Benjamin Franklin, on the 250th anniversary of his birth. This design in bright carmine (also by Chickering) was based on the painting “Franklin Taking Electricity from the Sky” by Benjamin West and was issued on January 17th, 1956.

2006 is the 300th anniversary of Franklin’s birth, and a set of 4 stamps is due to be issued April 4th. You can see the planned designs here.

Filed Under: Philately

Epithets

2006/01/16 By Rob 1 Comment

A few thoughts on the Epitheton Ornans, or ornamental epithet. This is more than a nickname; it is a formalized word or phrase associated with a person. Classical epic poetry makes heavy use of this rhetorical device. For example, in Homer, Achilles is often referred to as “podas okus” or “swift-footed”, whereas Agamemnon is often “anax andron” or “ruler of men”. There is internal evidence that these poems used a stock list of epithets of different lengths and stress patterns to fit into whatever metrical context was needed. In this way, the epithets could aid improvised oral performance, much as a jazz musician has a repertoire of riffs and chord progressions at his command which can be inserted to fill out a phrase.

The Romans allowed the honor of an “agnomen” for significant military victories. So Publius Cornelius Scipio, after defeating the Carthaginian Hannibal, became Scipio Africanus. Over the centuries, this trend escalated. So, by the 4th Century A.D., we have awe-inspiring names such as “Imperator Constantinus Maximus Augustus Persicus maximus, Germanicus maximus, Sarmaticus maximus, Britannicus maximus, Adiabenicus maximus, Medicus maximus, Gothicus maximus, Cappadocicus maximus, Arabicus maximus, Armenicus maximus, Dacicus maximus”. (Today we just call him “Constantine the Great”, which is a great time-saver.)

The trend continued. If you’ve seen an old British penny, from 100 years ago, you will have read the legend “VICTORIA D G BRITT REG F D”, short for “Victoria, by the Grace of God, Queen of the Britains, Defender of the Faith”.

But the use of epithets has been on the wane for many years now, at least in the optimistic parts of the world. North Korea may have its “Dear Leader” and the late “Great Leader”, but we never even considered formally naming Eisenhower “The German Slayer”. We ended up with “Ike”. I guess we like our leaders to be mere men, and not gods. The Cult of Personality is difficult to maintain in a democracy with a free press. “No man is a hero to his butler”.

Sure, we have our little nicknames, “The Artist formerly known as Prince”, “Iron” Mike Tyson or the “Scud Stud”, but that is done in jest, or in the entertainment world (which amounts to the same thing). We will never see “Scud Stud” carved in marble or engraved in brass.

But once a year, on this date (or the nearest Monday), I am reminded of the most prominent example of epitheton ornans in common use today. I refer to the ubiquitous use of the phrase “Slain Civil Rights Leader”. The fact that I do not need to name the owner of this epithet demonstrates its currency. A search of Google News shows almost 1,500 uses of this phrase in recent press clips. This epithet is so tightly associated with him that it can be used as a substitute for his name, much as a medieval scholar could speak of “the Philosopher” to refer to Aristotle without ambiguity.

I’m trying to think of any other prominent examples of such epithets in common use today. I can’t think of any. Can you?

One wonders how long this epithet will remain. Will it outlast the generation that heard his message and heeded his Dream? We can hope so. But I do note that in the generation after the assassinations of Lincoln, Garfield and McKinley, all three were popularly acclaimed with the epithet “our martyred president”. Yet a search of Google News shows zero hits for “martyred president”, though there are 271 hits for “President Lincoln”.

Filed Under: Uncategorized

Copyright © 2006-2023 Rob Weir · Site Policies