Maybe I just have an ear for this, but whenever I hear a number of people saying the same odd thing, using the same strained phrase, it catches my attention and makes me take a closer look. Individuals naturally have a great diversity of expression and phrasing, so where this is lacking, and the Borg starts speaking as one, it is good to pay it some heed.
The word for today is “represents”. A few exemplary quotations to demonstrate a particular pattern of use that attracted my attention:
From Microsoft’s Open XML Community:
Open XML was designed to provide users the benefits of: faithfully representing in an open format existing office documents, interoperability, support across platforms and applications, integration with business data, internationalization, support for accessibility and assistive technologies, and long-term document preservation.
Microsoft’s Jean Paoli as quoted by Tim Anderson:
As a design goal, we said that those formats have to represent all the information that enables high-fidelity migration from the binary formats.
And Paoli again in a Microsoft press release:
So the Office Open XML file formats represent all the characteristics of the Office binary file formats, while making it easier for people to connect to the different islands of data in the enterprise.
Microsoft’s Brian Jones in a comment response on his blog:
We had to leave some legacy behaviors in place because the goal of our work was to create an XML format that could represent our existing base of Office documents.
From the OOXML Overview whitepaper [pdf] presented to JTC1:
OpenXML was designed from the start to be capable of faithfully representing the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation.
From Ecma’s response to the JTC1 NB contradiction objections:
OpenXML has been designed to be capable of faithfully representing the majority of existing office documents in form and functionality.
Microsoft’s Stephen McGibbon:
I represent Microsoft at all kinds of meetings and my firm understanding is that one of the things that differentiates OpenXML and ODF is OpenXML’s ability to faithfully represent all of the previously created Microsoft Office binary format documents.
So what are we to make of this? They are being very specific about their choice of words, aren’t they? I wonder why…
A file format represents data. It stores data. It encodes data. These are all synonymous. But the ability to represent data is a trivial thing to do. For example, here an example of a markup language that can also represent all legacy Microsoft Office documents:
<office-document>
<one/>
<one/>
<zero/>
<one/>
<zero/>
<zero/>
</office-document>
Since the above markup directly maps to binary, it can faithfully represent 100% of existing Office documents with 100% backwards compatibility. It can also represent perfectly the documents of every other vendor, past, present and future.
But before Ecma gets all excited that they may soon have another standard to Fast Track, I must admit the obvious. This markup is not all that useful as an interoperable document format. Why? Because although it can represent 100% of legacy documents, it does not specify how to do anything with them. Except at the level of a bit, the format does not express any structure or semantics. Although you can express anything you want with 1,’s and 0’s, there is no common, interoperable use above the level of 1’s and 0’s provided for. My binary document means something only to me, and unless I go outside of the standard and share additional information with you, you will not be able to understand my binary document.
Interoperability comes not from representation, but from specification.
(An aside — There is however speculation that it is possible to transmit information via a binary code in a way that presupposes no other prior agreement or knowledge other than universals like mathematical and physical laws. It would require a bootstrapping approach where very basic elements of notation and mathematical logic are transmitted, followed by increasingly more complex concepts. By this theory it would be possible to communicate with alien intelligences without any prior conventions. See, for example, Carl Sagan’s novel, Contact. But this is probably overkill for an office document format, unless your workplace is a lot stranger than mine.)
So what is the difference between representing and specifying? When you represent, it means that you can map from the features of the legacy format to the the new format. When you specify, it means that you provide the map, and enough detail so that others can read and write that same representation. That is a big difference.
Of course, OOXML is more than 1’s and 0’s. But when you see attributes with names like, “useWord97LineBreakRules,” with no additional specification, then you know that the fix is in. My guess is that MS Word has code someplace that looks like this:
if (useWord97LineBreakRules)
doCrappyOldWayOfLineBreaking(); // reuse legacy code from Word 97
else
doNewWayOfLineBreaking(); // Use new rules
If this is true, then MS Word can implement this feature trivially. But no one else can make sense of it, because we lack a specification of its behavior . They might has well had called the attribute, “Fred.” It is just as useful.
Another example is how OOXML deals with PowerPoint slide transitions, the things that people use in an attempt to make a boring presentation seem more interesting. Microsoft has ensured that they can represent all of the transitions. They are all there listed in Section 4.4.1.46: blinds, checker, circle, comb, cover, cut, etc. But when you drill down into the definitions, this is what you find:
wheel (Wheel Slide Transition)
This element describes a wheel slide transition effect.
[Example: Consider we have a slide with a wheel slide transition. The <wheel> element should be used as follows:
<p:transition>
<p:wheel/>
</p:transition>
End example]
That’s it. Ditto for all of the other slide transitions. Not exactly specified fully, is it? Although the text claims that it “describes a wheel slide transition effect,” in truth it merely labels it. There is no specification, only representation. And that curious little example — is this some sort of joke? Did someone really think that attributes with no definition are improved by trivial examples? It reminds me of the old spelling bee joke:
Judge: The word is “synecdoche.”
Student: Could you use that in a sentence?
Judge: Certainly. “Synecdoche” is a very hard word to spell.
100% correct, but also 100% useless. As I read through the OOXML specification I am finding hundreds of places like this where things are labeled, but no definition is given.
So I think we need to ask more questions when we hear the claims that OOXML was designed to faithfully represent 100% of the legacy documents. We need to respond that representation is not enough for an open format. Even an XML format of just <one>’s and <zero>’s can do that. To be of use to anyone other than Microsoft we need more than just representation. We need specification, and we need the map to the legacy formats. To accept anything else is to embark on a voyage with a foreign dictionary missing the definitions. It can represent everything that you want to say, but you’ll be unable to say any of it.
ISO defines a standard as a:
…document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context
A key clause there is the requirement for providing, “common and repeated use.” Providing explicit representation for a single vendor’s legacy formats while not providing for common use of that ability, this is not the purpose of an ISO standard and to my eyes appears to be an abuse of the standardization process.







