Here begins the lesson on Embrace, Extend and Extinguish (EEE). Classically, this technique is used to perpetuate vendor lock-in by introducing small incompatibilities into a standard interface, in order to prevent effective interoperability, or (shudder) even substitutability of competing products based on that interface. This EEE strategy has worked well so far for Microsoft, with the web browser, with Java, with Kerberos, etc. It is interesting to note that this technique can work equally well with Microsoft’s own standards, like OOXML.
An easy way to find these extension points is to search the OOXML specification for “application-defined” or “implementation-defined”. You will find dozens of them, such as:
- In general, scripting
- In general, macros
- In general, DRM
- Part 1 — “Application-Defined File Properties Part” which is totally undefined, but is referenced 13 times for specific fields in Part 4.
- Section 2.16.4.1 — implementation-defined date/time formatting
- Section 2.16.5.34 — implementation-defined document filters
- Section 3.17.2.6 — implementation-defined string–>number conversions in a spreadsheet
- Section 2.8.2.2 — character sets supported by a font
- Section 2.9.6 — the interpretation of the mysterious hex “template code” in numbered list overrides — “The method by which this value is interpreted shall be application-defined.”
- Section 2.14.27 — application-defined storage of exclusion data for a mail merge
- Section 2.15.1.28 — application-defined cryptographic hash algorithms
- 2.15.1.76 — “Specifies a string identifier which may be used to locate the XSL transform to be applied. The semantics of this attribute are not defined by this Office Open XML Standard – applications may use this information in any application-defined manner to resolve the location of the XSL transform to apply.”
- Section 5.6.2.12 — application-defined macro string reference for connection shape
- Section 5.6.2.15 — application-defined macro string reference for graphic frame
- Section 5.6.2.24 — application-defined macro string reference for a picture object
- Section 5.6.2.28 — application-defined macro string reference for a shape
- Section 5.8.2.9 — application-defined macro string reference for a connection shape
- Section 5.8.2.12 — application-defined macro string reference for a graphic frame
- Section 6.2.2.14 — “This element specifies the presence of an ink object. An ink object is a VML object which allows applications to store data for ink annotations in an application-defined format.”
- Section 7.6.2.60 — implementation-defined bibliographic citation formats
- And many, many more.
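You can reproduce this kind of survey with a short script. The sketch below assumes you already have a plain-text export of the specification, which is not provided here. The function name `survey`, the heading pattern, and the sample text are all hypothetical. PDF-to-text hyphenation can also hide matches, so treat any counts as rough.

```python
import re
from collections import Counter

# Phrases that mark deliberately open-ended behavior.
PHRASES = ("application-defined", "implementation-defined")

# Assumed heading format: a line starting with a dotted section number,
# e.g. "2.16.4.1 ". Body lines such as "3.5 inches" would also match.
HEADING = re.compile(r"^(\d+(?:\.\d+)+)\s")


def survey(spec_text: str) -> tuple[Counter, list[str]]:
    """Count each phrase and list the sections where any phrase occurs."""
    counts: Counter = Counter()
    sections: list[str] = []
    current = None  # section number of the most recent heading line
    for line in spec_text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
        lowered = line.lower()
        for phrase in PHRASES:
            hits = lowered.count(phrase)
            if hits:
                counts[phrase] += hits
                if current is not None and current not in sections:
                    sections.append(current)
    return counts, sections


# Hypothetical sample text standing in for a real export.
sample = (
    "1.1 First example section\n"
    "The formatting is implementation-defined.\n"
    "1.2 Second example section\n"
    "Nothing open-ended here.\n"
    "1.3 Third example section\n"
    "Storage is application-defined; parsing is implementation-defined.\n"
)
counts, sections = survey(sample)
print(dict(counts), sections)
```

Run against a real export, the `sections` list gives a starting checklist of places where an implementation's behavior would need to be documented.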
So one might ask: what exactly does “implementation-defined” mean? Here is how OOXML defines it and related terms:
behavior, implementation-defined — Unspecified behavior where each implementation documents that behavior, thereby promoting predictability and reproducibility within any given implementation. (This term is sometimes called “application-specific behavior”.)
behavior, locale-specific — Behavior that depends on local conventions of nationality, culture, and language.
behavior, unspecified — Behavior where this Standard imposes no requirements. [Note: To add an extension, an implementer must use the extensibility mechanisms described by this Standard rather than trying to do so by giving meaning to otherwise unspecified behavior. end note]
Note that this is not an entirely novel definition. Anyone who has spent time reading the C and C++ programming language standards, whether the ANSI or the ISO versions, will recall a similar set of definitions. For example, these from ISO/IEC 9899:1999, the C programming language standard:
implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
locale-specific behavior
behavior that depends on local conventions of nationality, culture, and language that each implementation documents
unspecified behavior
behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
So, you can see that OOXML pretty much copies these definitions. However, ISO standards like ISO/IEC 9899:1999 go one step further and make an additional statement in their conformance clause, something that is distinctly missing from OOXML:
“An implementation shall be accompanied by a document that defines all implementation-defined and locale-specific characteristics and all extensions.”
If you poke around, you will see that conformant C compilers do indeed come with a document that defines how their implementation-defined features are implemented. For example, the GNU gcc manual includes a chapter titled “C Implementation-Defined Behavior”.
So, by omitting this requirement from its conformance clause, OOXML makes the term “implementation-defined” toothless. It just means “We don’t want to tell you this information” or “We don’t want to interoperate”. Conformant applications are not required to document how they extend the standard. Look at Microsoft Office 2007 as a prime example: where is the documentation that explains how Office 2007 implements these “implementation-defined” features? How is interoperability promoted without it?
(This item not discussed at the BRM for lack of time.)
Anonymous says
Implementation-defined stuff actually goes even beyond that.
I’ll give two entire classes of problems, for starters. Note that what follows comes from a real implementer; it is beyond the reach of 99.99% of the people who have talked about Microsoft Office XML formats.
1) For performance reasons, Microsoft stores in files only what they believe they cannot avoid storing. In other words, to minimize the size of streams and the processing time, they simply don’t store a number of elements or attribute values.
That stuff is left for you to discover. Yep, we (implementers) are all back to good ol’ reverse engineering. The ECMA 376 paper is useless here. Caring about performance is fine, nothing wrong with that, but they should document what they don’t store, since what we are talking about here are values hardcoded in Office 2007.
2) Everything that’s not in the schemas. For instance, if you write the XML for a bar chart as opposed to a line chart, but forget to write a bardir element beneath, or write a marker element although the chart type does not expect it, this will create a file that cannot open in Office 2007. Both elements are listed in the schemas, but how they are supposed to be combined isn’t. None of that information appears anywhere on paper right now. We (implementers) are back to good ol’ reverse engineering.
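To make the second point concrete, here is a minimal sketch of the kind of extra check an implementer ends up writing on top of schema validation. It uses Python’s standard `xml.etree.ElementTree`, and the namespace URI is the DrawingML chart namespace. The rule tables (`REQUIRED_CHILDREN`, `FORBIDDEN_DESCENDANTS`) are hypothetical paraphrases of the two examples in the comment, not rules stated in ECMA 376, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# DrawingML chart namespace used in OOXML chart parts.
CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
C = "{" + CHART_NS + "}"

# Hypothetical rules paraphrased from the comment above; ECMA 376 does not
# state them, which is exactly the problem being described.
REQUIRED_CHILDREN = {"barChart": ["barDir"]}
FORBIDDEN_DESCENDANTS = {"barChart": ["marker"]}


def semantic_problems(chart_xml: str) -> list[str]:
    """Report combinations a consumer might reject even if the XML parses."""
    root = ET.fromstring(chart_xml)
    problems = []
    for chart_type, children in REQUIRED_CHILDREN.items():
        for node in root.iter(C + chart_type):
            for child in children:
                # A direct child is required under this chart type.
                if node.find(C + child) is None:
                    problems.append(f"{chart_type} is missing {child}")
    for chart_type, banned in FORBIDDEN_DESCENDANTS.items():
        for node in root.iter(C + chart_type):
            for name in banned:
                # Search at any depth below this chart type.
                if node.find(".//" + C + name) is not None:
                    problems.append(f"{chart_type} contains unexpected {name}")
    return problems


def bar_chart(inner: str) -> str:
    """Wrap a barChart body in a minimal, hypothetical chartSpace."""
    return (
        f'<c:chartSpace xmlns:c="{CHART_NS}"><c:chart><c:plotArea>'
        f"<c:barChart>{inner}</c:barChart>"
        "</c:plotArea></c:chart></c:chartSpace>"
    )


print(semantic_problems(bar_chart('<c:barDir val="col"/>')))
print(semantic_problems(bar_chart("<c:ser><c:marker/></c:ser>")))
```

Every entry in tables like these has to be discovered by trial and error against Office 2007, which is the reverse engineering the comment describes.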
-Stephane Rodriguez
Rob says
@Stephane, That’s a good point. The things that are not defined in OOXML far outnumber the things that are formally marked as “implementation-defined”.
A standard should not have anything that is casually undefined. Where something is not fully defined it should be explicitly marked as “implementation-defined”, “undefined” or “unspecified”. There are important shades of meaning here.
Anonymous says
I’d like to mention that, even though I have been implementing it since summer 2006, I am progressing at the same speed as I did with BIFF. That speaks volumes: what we are talking about is not really XML, it’s angle brackets around complex stuff that gets corrupted very easily (you cannot add your own XML attributes, even though that’s why people use XML…)
Another way to put it: ECMA 376 would have to be 600,000 pages, not 6,000, to be a decent reflection of the product. In other words, the only way to interoperate with this stuff is to look at the source code.
Microsoft understands this very well:
– that’s why they added a complex drawing layer, among other things, in addition to adding VML even in places where there was no such thing before. The point? The fire-and-motion technique.
– that’s why only they can migrate files reliably (the marketing message is clear). Their sponsored open source projects are an insult to open source in general.
The ISO proposal is a smoke screen. Anyone who has worked with ISO standards knows that ECMA 376 is so poor that it should return to draft status.
-Stephane Rodriguez
James says
Are you suggesting OpenOffice.org, Sun, KDE, IBM et al should all add an option to save in OOXML (DIS29500) format that is incompatible with MS Office? And then blame MS for not following the standard? Are you ROFLOL still?
Chris Ward says
It’s reasonably clear that there will be multiple attempts to create and distribute documents in OOXML format.
Some will be created by users of Microsoft Office. These will all be readable just fine by Microsoft Office.
Some will be created professionally for legitimate business purposes; think of a bank, with their client list on an IBM mainframe, intending to do a personalised ‘e-mailshot’. These will probably be readable; someone’s commercial reputation is on the line.
Some will be created by applications written by the kind of people who brought us OpenOffice.org; mostly academics, for whom ‘software’ is a by-product of the teaching process. These will have more scattered compatibility; if Google tries to understand and index them, its indexing service might jam up.
Probably a large number will be created in support of ‘non-legitimate business’. This is the sort of thing the Internet Storm Center warns about, such as data theft, and it’s here that the lack of a good specification is more of a problem. With an ill-defined spec like OOXML, these documents will be like pouring sand into a gearbox instead of oil; they will cause the wheels of business to grind to a halt.
This may make OOXML self-limiting.
And many will be constructed by amateurs … apprentice software developers, and the like … with varying degrees of fidelity to OOXML. The world should welcome the amateurs, the apprentices, because the future belongs to them. If you want engineers and scientists, you must let them learn.
Personally, I think ISO would be making a mistake to ratify ECMA 376 as an International Standard; it’s fine as a Product Specification, but it’s not in a state where you could recommend that other market participants adopt it as a standard for their products. But I don’t get a vote; I’m disenfranchised on this one; all I can do is point out what I think.
And because so much is left as ‘Implementation-defined’, we won’t be able to sanity-check such documents as they enter businesses, as attachments to e-mails, or being downloaded from web sites. It’s likely to cause log-jams, or more-or-less-misleading interpretations of documents.
There are advantages to the simpler, better-defined approach of ISO/IEC 26300.
“Soapbox off”. I don’t have a wall of money. Others in the industry do. I’d just appeal to them to use it wisely.
orlando says
Search for the term “extLst” in OOXML Part 4, Markup Language Reference:
651 occurrences.
Quoting some of them:
“3.2.10 extLst (Future Feature Data Storage Area) This element defines flexible storage extensions for implementing applications”
[end of subclause, no more information given!!]”
“3.2.7 ext (Extension) Each ext element contains extensions to the standard SpreadsheetML feature set.
Parent Elements: extLst (§3.2.10)
Child Elements: Any element from any namespace
Subclause: n/a
Attributes: uri (URI): A token to identify version and application information for this particular extension. The possible values for this attribute are defined by the XML Schema token datatype.
[end of subclause, no more information given!!!!]”
“5.1.2.1.14 ext (Extension)
This element [of type CT_OfficeArtExtension] specifies an extension that is used for future extensions to the current version of DrawingML. This allows for the specifying of currently unknown elements in the future that will be used for later versions of generating applications.
…
Attributes: uri (Uniform Resource Identifier): Specifies the URI, or uniform resource identifier that represents the data stored under this tag. The URI is used to identify the correct ‘server’ that can process the contents of this tag. The possible values for this attribute are defined by the XML Schema token datatype. [end of subclause, what ‘server’????!!!]”
With “standards” like this, who needs chaos?
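Here is a sketch of what a consumer can actually do with these elements, using Python’s standard `xml.etree.ElementTree`. Because the standard defines no `uri` values, a reader can only recognize the URIs it already knows about from elsewhere and must treat everything else as opaque. The `KNOWN_URIS` set, the `urn:example:` URIs, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = "{" + MAIN_NS + "}"

# Hypothetical registry: the spec defines no uri values, so a consumer can
# only recognize extensions it has learned about from some other source.
KNOWN_URIS = {"urn:example:known-extension"}


def classify_extensions(xml_text: str):
    """Split ext elements into (known, opaque) lists of (uri, child tags)."""
    root = ET.fromstring(xml_text)
    known, opaque = [], []
    for ext_lst in root.iter(S + "extLst"):
        for ext in ext_lst.findall(S + "ext"):
            uri = ext.get("uri", "")
            # Children may come from any namespace; keep their full tags.
            payload = [child.tag for child in ext]
            (known if uri in KNOWN_URIS else opaque).append((uri, payload))
    return known, opaque


doc = (
    '<worksheet xmlns="' + MAIN_NS + '" xmlns:x="urn:example:vendor">'
    "<extLst>"
    '<ext uri="urn:example:known-extension"><x:data/></ext>'
    '<ext uri="urn:example:unknown"><x:secret/></ext>'
    "</extLst>"
    "</worksheet>"
)
known, opaque = classify_extensions(doc)
print(known)
print(opaque)
```

At best, a reader can preserve whatever lands in `opaque` byte for byte or drop it. Nothing in the standard says what it means.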
Anonymous says
OLEObject, oleLink, oleObjects are application-defined too.
Anonymous says
From Portugal:
http://people.angulosolido.pt/%7Egustavo/ct173/OOXML-next.pdf
Steven G. Johnson says
@Anonymous:
“3.2.10 extLst (Future Feature Data Storage Area) This element defines flexible storage extensions for implementing applications”
[end of subclause, no more information given!!]
This reminds me of C and C++ data structures, in which you might add extra “padding” fields to a data structure or class to provide space for adding future data (or pointers thereto) without breaking binary compatibility with programs that are expecting the old structure size.
Of course, the idea that you need such “padding” fields in a flexible syntax like XML is laughable, but it is consistent with the perception that OOXML is just a thoughtless dump of MS Office’s internal binary data structures.
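If you haven’t seen the C idiom before, here is a minimal sketch that uses Python’s `ctypes` to mimic a C struct with a reserved area. The record layout and field names are hypothetical. The point is that a later version can carve a new field out of the reserved space without changing the structure’s size or the offsets of the existing fields.

```python
import ctypes


class RecordV1(ctypes.Structure):
    """Version 1 of a hypothetical binary record with reserved space."""
    _fields_ = [
        ("version", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("reserved", ctypes.c_uint32 * 4),  # "padding" kept for future data
    ]


class RecordV2(ctypes.Structure):
    """Version 2 takes one reserved slot for a new field; layout otherwise unchanged."""
    _fields_ = [
        ("version", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("color", ctypes.c_uint32),  # new field carved out of the reserved area
        ("reserved", ctypes.c_uint32 * 3),
    ]


# Both versions occupy the same number of bytes, so old and new programs agree.
assert ctypes.sizeof(RecordV1) == ctypes.sizeof(RecordV2)

# Bytes written by an old program can be read by a new one; the reserved area
# was zero-filled, so the new field simply reads as 0.
old = RecordV1(version=1, flags=0x3)
upgraded = RecordV2.from_buffer_copy(bytes(old))
print(ctypes.sizeof(RecordV2), upgraded.version, upgraded.flags, upgraded.color)
```

This trick exists because fixed-size binary layouts cannot grow without breaking older readers. A markup format does not share that constraint, which is why the comment finds the resemblance telling.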
Vexorian says
OOXML: “Let’s make unspecified behavior an open standard.”
PolR says
Stephane Rodriguez says:
“if you write the XML for a bar chart as opposed to a line chart, but forget to write a bardir element beneath, or write a marker element although the chart type does not expect it, this will create a file that cannot open in Office 2007.”
I wonder what that means in terms of the conformance clause. I know the bar is set low, but I think it still requires that conforming applications don’t reject well formed OOXML input. Can it be argued that if the spec does not describe these dependencies, then it should be OK to ignore them? Conformant applications must be able to open such files.
Victor says
I know the bar is set low, but I think it still requires that conforming applications don’t reject well formed OOXML input.
Yup. That means MS Office 2007 cannot be used as a reference implementation. But it does not matter: ISO does not require any implementation of any standard (AFAIK there are still zero 100%-conformant implementations of the C++ standard). Of course, it goes without saying that MS will not even try to implement the standard: take a look at CSS, for god’s sake! Microsoft only implemented CSS1 in MS IE 7, ten years after the standard was approved! And that despite CSS1 including only a few small changes from the draft Microsoft submitted to the W3C. Do you really think the situation with OOXML will be any different?
Microsoft is not really interested in conformance. Never was, never will be. The only interest Microsoft has in standards is the rubber stamp. Remember POSIX? Windows NT 3.1 (sic!) was sold under the promise of an almost 100%-conformant implementation (non-POSIX systems were not allowed in government at the time), yet things like hard links were only added in Windows 2000. Remember Java? Remember what riled Sun back then? Right: Microsoft implemented only parts of the standard, not the whole standard, even though the contract explicitly required it!
As far as Microsoft is concerned, standards exist to pick and choose from, not to implement faithfully. And MS-produced standards are worse. Of course, there are no 100%-conformant ODF implementations either, but there is a subtle difference between bugs and deliberate ignorance…
Anonymous says
“I wonder what that means in terms of the conformance clause.”
ECMA 376 makes it clear in its scope that conformance is purely syntactic, as opposed to semantic.
It’s a way of saying that the schemas given are all-inclusive (everything is dumped there alphabetically) while leaving undocumented how elements are supposed to be combined.
Given the sheer number of elements and attributes, it is humanly impossible to reproduce those undocumented combinations in less than years upon years of work.
We are actually very close to where we were with binary files.
Only now, third parties who have already done that work for binary files have to do substantial work again to support the same thing stored differently.
Had Microsoft used their DrawingML work to improve SVG instead, I guess many of us would be willing to take a look at it. But the Office document model is just legacy angle-bracketed stuff, not something worth implementing.
-Stephane Rodriguez
PolR says
Stephane Rodriguez says:
“ECMA 376 makes it clear in its scope that conformance is purely syntactic, as opposed to semantic.
It’s a way of saying that the schemas given are all-inclusive (everything is dumped there alphabetically) while leaving undocumented how elements are supposed to be combined.”
This was precisely the point I was looking for. If conformance is syntactic, then rejecting a syntactically valid OOXML file is non-conformant, even if the rejection is made for valid semantic reasons.
Am I right to believe the conformance clause has legal implications? I can see some of them in procurement. If someone requires ISO or ECMA OOXML, then Microsoft Office 2007 is non-conformant because it rejects syntactically valid input for semantic reasons.
On the other hand, if an ODF application shrewdly includes an XML parser/editor for OOXML that can read and write valid OOXML, then the ODF application is OOXML-conformant even if it can’t properly use the file as an office document. If the RFP doesn’t explicitly link the requested application features to OOXML beyond saying the OOXML standard must be supported, then the ODF application may legally meet all the requirements of the RFP while Microsoft Office doesn’t. :)
Imagine a government requiring ISO OOXML and selecting Microsoft Office 2007 over the conformant ODF proposal. If the ODF supplier wants to play tough, there is room for lawyers to have fun challenging the outcome of poorly written RFPs.
Of course Microsoft can “fix” the issue by adding an XML parser/editor functionality for semantically invalid OOXML instead of rejecting the file. This would only underline that the regular Office 2007 applications are not conformant. And RFP authors would have to include language in RFPs to mean they want Office, not ODF with silly OOXML add-ons.
This would require the RFP authors to be aware of the OOXML conformance clause. It would require public servants to understand that the scope of OOXML is syntax, while ODF covers semantics. That alone would be a significant achievement for ODF adoption.
Please remind me, why is Microsoft doing this charade again?
tommyd3mdi says
Now this is a “good” one from one of the MS representatives in New Zealand:
“Why does OOXML not include macros, scripting, OLE serialisation, and leave so much to be application-defined?
Competition between Office Automation suites has always been an important factor in driving much of the innovation that we enjoy in the industry and as users today. The process to standardise OOXML is a process to standardise the data format, not an application. Standardising the full application would remove the ability for different office applications to compete with each other and slow that pace of innovation. […]”
Perfect, there’s the vendor lock-in again! Well, guys, just stick with your stupid binary format and stop pretending to be open. It’s just a farce, nothing more.
(source, via boycottnovell.com)
Anonymous says
Another meaning of “implementation-defined” is that these are features Microsoft will change every time the rest of the world manages to work out how Microsoft currently does things.
RTF was promoted as a standard, but it really did change with every new update of Windows, just to reduce interoperability. Any entry in DIS 29500 that is implementation-defined is really a blanket approval for Microsoft to repeat the same stunt again and again with OOXML.