Wednesday, January 31, 2007
More Matter with Less Art
I suggest you read through Miguel's post in its entirety, and then return here for my response.
After an attack against lawyers, we come to some technical comments:
Unlike the XML Schema vs Relax NG discussion where the advantages of one system over the other are very clear, the quality differences between the OOXML and ODF markup are hard to articulate.
The high-level comparisons so far have focused on tiny details (encoding, model used for the XML). There is nothing fundamentally better or worse in those standards like there is between XML Schema and Relax NG.
ODF grew out of OpenOffice.org and is influenced by its internal design. OOXML grew out of Microsoft Office and it is influenced by its internal design. No real surprises there.
Maybe I can be of some assistance here, helping to articulate the difference in quality between ODF and OOXML. ODF, starting from its roots in the OpenOffice.org specification, spent a further 2 1/2 years being improved and reviewed in OASIS, then further work preparing for submission to ISO, then a further year in ISO, receiving more comments and corrections, before it was published as an ISO standard. So this is a combined 4 years in technical committees being refined by standards bodies. During this time ODF has been implemented in dozens of applications, including full suites like OpenOffice.org, KOffice and Lotus Workplace, as well as individual applications like AbiWord, Gnumeric and Google Docs and Spreadsheets.
In comparison, OOXML went from a proprietary Microsoft specification to an Ecma standard in record time. If you make something 8 times longer than ODF, and do it 4 times faster than ODF, then you are going to have a quality problem. The list of problems on Groklaw is one list of known problems in OOXML. Note that this particular list was generated in only 3 or 4 days by volunteers. I recently did a sampled survey of OOXML specification quality and predicted that it contains thousands of errors.
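The arithmetic behind that claim can be written out. This assumes, as a simplification not measured anywhere, that reviewer effort per unit of calendar time was comparable in the two committees; the 8x and 4x figures are the approximate ratios given above:

$$
\frac{\text{review time per page, OOXML}}{\text{review time per page, ODF}} = \frac{1/4}{8} = \frac{1}{32}
$$

That is, each OOXML page would have received roughly one thirty-second of the review attention that each ODF page received.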
And where are the OOXML implementations? OOXML was approved by Ecma and submitted to ISO without a single available implementation. Certainly, Office 2007 later shipped with support, but is that it? A single implementation? Until you have at least two independent implementations of a standard you will have a very imperfect understanding of the standard's quality.
So the question to ask is this: Why should JTC1 NB volunteers deal with the mess that Microsoft dropped in their laps through its overhasty review of OOXML in Ecma? Why should they spend the next 6 months reviewing this specification when even a cursory review shows it is defective in so many ways? And considering the observed low level of quality, why should it be reviewed and approved via a Fast Track process, all in one big chunk of 6,000 pages? Isn't this the last thing you want to do, following a rushed review in Ecma with a rushed review in ISO? Instead this should go back to Ecma to let them do a proper review, one they can be proud of.
Miguel correctly points out that OOXML derives from Microsoft Office's formats, and ODF derives from OpenOffice.org's formats. But then he leaps to an assertion that they both reflect their parent application's internals. This is not true. Only a poorly designed file format reflects the internals of the application. Maybe that is how we did it back in the 1980s, but best practices for portable file formats have been known for years now. That is why we have data formats like XML, so the format can be independent of the application internals. ODF was designed, even in the OpenOffice days, from the ground up to be an application- and platform-neutral document format. While it was further developed in OASIS, it continued to take on such good qualities as reuse of existing relevant W3C standards such as XForms, MathML and SVG. So certainly, the platform-independence and open nature of OpenOffice.org rubbed off on ODF, but isn't that an extremely good thing?
OOXML, on the other hand, matches to an inane degree the internals of a single vendor's legacy application, with no concessions to platform-neutrality. For example, OOXML encodes data in non-XML formats such as binary blobs, bitmasks and other encodings that defy XML schema validation or processing by XML tools. As I've said before, this is not a specification, this is a DNA sequence.
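To make the bitmask point concrete, here is a small Python sketch. The element name, the `look` attribute and the bit assignments are invented for illustration and are not quoted from the OOXML specification. It shows that a generic XML parser sees only an opaque string, and that recovering the meaning requires a lookup table that lives outside the markup, whereas named attributes carry their meaning with them and can be constrained by a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only: one attribute packs several
# boolean options into a hexadecimal bitmask.
PACKED = '<table look="04A0"/>'

# Hypothetical bit assignments. Nothing in the XML says what they mean;
# every consumer has to carry this table around out of band.
FLAG_BITS = {
    "firstRow": 0x0020,
    "lastRow": 0x0040,
    "firstColumn": 0x0080,
    "lastColumn": 0x0100,
    "noRowBanding": 0x0200,
    "noColumnBanding": 0x0400,
}

# The same information as ordinary named attributes, which a schema can
# constrain (for example, to the values "true" or "false").
EXPLICIT = ('<table firstRow="true" lastRow="false" firstColumn="true" '
            'lastColumn="false" noRowBanding="false" noColumnBanding="true"/>')


def decode_packed(xml_text: str) -> dict[str, bool]:
    """Recover named flags from the packed attribute using the external table."""
    raw = ET.fromstring(xml_text).get("look")  # XML tools only see the string "04A0"
    value = int(raw, 16)
    return {name: bool(value & bit) for name, bit in FLAG_BITS.items()}


def read_explicit(xml_text: str) -> dict[str, bool]:
    """Read named flags directly; the names are in the markup itself."""
    elem = ET.fromstring(xml_text)
    return {name: elem.get(name) == "true" for name in FLAG_BITS}


if __name__ == "__main__":
    print(ET.fromstring(PACKED).attrib)                      # {'look': '04A0'}
    print(decode_packed(PACKED) == read_explicit(EXPLICIT))  # True
```

Both readings yield the same flags, but only the second one can be checked by an XML schema or queried with ordinary XML tools without extra, format-specific code.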
Does that help articulate the difference?
Miguel then takes on the size question:
A common objection to OOXML is that the specification is "too big", that 6,000 pages is a bit too much for a specification and that this would prevent third parties from implementing support for the standard.
Considering that for years we, the open source community, have been trying to extract as much information about protocols and file formats from Microsoft, this is actually a good thing.
It is a good thing, I agree, that Microsoft has produced this specification. I'd like even more for them to make the specification for the Office binary formats public, since that is the format that the billions of legacy documents are actually in. I hope you'll join with me in calling for Microsoft to release the specification for these formats under their Open Specification Promise, so that users will truly be able to choose which format they want to remain in or move to.
However, the fact that it is useful from a disclosure perspective does not necessarily mean it will make a good standard. Being better than nothing does not make it sufficient for an ISO standard. There is an important difference between a descriptive specification and a prescriptive standard. Writing down file formats is a small virtue, and one that other companies have practiced for years. Do they all deserve to be ISO standards?
For example, many years ago, when I was working on Gnumeric, one of the issues that we ran into was that the actual descriptions for functions and formulas in Excel was not entirely accurate from the public books you could buy.
OOXML devotes 324 pages of the standard to document the formulas and functions.
....
Depending on how you count, ODF has 4 to 10 pages devoted to it. There is no way you could build a spreadsheet software based on this specification.
This is a rather bold misstatement, considering that implementations such as OpenOffice.org, KSpread, Gnumeric, Google Spreadsheets, Lotus Workplace, etc., already exist. Go back even earlier and we had 1-2-3, Quattro Pro and OpenOffice all supporting Excel's formulas even though there was no formal specification for them. Sure, having a good specification helps, but the extreme rhetoric that says this is unimplementable is patently absurd. Just look around.
Some folks have been using a Wiki to keep track of the issues with OOXML. The motivation for tracking these issues seems to be politically inclined, but it manages to pack some important technical issues.
Hmm... The open source community helps test a purported open standard, reports the defects it finds, and this is called "politically inclined"? Isn't this what open source is all about, "given sufficient eyeballs, all bugs are shallow"? Shouldn't open standards be subject to scrutiny? As I said in my blog, I am so impressed by the quality and productivity of this type of wiki-enabled public review that I am going to investigate how we can do this to solicit public comments on ODF 1.2. This isn't for political reasons. This is because it works.
Some of the objections over OOXML are based around the fact that it does not use existing ISO standards for some of the bits in it. They list 7 ISO standards that OOXML does not use: 8601 dates and times; 639 names and languages; 8632 computer graphics and metafiles; 10118-3 cryptography as well as a handful of W3C standards.
By comparison, ODF only references three ISO standards: Relax NG (OOXML also references this one), 639 (language codes) and 3166 (country codes).
Not only it is demanded that OOXML abide by more standards than ISO's own ODF does, but also that the format used for metafiles from 1999 be used. It seems like it would prevent some nice features developed in the last 8 years for no other reason than "there was a standard for it".
Miguel has inexplicably omitted all of the W3C standards that ODF uses, such as XForms, MathML, SVG, XLink, SMIL, XSLT and CSS2, as well as IETF standards such as RFC 2045, RFC 2048, RFC 2616, RFC 2898, RFC 3066 and RFC 3987. To imply that OOXML follows more standards than ODF is a foolish statement, unsupported by facts.
On the WMF, Miguel has it all wrong. What is a Windows Metafile? It is simply a recording of the graphical function calls made by Windows as it renders a drawing. It maps 1-to-1 into Windows API calls. It maps so closely to Windows that when the WMF format was found to be vulnerable to a security flaw, even the Wine Windows compatibility layer for Linux was susceptible to the same security hole. WMF (and VML, another legacy format in OOXML with a history of security problems) are flawed formats. One security vendor said: "Turns out this is not really a bug, it's just bad design. Design from another era." and "The WMF vulnerability probably affects more computers than any other security vulnerability, ever."
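To show what "a recording of the graphical function calls" means in practice, here is a minimal Python sketch that walks a WMF record stream. It assumes the record layout described in Microsoft's MS-WMF documentation (a 32-bit record size counted in 16-bit WORDs, a 16-bit function code, then the call's parameters) and uses only a small, partial table of function codes. It skips the file header that real WMF files begin with and parses a synthetic stream built in the code, not a real file.

```python
import struct

# Partial table of WMF record function codes (values as listed in the MS-WMF
# RecordType enumeration). Each record names the GDI call it replays.
RECORD_NAMES = {
    0x0000: "EOF",
    0x0213: "LineTo",
    0x0214: "MoveTo",
}

HEADER_WORDS = 3  # 32-bit size field (2 WORDs) + 16-bit function field (1 WORD)


def make_record(function: int, *params: int) -> bytes:
    """Pack one record: size in 16-bit WORDs, function code, int16 parameters."""
    size_in_words = HEADER_WORDS + len(params)
    return struct.pack(f"<IH{len(params)}h", size_in_words, function, *params)


def list_calls(records: bytes) -> list[tuple[str, tuple[int, ...]]]:
    """Walk a record stream and report which GDI call each record replays."""
    calls = []
    offset = 0
    while offset < len(records):
        size_in_words, function = struct.unpack_from("<IH", records, offset)
        n_params = size_in_words - HEADER_WORDS
        params = struct.unpack_from(f"<{n_params}h", records, offset + 6)
        calls.append((RECORD_NAMES.get(function, hex(function)), params))
        if function == 0x0000:  # the EOF record terminates the stream
            break
        offset += size_in_words * 2  # sizes are counted in 16-bit WORDs
    return calls


if __name__ == "__main__":
    # Synthetic stream: MoveTo, LineTo, EOF, with raw int16 parameters
    # shown in the order they are stored in each record.
    stream = (make_record(0x0214, 20, 10)
              + make_record(0x0213, 80, 50)
              + make_record(0x0000))
    for name, params in list_calls(stream):
        print(name, params)
```

There is no abstract drawing model here to speak of: the file is essentially a list of Windows drawing calls and their arguments, which is why a faithful WMF renderer ends up reproducing the behavior of the Windows graphics layer itself.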
Although Miguel is pleased to note that the proposed cross-platform ISO standard, Computer Graphics Metafile (CGM), dates to 1999, he fails to mention that WMF is even older, dating back to Windows 3.0 (1990).
So which one should be preferred in an ISO standard? The Windows Metafile format, which is not documented in an open standard, is tied to the graphical layer of a single vendor, and has design flaws with serious security implications? Is this what we really want? Or do we want an open standard, one designed to be platform-neutral, that has been in use for eight years, and that has a community, such as CGM Open and WebCGM, continuing its development and promotion? Where is the WMF community? A Google search for WMF comes up with security problems; a search for CGM comes up with communities, initiatives and test suites.
There is an important-sounding "Ecma 376 relies on undisclosed information" section, but it is a weak case: The case is that Windows Metafiles are not specified.
It is weak because the complaint is that Windows Metafiles are not specified. It is certainly not in the standard, but the information is publicly available and is hardly "undisclosed information". I would vote to add the information to the standard.
Did you really read the Groklaw issues list? WMF is not the only, or even the most troublesome of the undisclosed information in OOXML. Start here, then go back and read the Groklaw list of issues, and let me know if it makes more sense then. I am not that good at explaining these things, so please ask questions and I will try harder.
I have obviously not read the entire specification, and am biased towards what I have seen in the spreadsheet angle. But considering that it is impossible to implement a spreadsheet program based on ODF, am convinced that the analysis done by those opposing OOXML is incredibly shallow, the burden is on them to prove that ODF is "enough" to implement from scratch alternative applications.
There is that claim, that it is impossible to implement an ODF spreadsheet. Miguel, surely you are aware of OpenOffice, KSpread, Lotus Workplace, Gnumeric, Google Docs? How can you persist in such obvious error? How could you actually write the above when you know, I know, and everyone reading it knows that it is patently false? Please tell me it was just a typographical error.
Here's a challenge: Give me a list of four spreadsheet applications from four different vendors that today are as interoperable with OOXML as the four leading ODF spreadsheets are with ODF.
There is a good case to be made for OOXML to be further fine-tuned before it becomes an ISO standard. But considering that Office 2007 has shipped, I doubt that any significant changes to the file format would be implemented in the short or medium term.
The best possible outcome in delaying the stamp of approval for OOXML would be to get further clarifications on the standard. Delaying it on the grounds of technical limitations is not going to help much.
This is quite a revealing statement. Why should the shipment of Office 2007 factor into the appropriateness and the quality of a proposed International Standard? Should standards of quality be relaxed for Microsoft's convenience? Do technical limitations not matter because Microsoft has sales targets to meet? Is this what ISO is for? If so, I suggest their hard-working volunteers be given Microsoft salaries and stock options, since clearly they would be working only for Microsoft's benefit at this point.
Miguel has a good point at the end:
To make ODF successful, we need to make OpenOffice.org a better product, and we need to keep improving it. It is very easy to nitpick a standard, specially one that is as big as OOXML. But it is a lot harder to actually improve OpenOffice.org.
If everyone complaining about OOXML was actually hacking on improving OpenOffice.org to make it a technically superior product in every sense we would not have to resort, as a community, to play a political case on weak grounds.
OpenOffice.org is one, but not the only, application of ODF. It is the most prominent one in the traditional heavyweight office suite model, but I'm not certain that this is the only way forward. We need good implementations, several of them, since one size does not fit all.
In any case I'd say in return that if Microsoft and Microsoft boosters spent some of their time investigating exactly how easy it would be to encode Office's legacy features on top of the extensible ODF specification, and worked together with the ODF community to address their common concerns, then we could easily have a single interoperable format that we all could use. The resulting standard of OOXML on top of ODF would be smaller, simpler, higher quality and more interoperable than the mess that we'll end up with by having OOXML as a standard, in addition to ODF.
Change Log:
2/1/2007 — fixed spelling errors reported by a reader via email
2/2/2007 — another spelling error
Tuesday, January 30, 2007
Defining Deviancy Down
...the amount of deviation a community encounters is apt to remain fairly constant over time. To start at the beginning, it is a simple logistic fact that the number of deviancies which come to a community's attention are limited by the kinds of equipment it uses to detect and handle them, and to that extent the rate of deviation found in a community is at least in part a function of the size and complexity of its social control apparatus. A community's capacity for handling deviance, let us say, can be roughly estimated by counting its prison cells and hospital beds, its policemen and psychiatrists, its courts and clinics.
In other words, a community's perception of social deviation is conditioned and limited by its capacity for controlling it. With an equal number of punishment cells, equal-sized communities of cloistered monks and bloodthirsty pirates would perceive the same rate of deviancy. Of course the actual deviations would be different: Brother Maynard isn't praying earnestly enough versus Greybeard slit a crewmate's throat in the night, without warning the bunkmate below.
The late Senator from New York, Daniel Patrick Moynihan, took this idea and applied it to the social ills that America has increasingly faced since the 1960s: mental illness, illegitimacy and violent crime. How does society react when the level of deviancy rises unexpectedly and rapidly above accepted norms? He observed, in an essay entitled "Defining Deviancy Down":
[...T]he amount of deviant behavior in American society has increased beyond the levels the community can "afford to recognize" and that, accordingly, we have been re-defining deviancy so as to exempt much conduct previously stigmatized, and also quietly raising the "normal" level in categories where behavior is now abnormal by any earlier standard.
I look at the current situation with Office Open XML (OOXML) in a similar way. There is a clearly defined community, the JTC1 member National Bodies, with the responsibility for reviewing submitted standards. However, their capacity for exercising control is finite. The JTC1 Directives allow them a fixed period of time to review any submission. They also have a fixed number of volunteers to perform the review, and a fixed (or at least highly constrained) number of meetings to discuss and agree on review comments. So, when presented with a specification of unprecedented length (over 6,000 pages), and rather low quality, what are they to do? Spend hundreds of hours reading the specification? Write up and report thousands of errors? No, the capacity in JTC1 to deal with this level of deviancy does not exist, so the natural way for the community to cope is to define deviancy down.
How deviant is OOXML? The 6,000+ page length is one aspect. Another is the rate at which it raced through its Ecma review, 20 times the speed of comparable specifications. Certainly, a longer specification will tend to have more problems than a shorter one, and a rushed review will find fewer problems than a thorough one. But that is speaking in generalities. Is there anything we can say for OOXML defect rates?
The Groklaw review, which occurred over a few days, found a large number of serious problems. But I think we can quantify this a bit more. I tried an experiment. I used a random-number generator to generate a sample of 20 page numbers in the OOXML specification. I then read each of these pages, looking for technical errors, platform dependencies, lack of extensibility, drafting errors, etc. I did not bother noting spelling, grammatical or usage errors. I recorded how many reportable errors I found on each page. Some pages had zero problems, others had 1, 2 or even 3 problems. I even found one particularly bad error that could send OOXML back to Ecma once reported (more on that another day), but the average number of errors per page was 1.0. So projecting out to a 6,039-page specification, this leads to a prediction of 6,000 +/- 1,000 errors. Reviewing a larger number of pages would reduce the error bars on that prediction, but we seem to be dealing with defects numbering in the thousands.
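The post does not say exactly how the +/- 1,000 range was computed. One conventional way to put error bars on this kind of projection is to scale the sample mean up to the full page count and use the sample standard deviation, with a finite-population correction because pages are drawn without replacement, to get a standard error. The Python sketch below does that. The per-page counts in it are invented for illustration (chosen only so that their mean is 1.0, like the survey's); they are not the survey's actual data.

```python
import math
import statistics


def project_total(counts_per_page: list[int], total_pages: int) -> tuple[float, float]:
    """Estimate the total number of errors in a document from a random page sample.

    Returns (estimate, standard_error). The estimate is the sample mean scaled
    up to the full page count. The standard error uses the sample standard
    deviation and a finite-population correction, since the sampled pages are
    drawn without replacement from a document of finite length.
    """
    n = len(counts_per_page)
    mean = statistics.mean(counts_per_page)
    sd = statistics.stdev(counts_per_page)  # sample standard deviation (n - 1)
    fpc = math.sqrt((total_pages - n) / (total_pages - 1))
    standard_error = total_pages * sd / math.sqrt(n) * fpc
    return float(total_pages * mean), standard_error


if __name__ == "__main__":
    # Invented counts for 20 sampled pages (mean 1.0); NOT the survey's data.
    sample = [0, 1, 2, 1, 0, 3, 1, 1, 0, 2, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0]
    estimate, se = project_total(sample, total_pages=6039)
    print(f"estimate {estimate:.0f}, standard error {se:.0f}")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the number of sampled pages would roughly halve the error bars, which is the sense in which reviewing more pages tightens the prediction.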
Are NB's able to deal with a level of deviancy this great? Do they possibly have the resources to detect and report this number of errors and then verify that they are addressed? If not, the natural reaction is to define deviancy down.
For example, OOXML is currently in a 30-day review period where "contradictions" with existing ISO or IEC standards can be alleged by National Bodies (NB's). Although the word "contradiction" is not defined in JTC1 Directives, its meaning can be seen from a resolution unanimously adopted at a JTC1 Plenary in 2000:
Resolution 27 - Consistency of JTC 1 Products
JTC 1 stresses the strong need for consistency of its products (ISs and TRs) irrespective of the route through which they were developed. Any inconsistency will confuse users of JTC 1 standards and, hence, jeopardize JTC 1's reputation. Therefore, referring to clauses 13.2 (Fast Track) and 18.4.3.2 (PAS) of its Directives, JTC 1 reminds ITTF of its obligation to ascertain that a proposed DIS contains no evident contradiction with other ISO/IEC standards. JTC 1 offers any help to ITTF in such undertaking. However, should an inconsistency be detected at any point in the ratification process, JTC 1 together with ITTF will take immediate action to cure the problem.
The clear meaning of this is that contradictions are to be avoided, and that some of the defining characteristics of standards with contradictions are that they are not consistent, that they confuse users, and that they jeopardize JTC1's reputation.
Further, we have precedents of other contradictions raised within JTC1, such as just last year, when the NB's of the UK and Germany both alleged contradictions against Microsoft's C++/CLI specification, then submitted for Fast Track processing from Ecma. The contradiction raised by the German NB (DIN) in that case said in part:
On a technical level, there are some rather different approaches between C++ and C++/CLI which can easily cause considerable confusion when both languages are considered to be "C++" or add unnecessary overhead when trying to write C++ code usable with C++ and C++/CLI. Below are a few example although if there were sufficient time to to thorough analysis of the C++/CLI document more could probably be found.
This is simple, easy to understand, and well within the spirit of the JTC1 Resolution quoted earlier.
But in a notable case of defining deviancy down, we're starting to see the word "contradiction" defined very narrowly. For example, Microsoft's Brian Jones suggests contradictions should be looked at this way:
[T]his is where you want to make sure that the approval of this ISO spec won't cause another ISO standard to break. In the case of OpenXML, there really can't be a contradiction because it's always possible to implement OpenXML alongside other technologies. For instance, OpenOffice will soon have support for ODF and OpenXML. An example of a contradiction would be if there was a standard for wireless technology that required the use of a certain frequency. If by using that frequency you would interfere with folks using another standard that also leverages that frequency, then there may be a contradiction.
To be quite fair, the Chinese WAPI defeat in ISO is also a precedent, but when searching for a definition of "contradiction" all precedents should be considered, not just one. Arguing exclusively from a wireless protocol standard precedent when dealing with the case of an XML markup standard is dubious when contradictions were alleged just last year against a programming language, a technology much closer to OOXML than a wireless protocol is. Surely, since C++/CLI is Microsoft's technology they would be aware of this precedent? But still they didn't mention it.
I ask you to consider the impact of taking Microsoft's definition of "contradiction" and applying it to virtual technologies, like document formats, image formats, presentation formats, programming languages, operating system interfaces, APIs, security protocols, anything in the realm of software rather than hardware. None of these can ever conflict by Microsoft's definition. Never. Therefore there are never grounds for a contradiction, and the contradiction clause in JTC1's own Directives, adopted only a few years ago, is a procedural nullity, a no-op, meaningless, a waste of time for a large part of the technologies JTC1 has standards authority for. This is a clear example of defining deviancy down.
Let's go back in time 750 years, to Thomas Aquinas and his Summa Theologica, the 13th century's God: The Missing Manual. Aquinas had some apt words on contradictions, when discussing whether the powers of God were infinite and omnipotent (Question 25, Article 3):
Therefore, everything that does not imply a contradiction in terms, is numbered amongst those possible things, in respect of which God is called omnipotent: whereas whatever implies contradiction does not come within the scope of divine omnipotence, because it cannot have the aspect of possibility... For whatever implies a contradiction cannot be a word, because no intellect can possibly conceive such a thing.
Aquinas here allows that God can do all things that are possible, but cannot do something which is a contradiction in terms. Going back to Microsoft's proposed definition of a contradiction, it seems that they are only willing to acknowledge a contradiction if it amounts to a co-existence problem so severe that even God could not resolve it. This seems to be a rather high hurdle to clear, and is clearly not what JTC1 intended. This is defining deviancy down, way down.
This is the essential problem JTC1 has with the OOXML submission. It is too large, and has too many problems, for the control mechanisms available to JTC1 (in particular review time and volunteers) to handle the presented level of deviancy. The only recourse available to them is to define deviancy down to the level where they can handle a much smaller number of problems. Of course, this will lead to a much lower-quality ISO Standard than we are accustomed to, but what other choice is there?
This lesson has clear ramifications for Microsoft. The bigger the specification, the less thoroughly it will be reviewed. If you make it large enough it will barely be reviewed at all. The plan for 2007 should be to combine the .NET, OPC, XPS, JScript, J#, C#, XAML, WPF, HD Photo and whatever other specifications you have handy, put them all into one 50,000-page document, call it the "Open Microsoft Specification", rush it through Ecma and then Fast Track it into ISO. No one can really stop you. JTC1 Fast Track is broken.
Monday, January 29, 2007
Microsoft on Standards
Let's take a look inside.
First, here is the opening "Evangelism is War" section of a report called Effective Evangelism.
Our mission is to establish Microsoft's platforms as the de facto standards throughout the computer industry. Our enemies are the vendors of platforms that compete with ours: Netscape, Sun, IBM, Oracle, Lotus, etc. The field of battle is the software industry. Success is measured in shipping applications. Every line of code that is written to our standards is a small victory; every line of code that is written to any other standard, is a small defeat. Total victory, for DRG [Developer Relations Group], is the universal adoption of our standards by developers, as this is an important step towards total victory for Microsoft itself: 'A computer on every desk and in every home, running Microsoft software.'
Then we have this email from Bill Gates:
One thing we have got to change is our strategy — allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.
We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities.
Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows.
And here is an excerpt from an email from then-Microsoft GM Aaron Contorer to Bill Gates:
Switching Costs
In economics there is a well-understood concept called switching costs - how much it costs for a trading partner to change partners. Our philosophy on switching costs is very clear: we want low swiching costs for customers who want to start using our platform, and we want to provide so much unique value that there are in effect high costs of deciding to move to a different platform. There is a name for this: it is called Embrace and Extend.
Embrace means we are compatible with what's out there, so you can switch to our platform without a lot of obstacles and rework. You can switch from someone else's Java compiler to ours; from someone else's web server to ours; etc. Customers love when we do this (as long as we don't spend our energy embracing extra standards no one really cares about); our competitors are not sure they like it because they prefer us to screw up.
Extend means we provide tremendous value that nobody else does, so (A) you really want to switch to our software, and (B) once you try our software you would never want to go back to some inferior junk from our competitors. Customers usually like when we do this, since by definition it's only an extension if it adds value. Competitors hate when we do this, because by adding new value we make our products much harder to clone - this is the difference between innovation and being just a commodity like corn where suppliers compete on price alone. Nobody builds or sustains a business as successful as Microsoft by producing trivial products that are easy to clone - that would be a strategy for failure.
If we fail to embrace, we can lose because there are big barriers to buying our products. But if we fail to extend, or do only humble work that is easy to clone or to surpass, we automatically lose because our competitors will spend literally billions of dollars to clone our work and replace us.
Patrick Ferell, at the time head of MSN tools and applications, worried about the internet's open standards and protocols:
Looking out from the inside the current MSN strategy some things that concern me about the Internet and the Web are:
1) The Internet is about as open as it gets. This means that an ISV can go and buy a C compiler and a server, rent a wire and create a new service or create an extension to an existing one. The tools are still a little crude but there are very few bottlenecks in this process.
2) The Internet defines formats and architectures that MS has no control over and very little say in. MIME and the WWW helper architectures are crude but quite extensible.
Are there any other good Microsoft quotes out there regarding formats or standards? Post as a comment and I'll add the best ones to the main post.
Change Log:
02/11/2007 — added Embrace & Extend quote sent in from reader
02/14/2007 — note on the links to the exhibits being broken
02/03/2008 — added MSN strategy quote
Adobe to Standardize PDF
Note that this is not PDF's first trip to ISO. Subsets of PDF have been standardized for particular problem domains, such as:
- PDF/A for archiving as ISO 19005-1:2005
- PDF/X for digital prepress exchange as ISO 15930
- PDF/E for engineering workflows, currently under review as ISO DIS 24517
Saturday, January 27, 2007
A Review of the Wikipedia Article on ODF
In accordance with Wikipedia's Conflict of Interest guidelines, I will put a link to this blog entry on the ODF article's Talk page. These points are for the consideration of the volunteers editing the article, to consider and do what they want with them. I'll probably repeat this review on a quarterly basis.
Since the article is changing at a rather rapid rate, you should note that I looked at the revision of 27 January at 16:19 which you can retrieve here.
- Opening paragraph. "...is a document file format used for exchanging electronic documents". I'd say instead, "...for describing electronic documents". Documents are exchanged via protocols like SMTP, WebDAV or HTTP, etc. ODF is only describing the documents.
- Strictly speaking, ODF was developed by a technical committee (TC) working within the OASIS consortium. The point is that OASIS as a whole approved ODF, but it was developed within a TC.
- Last sentence of first paragraph is awkward. I'd keep the details and dates in the Standardization section and just state the current status here: "OpenDocument is an OASIS Standard as well as an International Standard published as ISO/IEC 26300:2006"
- The next sentence is weak. I'd rephrase it as something like "ODF meets the common definitions of an [Open Standard], meaning the specification is freely available and may be implemented freely". Since Wikipedia already has a nice article on open standards, why not just link to that?
- The claim that ODF was "intended" to avoid vendor lock-in should be substantiated. That indeed may be one of its effects. But the charter of the TC did not mention that as an explicit goal. I think this is just loose language. Whenever you see a passive sentence, ask yourself, "Who or what did this"? Who intended ODF to be such and such? If you can provide a reference for that question, then you have something.
- Next sentence is awkward. How about, "OpenDocument is the first widely adopted International Standard for editable office documents." ?
- Under Specifications, in addition to the listed compression advantage of using a ZIP archive, this approach also has the benefit of separating the content, styles, metadata and application settings into four separate XML files. This is a good example of the architectural principle of [Separation of Concerns].
- I suggest we add here: "An important goal during the development of ODF was to reuse existing relevant standards where possible. Such standards used in ODF include [MathML], [Synchronized Multimedia Integration Language|SMIL], [SVG], and [XForms]." If needed a link to the ODF TC's charter would server as an authoritative reference for the goal to reuse existing standards.
- The Standardization section seems to be split off into a linked article which is a bit outdated. Is this necessary? This might make more sense to have this information brought back into the main article. Just my opinion.
- First sentence is not quite correct. ODF was developed by a technical committee (TC) working within the OASIS consortium.
- "OASIS Standard" should be capitalized as a proper noun.
- This section gets a bit weighted down with jargon. Does the average reader, even a technical reader, understand what a "DIS" is, or a "default ballot"? We should either explain the significance of these terms, or summarize. I don't think this needs to contain a day-by-day retelling of how a specification made its way through ISO.
- OpenDocument Format 1.1 was approved as a Committee Specification in October. The ballot for approval as an OASIS Standard is occurring right now. (Would the average reader understand this distinction? Specifications are approved first by the ODF TC as Committee Specifications, then major versions are put forward for a vote by the entire OASIS membership as an OASIS Standard, and even more significant editions are then put forward for approval by ISO as an International Standard.)
- On the ODF 1.2 work, the parenthetical remark on spreadsheet formulas seems out of place and redundant since there is a separate Criticism header that covers this. The obvious presumption is that anything added to ODF 1.2 is added because it is not already there. Do we believe that any reader would think otherwise?
- Overall the 1.2 statement looks like it needs a rewrite. I'd suggest a simple statement like, "OpenDocument Format 1.2 is currently being drafted by the ODF TC. It is planned to contain additional accessibility features, metadata enhancements, a spreadsheet formula definition (based on [OpenFormula]) and any errata submitted by the public." (Discussion of various schedule predictions seems outdated since December has already come and gone.)
- Section on Application support -- "Since there are a number of independent implementations of the ODF standard..". This might be better in an "Interoperability" sub-section. If you make such a sub-section, the Fellowship's test suite, mentioned earlier in the article, could be moved there as well.
- "Although Microsoft Office does not support OpenDocument..." should be, "Although Microsoft Office does not support OpenDocument natively..."
- Again, never trust engineers to come up with a good prediction of schedules. December has come and gone and no Add-in is complete.
- There should also be mention of Corel's stated plans to add ODF support to WordPerfect Office. The press release you can reference is here.
- There is mention here of a "MS Open XML translator". This was Microsoft's name for their initiative. But the web page linked to here consistently refers to itself as the "ODF Add-in for Microsoft Word". This is confusing. Maybe start with a mention of the Microsoft announcement from July 2006 (this press release) then say that one such project supported by Microsoft is the ODF Add-in for Word, etc.
- The ODMA mention is unrelated to ODF. It probably should be removed entirely.
- Under the Accessibility sub-section, might want to mention that a group at the University of Illinois has written an OpenDocument Format Accessibility Evaluator to scan uploaded ODF documents for how well they follow best practices for accessibility. A link to the project is here.
- Under Promotion section, we should link to the ODF Adoption TC's web page here and mention that they also manage the web site http://OpenDocument.xml.org
- The promotion activities of OpenOffice.org should be included in the bullet list that follows, right? Not clear why it is not.
- "...as well as other companies who may or may not be working inside..." is weird. Was someone attempting to say something here? The fact that the ODF Alliance is stated as having "more than 280 members" should make it obvious that not all are members of the OASIS ODF TC. Is anything added by having this statement?
- ODF Alliance has 362 organizational members according to their latest newsletter here.
- In Adoption section, there is repetition of information that was already covered in the Application support section, such as the Microsoft-funded translator work.
- The Adoption section is incomplete, missing adoptions in Brazil, Argentina, Extremadura Spain, and India. The ODF Alliance newsletters have the details on these and others. This whitepaper is a good summary.
- In Criticism section, the statements, "Some mathematicians do not think that the choice of the MathML W3C standard for use in OpenDocument is a good choice" and "monstrosity written purely by web designers" lack an authoritative citation. All that is given is a link to an unnamed commenter on a GrokLaw article, whose credentials as a mathematician or a spokesman for mathematicians are not obvious. Consider that one of the authors of the MathML 2.0 standard, and co-chair of the W3C's Math Working Group, is Patrick Ion, editor of the American Mathematical Society's Mathematical Reviews. So the credibility of MathML should not so easily be set aside by a single anonymous, unsubstantiated comment. I'd also note that the Wikipedia article for MathML does not note such criticism.
- "The OpenDocument ISO specification does not contain a defined formula language" is more precise as "The OpenDocument ISO specification does not define a standard spreadsheet formula language."
- "This means that ISO conforming files do not have to be compatible." This is a weak argument. Even if the spreadsheet language were defined, ISO conforming documents are not required to be compatible. For example, two implementations may implement different subsets of features. And even without a formula standard, implementations can still be compatible. For example, 1-2-3, Quattro Pro and OpenOffice have been able to read Excel formulas for years, even though Microsoft had not specified this. Maybe what is meant here is "This means that spreadsheet implementations currently rely on application-level interoperability testing rather than referencing a normative specification of formula syntax and semantics."
- The criticism of the ability to embed Java applets is new to me. No reference is given for this criticism. The section number establishes the existence of the feature, but does not establish grounds for criticizing it. Is this original research? If so, it does not belong on Wikipedia.
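To make the Specifications point about the ZIP package concrete, here is a minimal sketch in Python, using only the standard library, that assembles an in-memory package laid out the way an ODF text document is: a `mimetype` entry first, stored uncompressed, then content, styles, metadata and application settings as four separate XML files, plus a manifest. The XML bodies are placeholder stubs rather than schema-valid ODF, and the helper names (`build_package`, `root_elements`) are hypothetical, chosen only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces as used by ODF 1.x; the document bodies below are stubs only.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
MIMETYPE = "application/vnd.oasis.opendocument.text"

# One concern per stream: content, styles, metadata, application settings.
PARTS = {
    "content.xml": "document-content",
    "styles.xml": "document-styles",
    "meta.xml": "document-meta",
    "settings.xml": "document-settings",
}


def _stub(root: str) -> str:
    """Return a placeholder XML document with the given office:* root element."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<office:{root} xmlns:office="{OFFICE_NS}" office:version="1.1"/>'
    )


def _manifest() -> str:
    """Return a minimal manifest listing the package root and the four parts."""
    entries = [f'<manifest:file-entry manifest:full-path="/" manifest:media-type="{MIMETYPE}"/>']
    entries += [
        f'<manifest:file-entry manifest:full-path="{name}" manifest:media-type="text/xml"/>'
        for name in PARTS
    ]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        + "".join(entries)
        + "</manifest:manifest>"
    )


def build_package() -> bytes:
    """Build an in-memory ZIP laid out like an ODF text document (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # The four XML streams are compressed, which is where the size savings come from.
        for name, root in PARTS.items():
            zf.writestr(name, _stub(root), compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/manifest.xml", _manifest(), compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def root_elements(data: bytes) -> dict:
    """Map each XML part to the namespace-qualified name of its root element."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: ET.fromstring(zf.read(name)).tag for name in PARTS}


if __name__ == "__main__":
    pkg = build_package()
    with zipfile.ZipFile(io.BytesIO(pkg)) as zf:
        print(zf.namelist())
    print(root_elements(pkg))
```

Because each concern lives in its own stream, a consumer that only cares about styles can read `styles.xml` without parsing the document body, which is the separation-of-concerns benefit the review points to.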
Change Log
1/28/07 — corrected link to ODF's Talk page
Thursday, January 25, 2007
Crocodile Tears
Microsoft's Doug Mahugh disclosed a portion of the proposal he sent to Rick, in a comment on Slashdot:
Wikipedia has an entry on Open XML that has a lot of slanted language, and we'd like for them to make it more objective but we feel that it would be best if a non-Microsoft person were the source of any corrections… Would you have any interest or availability to do some of this kind of work? Your reputation as a leading voice in the XML community would carry a lot of credibility, so your name came up in a discussion of the Wikipedia situation today.
The national coverage of what was eventually called “Wiki-gate” brought the inevitable reaction from Microsoft — IBM made us do it:
[Microsoft] Spokeswoman Catherine Brooker said she believed the articles were heavily written by people at IBM, which is a big supporter of the open-source standard — in USA Today.
So the question in my mind is this: How bad was the OOXML Wikipedia page before all the fuss started? All this Wiki-gate news hit on the 23rd, with Rick's blog post. So let's go back to the Wikipedia page previous to that, which would be the version of 18 January. Take a read. You can also take a look at the Talk page where the prior version was last edited on 21 September. You can read it here.
Is this something that one would say has “a lot of slanted language” and was “heavily written by people at IBM Corp”? Is this something that warranted extraordinary means to address? I'd be interested in what parts they believed were “heavily written”. What does “heavily written” even mean? This is quite an allegation.
How's this for heavy: I'll make you an offer you can't refuse. I'll do a review and fact checking of the 18 January version of the OOXML Wikipedia entry, and I'll link to the review from the Talk page, so others can consider it and make the changes if they agree. I won't charge you a cent. If you find this at all useful, you can donate a few dollars to the Free Software Foundation. How's that, Doug?
- The first paragraph should say simply “Office Open XML”. It is a waste of time to argue about whether it is Microsoft Office Open XML, Ecma Office Open XML or ISO Office Open XML. At some point you may have one version in ISO while a revision is being worked on in Ecma. Just call it “Office Open XML” and it will cover all cases.
- Next sentence should say, “The specification was developed by Microsoft and others…”. You shouldn't need to list them all here, but do list them under Standardization.
- Should say, “is the default format in Office 2007”, not just “is used”.
- “Microsoft maintains that its primary goal…” needs a reference to cite. Perhaps page 1 of the whitepaper.
- “The Microsoft Office Open XML format is Microsoft's direct answer to the OpenDocument format” also needs a reference or should be removed.
- Standardization section would be better if written chronologically, start from the beginning and end at the end.
- Should say, “A liaison from the ISO/IEC JTC1/SC34 was appointed to help during...”
- Licensing — “There has been a lot of argument about…”. If there has been a lot, maybe someone should cite an example?
- Brian Jones is an expert in some things, but I am not aware of his legal credentials. So citing his legal analysis does not seem to be authoritative and I doubt he intended it to be taken that way. This citation should be removed.
- Overall, the Licensing section seems like it is missing what I'd consider the two most important links: Microsoft's Open Specification Promise, and to the Baker & McKenzie analysis.
- I would move the packaging and relations text into its own article called “Open Packaging Conventions” or OPC. The basic structure here will be used in other Microsoft formats like XPS so it makes sense to centralize it in one place and reference it from here.
- Under Document Markup Languages, I'd drop the discussion of the 2003 formats. Move that to a different article if needed. Ditto for DataDiagrammingML.
- Under Criticism, there need to be some references cited. There is no shortage of criticism and no shortage of references for that. If you want primary source material, I'd suggest the GrokLaw list as the most comprehensive. It cannot simply be denied or ignored that there is a large amount of criticism out there. It will look silly for Wikipedia if OOXML is defeated in ISO and the day prior there was not even a mention of criticism on its Wikipedia page.
- Market Adoption — This section seems to be talking more about application support than adoption. I suggest it be renamed “Application Support” and “Adoption” be reserved for notable adoptions of the standard at the state or national level if/when they occur. OpenOffice's support of WordProcessing 2003 doesn't belong here, but Novell's announcement that they will add OOXML should be here.
- A note throughout — this article could use some copy editing. As expected with any text written by several people over time, not all native English speakers, there are differences in levels of formality and a good number of language errors.
Tuesday, January 23, 2007
Linus's Law Applied to Standards Review
This proposition was put to the test this last weekend at GrokLaw, where a team of volunteers attempted to review the 6,000 page Ecma Office Open XML specification. Since the specification is already two weeks into a 30-day review in ISO/IEC JTC1, a parallel approach was the indicated solution. The alternative, for each individual to review the specification in its entirety, would have required them to read at the rate of 200 pages a day for a month.
The team of around 20 contributors logged nearly 1,000 edits on the wiki they set up for their collaboration. The wiki received a further 4,000 page reads. This was done over a few days, but the bulk of the work was done just this weekend.
What they found is amazing. As you know, I have been reading the OOXML specification, on and off, for a few months now, noting in this blog the problems I've seen. I thought I had a good grasp of the problems. But I was wrong. I was just scratching the surface. The Microsoft guys think I have been complaining too much. But it now looks like I wasn't complaining enough.
Take a look at the report. I'll need a few days to read through the details and research some of the items. You can be sure I'll follow up with some new posts to explain, in plain English, the significance of the new issues.
Also, GrokLaw has put out a call for concerned individuals to write to their nation's JTC1 representatives, to give informed thoughts on whether OOXML should continue the process toward an ISO standard, or whether it should be taken off its current “Fast Track” because it contradicts existing standards. If you are a regular reader of this blog, you know what is at stake and you know what to do.
One final note. I'm so impressed with the results of this collaborative approach to standards review, that I'm going to investigate whether we can do the same thing at OASIS. We've been using a wiki internally for drafting new parts of the ODF 1.2 specification, and that has worked well. But I'd love it if the next time we had a public review period for ODF we could have the public also participate in editing content in the wiki and organize the process that way. It is a much better method than the non-interactive, linear pattern of a mailing list.
Document Format Punditry
So I was a little surprised to receive email a couple of days ago from Microsoft saying they wanted to contract someone independent but friendly (me) for a couple of days to provide more balance on Wikipedia concerning ODF/OOXML. I am hardly the poster boy of Microsoft partisanship! Apparently they are frustrated at the amount of spin from some ODF stakeholders on Wikipedia and blogs.
I think I’ll accept it: FUD enrages me and MS certainly are not hiring me to add any pro-MS FUD, just to correct any errors I see. If anyone sees any examples of incorrect statements on Wikipedia or other similar forums in the next few weeks, please let me know: whether anti-OOXML or anti-ODF. In fact, I already had added some material to Wikipedia several months ago, so it is not something new, so I’ll spend a couple of days mythbusting and adding more information.
This immediately brought on an avalanche of commentary, on his blog, and elsewhere. As someone who also blogs on ODF/OOXML topics, I'd like to say a few words on the subject of document format punditry.
Few of my readers know me personally. They only know me via my words. Their acceptance or non-acceptance of this blog and what I say is largely determined by their perception of these two dimensions:
- Authority — Am I an expert? Am I writing about things that I have direct knowledge of, or through education, training or direct experience would be expected to have worthwhile insights on?
- Orientation — Do I have a bias on the subject being discussed? I'm not using the word “bias” in a pejorative sense, but to describe how far one's views vary from a neutral, journalistic point of view, to a view that is overtly partisan on a particular issue. Bias is expected in opinion pieces, but not in Wikipedia articles.
Looking at the range of people writing on these issues, I see the landscape something like this:

- We have a number of highly informed experts in ODF and OOXML who aren't really talking to each other.
- We have the press, trying to be neutral, but having difficulty figuring out the significance of the technical issues since they are rather esoteric.
- The General Public, who won't even hear about the issues until the press figures it out.
- And then we have various degrees of extremists of all varieties, not easily classifiable. Their writings are backed more by ideological than technical arguments. There are important ideological issues at stake in this debate, so these voices are important.
Where does Rick fit into this chart? His expertise is undeniable. But if he takes Microsoft's money he risks losing his reputation for neutrality. That is his choice and I am in no position to fault someone for that. He joins a crowded field of opinionated people already writing on this issue from one angle or another. He'll likely be one of the better pro-OOXML writers out there. Nothing wrong with that. As Charles McCabe famously said, “Any clod can have the facts, having opinions is an art.”
But I do suggest that Microsoft's money would have been better spent, and Rick's skills better used, if they had engaged Rick earlier to help review and improve the OOXML specification. Trying to fix perceptions of the standard after the fact will be a lot harder, and more expensive, than creating a good standard in the first place.
And I will lament the fact that we continue to lack neutral experts who can digest the massive amounts of technical information out there and present it in a way that the press can reference and the public can understand. I think Rick would have served this role admirably. Instead we risk having one less voice in the middle.
Looking at this potential deal with Rick, and Microsoft's earlier deal with Novell, I wonder if someone at Microsoft thinks that neutrality is dangerous and that their purposes are better served by eliminating it?
Monday, January 22, 2007
The Parable of the Solipsistic Standard
Og mil ten fit ghust lech fer ti nostu, pertents? Sperandomiseria, cuic cuic danto do quant fer nos protoblian, sed nuic, volte torma. Zherantilli, fer muc opsice inito brandu s'deko prot affti? Nek worchi fer ubir! Sperandomiseria, gher-kloj ven ter moido, ven ter zer-moidi, eggen ven ter moidisti miki-moiki.
Do you agree? I think this is a good argument and I see no practical downside. Something must be done soon, lest we experience a repeat next time.
Sorry, What is that? You have no idea what I am talking about? Oh. So you don't speak Weirish? We'll need to do something about that then. That's what I'm speaking now, Ecma Weirish. See, I used to use English, but I found that the English language was missing words for some things I wanted to express, so I made up some new words for these ideas, to ensure that everyone would perfectly understand what I was saying, with no ambiguities.
Ini hag danto do abergi nec palmu, ven fec tolibissi, pert rami fer cuic cuic affti.
Pardon, you are still having problems? You want to know about the words in the English language that were already well-known, useful and descriptive, and why I didn't just use those, and supplement them with new words as needed? Good question. Once I started making up new words, I found that none of the existing words in English perfectly matched my usage of them. In fact I really couldn't translate my thoughts perfectly into any existing language. My thoughts are so unique that no other language works well for them. A totally new language is a much more accurate way to notate my thoughts. I wonder why everyone doesn't do it? If you use this language, you will understand me perfectly.
Og mil ven ter moidisti… What? You again? Why can't you just speak Weirish? When you use English you just slow down my mental processing. Ah, so you want to know how to speak Weirish. Great. I'll give you a starter word list:
- Pertentare (v) — to walk like Rob walks.
- Protoblia (n) — a nice person [Note: This cannot be fully defined within this word list. It is best defined by how Weir thought a nice person was back 15 years ago.]
- Zherantillo (n) — where Rob keeps his keys, sometimes upstairs near the bedroom, sometimes by the front door, sometimes in a hidden place.
Rhodantillu, muc muc dilinorpthu, ac…
I'm a patient man. What else do you want to know? Why should Weirish be an International Standard? Because it matches my thoughts so perfectly. Everyone wants to know what I think, so it is good that they learn Weirish for that task. If you look closely, you see that there are hundreds of languages already out there. I should have one too.
How do you say, “Firefox” in Weirish? Umm… uhhh… well, you don't. I only use Internet Explorer, so there is no word for “Firefox”. Just say “Internet Explorer 4.0” instead. That's close enough, right? Ditto for “Linux”, “OpenOffice”, “KOffice”, “WordPerfect” or “MySQL”. Here's a 6,000 page document on Weirish I dictated in my sleep last week. Don't leave! Hey! I've given you everything you've asked for. A perfect language, a dictionary for understanding it, a very very long manuscript on it, everything. Please, don't go! Amitambo n'itorno!
Change log
1/28/07 — Fixed broken link, put Weirish text in italics, fixed grammatical error in one of the Weirish passages.
Sunday, January 21, 2007
Opportunity Knocks
Walt Hucks and his Opportunity Knocks blog have been putting out some nicely researched commentary on the file format debate. His most recent post, "Whose Finances Are On the Line?", looks at what Microsoft is risking if OOXML fails to gain acceptance.
Walt looks at the business angle in "What's Wrong With Choice?", delving into Microsoft's financials and explaining how that is determining Microsoft's behavior around OOXML:
Let's be honest here. According to your latest Form 10-Q, Office is 90% of the revenue of Microsoft Business Division, which is in turn one of the three profitable segments in the company. Both of the other two segments related directly to the Windows operating systems ("Client" & "Server"). MBD is able to charge a pretty high price for its products. If there was a fully-level playing field—a standardized file format for the industry that almost anyone could implement—that would directly threaten Office & MBD. Losing dominance with Office would in turn threaten the Client segment, because users would be free to utilize whatever operating system(s) met their needs without being risking being unable to share office documents with others.
So, I'd like to officially welcome Walt to the Fraternity of Geeks who Blog about File Formats on the Weekend (FGBFFW), and recommend him to everyone else who will read his blog on Monday.
Saturday, January 20, 2007
Amusing but Confusing
"Open" is an adjective, and in English adjectives are usually placed before nouns, not in the middle of a noun phrase. We say, a "black guard dog", not a "guard black dog". When you fight language, language usually ends up winning. So it is not surprising that what comes out is "Open Office XML" by mistake.
I'm obviously not the only one with this problem. A quick Google for "Microsoft Open Office XML", or "Ecma Open Office XML", phrases that should get zero hits, reveals instead an embarrassment of riches. Everyone gets this wrong.
ZDNet's David Berlind:
Yesterday, when Novell announced that one of the first fruits to be born out of its newly minted legal relationship with Microsoft would be a plug-in to OpenOffice.org that would allow the open source based office suite to open or save documents in Microsoft's Open Office XML (OO-XML) file format, I had a tough time parsing through the text of the company's press release.
RedMonk's Stephen O'Grady with an article titled "Microsoft Open Office XML Formats / Open Document Format Follow Up".
CRN: Reseller Channel News with a headline, "Ecma says Yeah to Microsoft Open Office XML".
Computer Business Review:
Corel Corp, developer of the WordPerfect suite, announced last week that it will support both ODF and Microsoft's Open Office XML format.
XMLMind, a tool designed to work with OOXML, gets it wrong:
Thanks to new XMLmind FO Converter v4, it is now possible to convert XML documents to Open Office XML (.docx) the native format of MS-Word 2007.
BusinessWeek proof-readers missed this error:
...Microsoft is working hard to defeat it and promote its own XML-based file format--called Microsoft Open Office XML. This will be the default file format in Office 2007, due out late this year.
Even Microsoft Press Releases make this error:
'Through the XXX Alliance, we are working closely with Microsoft to increase data access across our instrument systems and data analysis software tools using Ecma Open Office XML,' said XXX, president of XXX.
Even Microsoft's blog profile for a member of their own Corporate Standards Team, an OOXML expert, gets it wrong:
Dave is a member of Microsoft’s Corporate Standards policy team. He is involved with all of Microsoft’s global standards around server & tools which includes everything from XML to WS-*, from W3C to Oasis and ISO, all Office standards including Open Office XML, and all vertical industry standards from the enterprise markets to Microsoft Dynamics products
This guy works on Office Open XML and he doesn't even get it right!?
Microsoft's own OOXML overview page on the file formats can't get it right:
By installing a simple update, users of Microsoft Office 2000, Microsoft Office XP, and Office 2003 Editions can open, edit, and save documents in one of the Ecma Open Office XML File Formats.
Ditto for Microsoft's FAQ page on the file formats:
The Ecma Open Office XML Formats will offer some key improvements over the binary file formats in use today within Word, Excel, and PowerPoint. Because these new file formats are compressed, the resulting document sizes will be much smaller, somewhere between 50 and 75 percent smaller in some cases.
A recent article by Microsoft's Platform Strategy Manager in Australia got it wrong in the title: Streamlining your documents with Open Office XML.
And to top it all off, Bill Gates himself gets it wrong, then corrects himself, as seen in Molly Holzschlag's transcript from a recent blogger outreach event she attended at Microsoft headquarters in Redmond:
But every year for 13, 14 years now we’ve not just followed and implemented standards, we’ve contributed. This WS stuff, . . . we contributed more Web standards than anyone! We have our smartest people who go and work on that stuff . . . we just did the OpenOffice . . . our office XML formats we contributed to them . . . we’ve got XML at the core of all our products.
(Thanks to Yoon Kit from Open Malaysia, who has also been taking a closer look at the names used inside OOXML, for pointing out that quote.)
I'm not meaning to embarrass anyone with the above quotes. Those who have heard me speak on Office Open XML know that I struggle to get that name out every time, and do not always succeed. Like I said before, if you fight language, you will lose.
So the Ecma standard clearly has a name which causes confusion with the name of an existing application, "Open Office", which happens to also be the most prominent implementation of OpenDocument Format, the ISO standard for office documents. OpenOffice.org is a registered trademark (check the TESS database for the actual registration) and has been used in the trade since 2001 for describing an application used for database management, spreadsheet, word processor and presentation graphics.
I am not a lawyer, but from reading a BitLaw writeup on trademark infringement, it appears that the thing to prove is "likelihood of confusion", and the factors the courts would look at include evidence of actual confusion by consumers and similarity of the marketing channels for the two products.
In any case, to have an ISO standard that, by its aberrant use of the English language, almost compels users to transform it into "Open Office XML" will only confuse users. This is not just my prediction. It is my observation, backed up by many specific examples of how this confusion is happening even now. I invite you to comment on other examples you may know of.
Early last year, another Microsoft/Ecma specification was submitted to JTC1 for approval under Fast Track rules. It was Microsoft's C++/CLI specification. During the 30-day contradiction review period national bodies raised objections based on the confusing name Microsoft picked for their standard, and the practical problems this caused. GrokLaw had good coverage of this.
A summary of the UK's contradiction argument is:
In response to document ISO/IEC JTC1 N8037, the UK objects to Fast Track Ballot ECMA-372 1st Edition C++/CLI Language Specification, on the grounds that there is a contradiction with an existing JTC1 standard. ISO/IEC 14882:2003 is the standard for the C++ programming language. Adopting a second standard under the proposed name of C++/CLI will cause unnecessary and harmful confusion in the marketplace.
We consider that C++/CLI is a new language with idioms and usage distinct from C++. Confusion between C++ and C++/CLI is already occurring and is damaging to both vendors and consumers.
A new language needs a new name. We therefore request that Ecma withdraw this document from fast-track voting and if they must re-submit it, do so under a name which will not conflict with Standard C++.
Germany had similar objections:
We propose that the document is input into SC22 as a regular New Work Item Proposal and assigned to WG21 for further processing.
On a technical level, there are some rather different approaches between C++ and C++/CLI which can easily cause considerable confusion when both languages are considered to be "C++" or add unnecessary overhead when trying to write C++ code usable with C++ and C++/CLI.
I suggest a similar objection should be raised with regards to Ecma Office Open XML. Its name causes confusion with an existing registered trademark. Ecma should rename their standard to something less likely to cause confusion.
Any suggestions for a new name?
Updated on 25 June 2007 to add some additional recent examples of this continuing confusion.
The Vast Blue-Wing Conspiracy
Yikes, we've been found out!
The truth can now be told. We have a nine-floor complex beneath Devil's Tower in Wyoming, Dick Cheney's home state. We employ three hundred Oompa-Loompas, ostensibly here on student visas, to read through the 6,000 page OOXML specification. They then input their concerns into a massively parallel computer, based on the old Deep Blue chess computer that beat Garry Kasparov. The computer takes the objections, formats them into English, inserting random literary quotes from The Modern Library of the World's Best Books, and then posts them in blogs and press articles. The computer can express these objections in the form of sonnets, haikus, or even as crude limericks. Every year on January 14th (Thomas J. Watson's Birthday) at 3:14am the Oompa-Loompas come to the surface, smear their bodies with blue paint, dance around a bonfire, howl at the moon and entreat the gods to vanquish their foes, mainly Microsoft, who canceled their favorite application, Microsoft Bob. Rob Weir doesn't really exist. He is just a subroutine. As they say, "On the internet, nobody knows you are a subroutine processing data input by Oompa-Loompas working for IBM underground in Wyoming."
I guess that's one theory.
But from what I've seen of the world, when you think everyone is out to get you, it is usually one of three things:
- You are mentally ill
- You are doing something stupid and people are trying to help you
- You are in a movie
Then there's the PR angle. In Microsoft's case, PR includes trying to look virtuous to the EU courts. Look, Microsoft can say, at how we play nice with competing platforms like Novell's SUSE. Here's a tin-foil-hat theory: Microsoft can't compete against a movement, Ballmer has acknowledged. It can definitely compete against a company. So isn't it likely that this question has come up at Microsoft: Can't we somehow turn this Linux movement into a company that we can compete with?
Can the same be said about file formats? It is hard for Microsoft to beat a movement, so it attempts to turn this into a battle against a single company.
Let's look at the facts:
ODF is not controlled or promoted by a single company. ODF is developed in OASIS with a Technical Committee (TC) that includes members from a number of vendors, including Adobe, Novell, Intel, Sun and IBM. The TC also includes unaffiliated individual members, representatives from various open source projects, as well as members from the OpenDocument Foundation and other non-profit organizations.
The Foundation in particular has brought a huge amount of talent and resources to the development of ODF. Traditionally, standards were developed exclusively by large corporations, and individuals and smaller players were marginalized. But the world is different today. The Foundation has shown that with a bit of organizational skill, individual volunteers can band together and have a voice and technical contribution on par with long-established corporations. They should be given much credit for this.
On the promotion side, ODF is promoted by groups including the ODF Adoption TC, the Open Document Format Alliance, the OpenDocument Fellowship and the previously mentioned OpenDocument Foundation. The Adoption TC manages the ODF portal on XML.org and is currently working on various journal articles and whitepapers, as well as responding to calls for papers (CfPs) for various conferences and symposia this year. I've lost count of how many companies are members of the ODF Alliance. I stopped counting when it went over 300. If you are not on their mailing list, then you should be. The Fellowship has also done amazing work promoting ODF and developer tools related to ODF.
So let's put to bed the conspiracy theories that this is all just IBM out to get Microsoft. ODF is far more than one company. IBM does not own ODF or control ODF or control the groups that promote ODF. Those who say otherwise discredit the efforts of the many volunteers who have worked so hard to develop the ODF standard and implement it in so many applications.
Thursday, January 18, 2007
A Foolish Inconsistency
A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall. Speak what you think now in hard words, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said to-day. 'Ah, so you shall be sure to be misunderstood.' Is it so bad, then, to be misunderstood? Pythagoras was misunderstood, and Socrates, and Jesus, and Luther, and Copernicus, and Galileo, and Newton, and every pure and wise spirit that ever took flesh. To be great is to be misunderstood.
These are fine words for a philosopher, but based on those statements I have my doubts as to whether Emerson would have made a good engineer or businessman.
Where systems are designed for multiple parties to collaborate, we must have consistency driven by shared standards.
The first shared standards date back thousands of years and supported mankind's earliest commercial ventures:
- Uniform weights and measures, so you knew you were getting what you paid for
- Official coinage of specified weight and purity, so you knew what was being paid
- A working language for recording treaties and trade agreements
As civilization progressed, standards took on an increasingly large role. As the railroad, the steamship and the telegraph shrank the size of nations and oceans, the speed of communications and commerce increased, leading to such diverse standards as railroad gauges, time zones and international postage. Moving into the information age, the increased speed and variety of communication led to standardized network protocols, media formats and character encodings.
Generally, standards are necessary whenever two or more parties communicate or exchange goods or services.
A look at the US/Chinese Standards Portal, a joint effort between ANSI and SAC, shows the breadth of standards that specify the properties of materials and products we all use every day. Their tag line is, "The international language of commerce is standards". I concur.
But progress has been uneven. Although I can send an email message anywhere in the world, make a phone call anywhere in the world, send a letter anywhere in the world, and expect that it will be received and read exactly as I intended, formatted documents, spreadsheets and presentations have lacked this level of interoperability. One person uses Word, another person uses WordPerfect, another person uses AbiWord or OpenOffice or WordPro. Older documents might still be in WordStar, XYWrite or Manuscript format. We tried conversions, importing and exporting to various formats for interchange, like RTF and CSV. It worked sometimes, but not always and certainly not well.
How did we get to such chaos in the area of document formats?
It is notable that these applications were designed and their formats defined before widespread commercial use of the Internet. The business user of a word processor circa 1994 shared documents via hard copies, or electronically with only users on their LAN. The facilities for electronic document sharing between business partners, between a company and their customers, or a government and its citizens were not widespread. So company A might use WordPerfect, and company B might use WordStar, but since they didn't exchange documents, or only did so via hard copy, there was no file format problem.
With the popularization of the World Wide Web and increased connectivity of businesses to the Internet, another jump forward in the rate of communications took place, comparable to the railroad or the telegraph. The world was now a very small place indeed. This led to a parallel acceleration of the rate of commerce, as new opportunities arose for supply chain integration, advertising, education, online exchanges, outsourcing, and the new business models that are invented every day.
Today, the document you create can instantly be transported around the world. You may not know who reads your document, what operating system they are running or what applications they are using. They may be running Ubuntu on a laptop on the beach, or a Symbian-enabled mobile phone in rush-hour traffic, or even using a screen reader or other assistive technology to render the document according to their needs. We no longer exclusively buy from, sell to or support the person in the bricks and mortar office down the street. Commerce is global, it is instant, and it is based on standards.
This is where OpenDocument Format (ODF) comes in. After 15 years of chaos in office document formats, it was time for a standard. The rate of communications and commerce demands it. More importantly, customers demand it.
The complaints I hear about the prior state of affairs revolve around these issues:
- I want to own my data.
- I do not want access to my data controlled by a single commercial entity.
- I do not want to require that people go out and purchase a particular application in order to read my documents.
- I want my documents to be in a format that has long-term stability and understandability.
- I want my documents to be in a format that lends itself to processing by a range of tools, both commercial and free.
- I want my documents to be in a format that everyone can understand.
- I want to break out of the cycle of having to constantly upgrade my software every time my vendor decides to change formats on me.
But not everyone is happy with progress. This has always been true. The last Pony Express rider likely cursed at the mere mention of the telegraph. The last DECnet engineer likely mumbled, "Why would anyone want TCP/IP?" as he packed his belongings and cleaned out his office. And in the realm of document formats, Microsoft is kicking and screaming to try to delay the inevitable widespread adoption of ODF as a document format for everyone.
Why is Microsoft so upset?
The answer is, they enjoy a monopoly in office applications and they know that if users could easily move away from Microsoft Office while preserving access to their documents, then users would leave by the millions. The Fear, Uncertainty and Doubt (FUD) around file formats and fidelity and compatibility is the way Microsoft ensures their lock-in.
Let's review some history of Microsoft and their office file formats, to get a better sense of how this game is played.
Let's go back to the early days, the mid-1990s, when Microsoft did not have such market dominance, back when they had competition in the word processor and spreadsheet market. At that time Microsoft actually documented their file formats. Sure, the specification was incomplete, but it was an honest attempt. You could buy the Excel format in book form from Microsoft Press, or get an electronic version of the Excel and Word formats on an MSDN CD. At one point it was a free download from the MSDN web site.
But around 1999 something happened. The license on the file format specification changed. Where before you could do anything you wanted with the formats, now the specification carried the explicit restriction (my emphasis):
[Y]ou may use documentation identified in the MSDN Library portion of the SOFTWARE PRODUCT as the file format specification for Microsoft Word, Microsoft Excel, Microsoft Access, and/or Microsoft PowerPoint ("File Format Documentation") solely in connection with your development of software product(s) that operate in conjunction with Windows or Windows NT that are not general purpose word processing, spreadsheet, or database management software products or an integrated work or product suite whose components include one or more general purpose word processing, spreadsheet, or database management software products.
So, file format documentation that was once freely available was restricted to applications that ran on Windows and which did not compete with Microsoft Office.
Soon after this, the file format information was removed from MSDN altogether. It was only available under a licensing program that had even further restrictions:
This program entitles qualified software developers to license the Microsoft .doc, .xls, or .ppt file format documentation for use in the development of commercial software products and solutions that support the .doc, .xls, or .ppt file formats from Microsoft and to complement Microsoft Office
(How should we parse this? What does it mean to "complement" Microsoft Office? I think in ordinary use, an application that competes against Office would not be considered complementary.)
So what happened between 1995 and 2004 to cause Microsoft to wipe out every bit of publicly available documentation on their file formats? It seems to me that the main change in that time frame was that they wiped out the competition. The earlier availability of the file format documentation seems to have been intended to encourage developers and partners. In those days, Excel was good about documenting its file format, and about importing and exporting competing formats like 1-2-3.
Joel Spolsky, talking about what was required for Excel to reach its "tipping point" in adoption, explains it this way:
The mature approach to strategy is not to try to force things on potential customers. If somebody isn't even your customer yet, trying to lock them in just isn't a good idea. When you have 100% market share, come talk to me about lock-in. Until then, if you try to lock them in now, it's too early, and if any customer catches you in the act, you'll just wind up locking them out. Nobody wants to switch to a product that is going to eliminate their freedom in the future.
But we see that, as their monopoly was achieved, Microsoft throttled the availability of the Office file format specifications until they were no longer available to potential competitors. The lock-in has been achieved; the door slams shut.
This shows the strategic value of file formats to Microsoft and the steps they have been willing to take in order to keep users locked onto the Windows/Office platform.
So now, today, Microsoft is pushing their Office Open XML standard, "old wine in new wine skins", not so much a new format as a new ploy. What should enrage every thoughtful person is that they are using compatibility with the legacy binary formats as the main selling point of the OOXML format. Think about it. Compatibility with the binary format that they withdrew from the public seven years ago when they cemented their monopoly, is now being touted as their unique advantage. Said differently, Microsoft is selling OOXML as the solution to an interoperability problem that they themselves created and carefully orchestrated.
I'm obviously not a fan, as regular readers of this page already know.
So what prevents Microsoft from doing the same thing again? How do we know that the next version of Office will use a format that is an open standard? Office 2007 has already extended OOXML in undocumented ways to support things like macros and DRM. Although they cannot withdraw the OOXML specification from Ecma, they can surely just ignore it, not update it, and continue to extend their format in undocumented ways. Since the success of ODF is the only reason they are pushing OOXML, it would be entirely in character for them to de-emphasize standard OOXML as soon as ODF is wiped out, and turn it back into an in-house proprietary format, only disclosed to those who agree not to compete with them.
The time is right for a single document standard and that standard is clearly ODF. The opportunity is here for ISO/IEC JTC1 to send a resounding message in favor of interoperability and consistency and to reject OOXML as contradicting the existin