Friday, June 26, 2009
ODF Plugfest

Although the term may be alien to some, "plugfests" have been around for about 20 years. A plugfest is when implementors of the same interface get together and test the interoperability of their products. In the beginning this was done with wired standards, USB, etc. (thus 'plug'). Over the years the term was extended to networking at all higher levels of the protocol stack. The concept is also applicable to document exchange formats like ODF.
We had an ODF Plugfest last week in The Hague. Although we've had interoperability workshops and camps before that attracted a handful of vendors, this was the first one that had nearly universal participation from ODF vendors. I'm not going to recap the details of the plugfest. Others have done that already. But I will share with you some of my conclusions, based on long discussions with other participants, from whose insights I have greatly benefited.
In an ideal world, specifications would be perfect and software applications would be bug-free and users would read the manuals and we would achieve perfect interoperability instantly by anointment of the salubrious unction of standardization. But to the extent this planet's population obdurately persists in imperfection, we are resigned to make additional efforts in pursuit of interoperability. We are not alone in this regard. The only standards that don't need to work on interoperability are those standards that no one implements.
We should use every licit technique at our disposal to give the user the best experience with ODF we can. In a competitive market you cannot get away with telling your customer, "Sorry, your spreadsheet doesn't work because page 652, clause 23 says 'should' rather than 'shall'". If you did that you would not have that customer for long. (Unless, of course, you have a monopoly, in which case many seemingly irrational, anti-consumer actions can occur, seemingly without consequences.)
Further, I assert:
- Users want real-world interoperability, and not excuses
- Real-world interoperability is what users see and achieve in practice
- Where vendors have the will to interoperate, achieving interoperability is a known technical problem, with known engineering solutions, but where the will to interoperate is lacking, there are no technical means of compelling interoperability
- Interoperability lies at the intersection of technology, engineering standards, competition law, intellectual property and economics. There are no silver bullets, although there is an arsenal of proven techniques that can help to improve interoperability
- Achieving interoperability is facilitated by a variety of cooperative activities, including standardization, test case creation, implementation testing, online validators, plugfests, defect collection and reporting
- The OASIS ODF Interoperability and Conformance TC, charged with creating an ODF test suite
- The OASIS ODF TC, finishing up work on ODF 1.2
- OfficeShots.org, providing a way to test the interoperability of a document in many ODF editors
- The ODF Toolkit Union, especially their open source ODF Validator
- The Plugfest participants, who continue to add information and test scenarios to the plugfest's wiki.
- Groups such as OpenDoc Society and OpenForum Europe which lend their organizational skills and enthusiasm to the effort, and often much more.
[6/29/09: I've received some emails on the photo, so here are the details:
The picture was taken at 3:30PM on the 2nd day of the workshop.
The lens was a Pentax DA 10-17mm "fisheye" zoom at 10mm. That explains the projection distortion. The graininess and B&W were from post-processing using Nik Software's Silver Efex Pro and Sharpener Pro.]
Labels: ODF
Tuesday, June 23, 2009
ODF TC timeline
Those who are not familiar with standards development are sometimes amazed at how long it takes to develop a good standard. Perhaps the single-vendor, 6,000 page, 12-month escapade of OOXML in Ecma has skewed expectations. Fortunately, OOXML is the exception, not the rule. Achieving a multi-vendor consensus around a substantial technical standard will always be time-consuming, but it is time that is well spent.
OASIS standards go through several stages of development:
- Working Draft (WD)
- Committee Draft (CD)
- Public Review Draft
- Committee Specification
- OASIS Standard
If you want the nitty-gritty details, here is a flow chart of the OASIS standards approval process.
I occasionally get a question along the lines of: "What has the ODF TC been doing for the past couple of years?" The following timeline should give you an idea. I've indicated the time spent developing ODF 1.0 and ODF 1.1, along with some other milestone activities, such as the PAS transposition of ISO/IEC 26300, the publication of ODF 1.0 Approved Errata 01 and the creation of the various ODF subcommittees. I've also indicated the dates of each of the ODF 1.2 WDs and CDs.
As you can see, we've been quite busy. After iterating on WDs during 2007 and 2008, we've now moved on to CDs. All of the planned feature work for ODF 1.2 is now completed. The remaining work is to address the various editorial and technical comments that have been submitted to our comment list, as well as comments from TC members and JTC1/SC34. The goal is to have no known defects in ODF 1.2 before we send it out for a Public Review. Of course, previously-unknown defects will likely be identified during the Public Review, and we have a process for handling these. I'll comment more on that process, and Public Reviews in general, when we get closer to that stage.

Labels: ODF
Tuesday, June 09, 2009
ODF Lies and Whispers
First up is this instance, from a small Baltic republic, where a rather large US-based software company was recently arguing to the national standards committee for the adoption of OOXML instead of ODF. Here are some of the points made by this large company in a letter:
There is no software that currently implements ODF as approved by the ISO
(They then link to Alex Brown's comment from Wikipedia). I think this demonstrates the triangle-trade relationship among Microsoft, Alex Brown (and other bloggers) and Wikipedia, by which Microsoft FUD is laundered via intermediaries to Wikipedia for later reference as newly minted "facts". No wonder one of Microsoft's first actions during their OOXML push was to seize control of the Wikipedia articles on ODF and OOXML via paid consultants. In any case, Alex's claims were rebutted long ago.
ODF has a number (more than a hundred) of technical flaws which haven't been addressed for 3 years despite change requests addressed to OASIS by countries such as Japan and United Kingdom. There are discussions between OASIS and ISO/IEC JTC 1 SC 34 regarding true ownership of ISO ODF, which is a reason why the flaws in ISO ODF aren't being addressed. In a recent SC 34 meeting in Prague a new ISO ODF maintenance committee has been formed because ISO / IEC 26300: 2006 is not being presently maintained.
This is not true. First, the ODF TC has received zero defect reports from any ISO/IEC national body other than Japan. Second, we responded to the Japanese defect report last November. Amazingly, Alex Brown is implicated in this piece of FUD as well. It was false then and it is false now. At the exact time Alex was quoted in the press as saying the ODF TC was not acting on defect reports (October 8th, 2008), we had in fact already sent our response to the defect report out to public review (August 7th, 2008) and then completed that review (August 22nd), after quite a bit of active technical discussion with the submitter of the original defect report (Murata Makoto). How Alex translated that into "Their defect reports are being shelved" and "Oasis has not been acting on reports of defects" is beyond me. It must be particularly embarrassing that Murata-san wrote to the OASIS list, within days of Alex's FUD, "I am happy with the way that the errata has been prepared." How could Alex be ignorant of these facts? Why was he lying to the press? How is this conformant with his leadership role in JTC1/SC34 and his participation in BSI? Also observe the triangle-trade route of FUD in this case from Alex to Doug Mahugh to Wikipedia, this time for negative edits in the OASIS article.
IBM currently recommends not using OASIS ODF 1.1 and to instead use OASIS ODF 1.2 which is currently not complete and will not be complete and ISO certified before 2010/2011. OASIS on the other hand have started work on ODF 2.0 which will not be backwards compatible.
This is an odd one, demonstrably false. IBM Lotus Symphony supports ODF 1.1. We have no ODF 1.2 support at present. I wonder where they came up with this one? It is totally bizarre. Although we have started to gather requirements for "ODF-Next", the contents of that version, and the degree to which it will be backwards compatible, have not even been discussed by the TC, let alone determined. So this is pure FUD, trying to make ODF sound risky to adopt, and then lying about IBM's support for it, and our position on ODF 1.2.
The list goes on, including claims that no one supports ODF 1.0 or ODF 1.1, etc., but you get the gist of it. The particulars are interesting, of course, but more so the reckless disregard for the truth, and the triangle-trade relationship between notable bloggers, Wikipedia, and Microsoft's whisper campaign.
Another current example is part of Microsoft's attempt to duck and cover from criticism over their interoperability-busting ODF support in Office 2007 SP2. I've heard variations on the following from three different people in three different countries, including from government officials. So it is getting around. It goes something like this:
We (Microsoft) wanted to be more interoperable with ODF. In fact we submitted 15 proposals to the ODF TC to improve interoperability, but IBM and Sun voted them down.
Nice story, but not true. Certainly Microsoft submitted 15 proposals. But they were never voted on by the TC, because Microsoft chose not to advance them for a vote. They opted not to have these proposals considered for ODF 1.2. It was their choice alone and their decision alone not to put these items up for a vote. I would have been fine with whatever decision Microsoft wanted to make in this situation. I'm not criticizing their decision. I'm just saying we need to be clear that the outcome was entirely due to their decision, and not to blame IBM or Sun for Microsoft's choice in this matter.
I think I can trace this FUD back to a May 13th blog post from Doug Mahugh where he wrote:
We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration. The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.
This certainly is an interesting statement. There is nothing I can point to that is false here. Everything here is 100% accurate. However, it seems to be reckless in how it neglects the most relevant facts, namely that the proposals did not make it into ODF 1.2 at Microsoft's sole election. It is as if Lee Harvey Oswald had written a note: "Went to Dallas and saw a parade today. Tried to see a movie, but had to leave early. Heard later on the radio that the President was shot". This would have been 100% accurate as well, but not the "whole truth". In any case, the rundown of the facts on this question is on the TC's mailing list.
So what is one to do? You obviously can't trust Wikipedia whatsoever in this area. This is unfortunate, since I am a big fan of Wikipedia. I want it to succeed. But since the day when Microsoft decided they needed to pay people to "improve" the ODF and OOXML articles, these articles have been a cesspool of FUD, spin and outright lies, seemingly manufactured for Microsoft's re-use in their whisper campaign. My advice would be to seek out official information on the standards, from the relevant organizations, like OASIS, the chairs of the relevant committees, etc. Ask the questions in public places and seek a public, on-the-record response. More people are willing to lie than are willing to face the consequences of being caught lying. That is the ultimate weakness of lies. They cannot stand the light of public exposure. Sunlight is the best antiseptic.
Labels: ODF
Sunday, May 17, 2009
The Battle for ODF Interoperability
Failed standards don't need to work on interoperability because failed standards are not implemented. Look around you. Where are the OOXML test suites? Where are the OOXML plugfests? Indeed, where are the OOXML implementations and adoptions? Microsoft Office has not implemented ISO/IEC 29500 "Office Open XML", and neither has anyone else. In one of the great ironies, Microsoft's escapades in ISO have left them clutching a handful of dust, while they scramble now to implement ODF correctly. This is reminiscent of their expensive and failed gamble on HD DVD for the Xbox 360, followed eventually by a quick adoption of Blu-ray once it was clear which direction the market was going. That's the way standards wars typically end in markets with strong network effects. They tend to end very quickly, with a single standard winning. Of course, the user wins in that situation as well. This isn't Highlander. This is economic reality. This is how the world works.
Although this may appear messy to an outside observer, our current conversation on ODF interoperability is a good thing, and further proof, to use the words of Microsoft's National Technology Director, Stuart McKee, that "ODF has clearly won".
Fixing interoperability defects is the price of success, and we're paying that price now. The rewards will be well worth the cost.
We've come very far in only a few years. First we had to fight for even the idea and acceptance of open standards, in a world dominated by a RAND view of exclusionary standards created in smoke-filled rooms, where vendors bargained about how many patents they could load up a standard with. We won that battle. Then we had to fight for ODF, a particular open standard, against a monopolist clinging to its vendor lock-in and control over the world's documents. We won that battle. But our work doesn't end here. We need to continue the fight, to ensure that users of document editors, you and I, get the full interoperability benefits of ODF. Other standards, like HTML, CSS, ECMAScript, etc., all went through this phase. Now it is our turn.
With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.
Of course, the standard itself, the ink on the pages, does not accomplish this by itself. A standard is not a holy relic. I cannot take the ODF standard and touch it to your forehead and say "Be thou now interoperable!" and have it happen. If a vendor wants to achieve interoperability, they need to read (and interpret) the standard with an eye to interoperability. They need to engage in testing with other implementations. And they need to talk to their users about their interoperability expectations. This is not just engineering. Interoperability is a way of doing business. If you are trying to achieve interoperability by locking yourself in a room with a standard, then you'll have as much luck as trying to procreate while locked in a room with a book on human reproduction. Interoperability, like sex, is a social activity. If you're doing it alone then you're doing it wrong.
Standards are written documents -- text -- and as such they require interpretation. There are many schools of textual interpretation: legal, literary, historic, linguistic, etc. The most relevant one, from the perspective of a standard, is what is called "purposive" or "commercial" interpretation, commonly applied by judges to contracts. When interpreting a document using a purposive view, you look at the purpose, or intent, of a document in its full context, and interpret the text harmoniously with that intent. Since the purpose of a standard is to foster interoperability, any interpretation of the text of a standard which is used to argue in favor of, or in defense of, a non-interoperable implementation has missed the mark. Not all interpretations are equal. Interpretations which are incongruous with the intent of standardization can easily be rejected.
Standards cannot force a vendor to be interoperable. If a vendor wishes deliberately to withhold interoperability from the market, then they will always be able to do so, and, in most cases, devise an excuse using the text of the standard as a scapegoat.
Let's work through a quick example, to show how this can happen.
OpenFormula is the part of ODF 1.2 that defines spreadsheet formulas. The current draft defines the addition operator as:
6.3.1 Infix Operator "+"
Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds numbers together.
I think most vendors would manage to make an interoperable implementation of this. But if you wanted to be incompatible, there are certainly ways to do so. For example, given the expression "1+1" I could return "42" and still claim to be conformant. Why? Because the text says "adds numbers together", but doesn't explicitly say which numbers to add together. If you decided to add 1 and 41 together, you could claim to be conformant. OK, so let's correct the text so it now reads:
6.3.1 Infix Operator "+"
Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds Left to Right.
So, this is bullet-proof now, right? Not really. If I want to, I can say that 1+1=10 and claim that my implementation works in base 2. We can fix that in the standard, giving us:
6.3.1 Infix Operator "+"
Summary: Add two numbers.
Syntax: Number Left + Number Right, both in base 10 representations
Returns: Number, in base 10
Constraints: None
Semantics: Adds Left to Right.
Better, perhaps. But if I want I can still break compatibility. For example, I could say 1+1=0, and claim that my implementation rounds off to the nearest multiple of 5. Or I could say that 1+1=1, claiming that the '+' sign was taken as representing the logical disjunction operator rather than arithmetic addition. Or I could do addition modulo 7, and say that the text did not explicitly forbid that. Or I could return the correct answer sometimes, but not other times, claiming that the standard did not say "always". Or I could just insert a sleep(5000) statement in my code, and pause 5 seconds every time an addition operation is performed, making a useless, but conformant, implementation. And so on, and so on.
The old adage holds: "It is impossible to make anything foolproof because fools are so ingenious." A standard cannot compel interoperability from those who want to resist it. A standard is merely one tool, which when combined with others, like test suites and plugfests, facilitates groups of cooperating parties to achieve interoperability.
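The point about test suites can be made concrete. Below is a minimal Python sketch, not taken from any real ODF or OpenFormula test suite: the function names, the four "implementations" of "+" and the test values are all hypothetical, chosen to mirror the loopholes described above. A handful of shared test cases is enough to separate the interoperable implementation from the ones that merely hide behind the text.

```python
from typing import Callable, List, Tuple, Union

Number = Union[int, float]


# Hypothetical implementations of "+". Each could be defended by a sufficiently
# creative reading of "Adds numbers together"; only the first one interoperates.
def add_correct(left: Number, right: Number) -> Number:
    return left + right


def add_modulo_7(left: Number, right: Number) -> Number:
    # "The text did not explicitly forbid modular arithmetic."
    return (left + right) % 7


def add_base_2(left: int, right: int) -> str:
    # "My implementation works in base 2": 1 + 1 is reported as "10".
    return bin(left + right)[2:]


def add_logical_or(left: Number, right: Number) -> Number:
    # "'+' was read as logical disjunction", so 1 + 1 = 1.
    return 1 if (left or right) else 0


# A tiny, hypothetical shared test suite: (left, right, expected) triples.
TEST_CASES: List[Tuple[int, int, int]] = [(1, 1, 2), (3, 4, 7), (0, 0, 0), (2, 5, 7)]


def failing_cases(impl: Callable) -> List[Tuple[int, int, int]]:
    """Return the test cases an implementation gets wrong."""
    return [(l, r, exp) for (l, r, exp) in TEST_CASES if impl(l, r) != exp]


IMPLEMENTATIONS = {
    "correct": add_correct,
    "modulo 7": add_modulo_7,
    "base 2": add_base_2,
    "logical or": add_logical_or,
}

if __name__ == "__main__":
    for name, impl in IMPLEMENTATIONS.items():
        failures = failing_cases(impl)
        status = "passes" if not failures else f"fails {len(failures)} case(s)"
        print(f"{name}: {status}")
```

Of course a test suite shares the limitation of the text: a determined implementer can special-case the test inputs. Test suites and plugfests work because the participants want them to.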
Now is the time to achieve interoperability among ODF implementations. We're beyond kind words and empty promises. When Microsoft first announced, last May, that it would add ODF support to Office 2007 SP2, they did so with many fine words:
- "Microsoft Corp. is offering customers greater choice and more flexibility among document formats"
- Microsoft is "committed to work with others toward robust, consistent and interoperable implementations"
- Chris Capossela, senior vice president for the Microsoft Business Division: "We are committed to providing Office users with greater choice among document formats and enhanced interoperability between those formats and the applications that implement them"
- "Microsoft recognizes that customers care most about real-world interoperability in the marketplace, so the company is committed to continuing to engage the IT community to achieve that goal when it comes to document format standards"
- Microsoft will "work with the Interoperability Executive Customer Council and other customers to identify the areas where document format interoperability matters most, and then collaborate with other vendors to achieve interoperability between their implementations of the formats that customers are using today. This work will continue to be carried out in the Interop Vendor Alliance, the Document Interoperability Initiative, and a range of other interoperability labs and collaborative venues."
- "This work on document formats is only one aspect of how Microsoft is delivering choice, interoperability and innovative solutions to the marketplace."
The pretty words have been shown to be hollow words. Microsoft has not enabled choice. Their implementation is not robust. They have, in effect, taken your ODF document, written by you by your choice in an interoperable format, with demonstrated interoperability among several implementations, and corrupted it, without your knowledge or consent.
There is no shortage of excuses from Redmond. If customers wanted excuses more than interoperability they would be quite pleased by Microsoft's prolix effusions on this topic. The volume of text used to excuse their interoperability failure exceeds, by an order of magnitude, the amount of code that would be required to fix the problem. The latest excuse is the paternalistic concern expressed by Doug Mahugh, saying that they are corrupting spreadsheets in order to protect the user. Using a contrived example, of a customer who tries to add cells containing text to those containing numbers, Doug observes that OpenOffice and Excel give different answers to the formula =1+"2". Because not all implementations give the same answer, Microsoft strips out formulas. This bears a close resemblance to what is sometimes called "Ben Tre Logic", after the Vietnamese town whose demise was excused by a US major with the argument, "It became necessary to destroy the village in order to save it." Better to be the broken clock that reads the correct time twice a day, than to be unpredictable, or as Doug puts it:
If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing. But the behavior is predictable: nothing recalculates, and no erroneous results are created.
But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?
That will most definitely not be a predictable experience. And in actual fact, the unpredictable consequences of that sort of variation in spreadsheet behavior can be very consequential for some users. Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.
Doug's argument may sound plausible at first glance. There is that scary "unpredictable consequences". We can't have any of that, can we? Civilization would fall, right? But what if I told you that the same error with the same spreadsheet formula occurs when you exchange spreadsheets in OOXML format between Excel and OpenOffice? Ditto for exchanging them in the binary XLS format. In reality, this difference in behavior has nothing to do with the format, ODF or OOXML or XLS. It is a property of the application. So, why is Microsoft not stripping out formulas when reading OOXML spreadsheet files? After all, they have exactly the same bug that Doug uses as the centerpiece of his argument for why formulas are stripped from ODF documents. Why is Microsoft not concerned with "unpredictable consequences" when using OOXML? Why do users seem not to require "accurate, predictable results" when using OOXML? Or to be blunt, why is Microsoft discriminating against their own paying customers who have chosen to use ODF rather than OOXML? How is this reconciled with Microsoft's claim that they are delivering "choice, interoperability and innovative solutions to the marketplace"?
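The claim that this is a property of the application and not of the format can be illustrated with a small sketch. The formula text is identical in every case; only the evaluator's policy for text operands differs. The three policies below are hypothetical and are not meant to reproduce exactly what Excel or OpenOffice does; the names `coerce_convert`, `coerce_zero`, `coerce_error` and `evaluate_plus` are invented for illustration.

```python
from typing import Callable, Union

Value = Union[int, float, str]


class EvalError(Exception):
    """Stands in for a spreadsheet error value such as #VALUE!."""


def coerce_convert(value: Value) -> float:
    # Hypothetical policy A: numeric-looking text is converted to a number.
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            raise EvalError("#VALUE!") from None
    return float(value)


def coerce_zero(value: Value) -> float:
    # Hypothetical policy B: any text operand counts as zero.
    return 0.0 if isinstance(value, str) else float(value)


def coerce_error(value: Value) -> float:
    # Hypothetical policy C: any text operand is an error.
    if isinstance(value, str):
        raise EvalError("#VALUE!")
    return float(value)


def evaluate_plus(left: Value, right: Value,
                  coerce: Callable[[Value], float]) -> Union[float, str]:
    """Evaluate left + right under one text-coercion policy."""
    try:
        return coerce(left) + coerce(right)
    except EvalError as err:
        return str(err)


if __name__ == "__main__":
    # The same formula, =1+"2", evaluated under three different policies.
    for name, policy in [("convert", coerce_convert),
                         ("zero", coerce_zero),
                         ("error", coerce_error)]:
        print(name, evaluate_plus(1, "2", policy))
```

Whether those formula characters were stored in an ODF, OOXML or XLS container never enters into it: `evaluate_plus` does not know or care.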
Labels: Interoperability, ODF
Thursday, May 07, 2009
A follow-up on Excel 2007 SP2's ODF support
But this widespread interest in the topic tells me one thing: ODF is important. People care about it. People want it to succeed, and when this success is threatened, whether for deliberate or accidental reasons, they are upset. Although Office 2007 SP2 also added PDF and XPS support, you don't see many stories on that at all.
I've been trying to respond to the many comments by anonymous FUDsters and Fanboys on various web sites where my post is being discussed. However, it is getting rather laborious swatting all the gnats. They obviously breed in stagnant waters, and there is an awful lot of that on the web. Since all links lead back here anyway, it will be much simpler to do a recap here and address some of the more widespread errors.
The talking points from Redmond seem to be consistent, along the lines of:
We did a 100% perfect and conforming implementation of ODF 1.1 to the letter of the standard. If it is not interoperable, then it is the fault of the standard or the other applications or some guy we saw sneaking around back on the night of the fire. In any case, it is not our fault. We just design, write, test and sell software to users, businesses, governments and educational institutions. We have no influence over whether our products are interoperable or not. What effect SP2 has on users or the market -- that's not our concern. Come back in 50 years when you have a 100% perfect standard and maybe we'll talk.
In other words, all of those Interoperability Directors and Interoperability Architects at Microsoft seem to have (hopefully temporarily) switched into Minimal Conformance Directors and Minimal Conformance Architects, and are gazing at their navels. I hope they did not suffer a reduction in salary commensurate with the reduction in their claimed responsibilities.
In any case, their argument might be challenged on several grounds. First up is the question of whether the ODF documents written by Excel 2007 SP2 indeed conform to the ODF 1.1 standard. This is not a hard question to answer, but please excuse this short technical diversion.
Let's see what the ODF 1.1 standard says in section 8.1.3 (Table Cell):
Addresses of cells that contain numbers. The addresses can be relative or absolute, see section 8.3.1. Addresses in formulas start with a “[“ and end with a “]”. See sections 8.3.1 and 8.3.1 for information about how to address a cell or cell range.
And the referenced section 8.3.1 further says:
To reference table cells so called cell addresses are used. The structure of a cell address is as follows:
- The name of the table.
- A dot (.)
- An alphabetic value representing the column. The letter A represents column 1, B represents column 2, and so on. AA represents column 27, AB represents column 28, and so on.
- A numeric value representing the row. The number 1 represents the first row, the number 2 represents the second row, and so on.
This means that A1 represents the cell in column 1 and row 1. B1 represents the cell in column 2 and row 1. A2 represents the cell in column 1 and row 2.
For example, in a table with the name SampleTable the cell in column 34 and row 16 is referenced by the cell address SampleTable.AH16. In some cases it is not necessary to provide the name of the table. However, the dot must be present. When the table name is not required, the address in the previous example is .AH16
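The column-lettering rule quoted above is bijective base-26 numbering: there is no zero digit, so Z is 26 and AA is 27. Here is a minimal Python sketch of it, together with a parser for addresses like SampleTable.AH16 and .AH16. The function names are hypothetical, and the regular expression is a simplifying assumption: it ignores quoted table names, absolute references and anything else the quoted text does not cover.

```python
import re
from typing import Optional, Tuple


def column_to_number(letters: str) -> int:
    """Column letters to a 1-based column number: A=1, Z=26, AA=27, AB=28."""
    number = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Bijective base 26: digits run 1..26, there is no zero digit.
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number: int) -> str:
    """Inverse of column_to_number: 1 -> 'A', 27 -> 'AA', 34 -> 'AH'."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


# Hypothetical pattern: optional table name, the mandatory dot, column letters,
# row digits. Simplification: table names are plain word characters only.
_ADDRESS = re.compile(r"^(?P<table>\w*)\.(?P<col>[A-Za-z]+)(?P<row>[0-9]+)$")


def parse_cell_address(address: str) -> Tuple[Optional[str], int, int]:
    """Parse 'SampleTable.AH16' or '.AH16' into (table or None, column, row)."""
    match = _ADDRESS.match(address)
    if match is None:
        raise ValueError(f"not a cell address: {address!r}")
    row = int(match.group("row"))
    if row < 1:
        raise ValueError("row numbers start at 1")
    table = match.group("table") or None
    return table, column_to_number(match.group("col")), row
```

With this sketch, `parse_cell_address("SampleTable.AH16")` returns `("SampleTable", 34, 16)`, matching the column 34, row 16 example in the quoted text.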
So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:
- Symphony 1.3: =[.E12]+[.C13]-[.D13]
- Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
- KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
- Google Spreadsheets: =[.E12]+[.C13]-[.D13]
- OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
- Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
- Excel 2007 SP2: =E12+C13-D13
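The difference between Excel 2007 SP2's output and everyone else's in the list above is mechanical. As a rough illustration only, here is a Python sketch (the function name `to_odf11_formula` is hypothetical) that wraps bare same-sheet references such as E12 in the bracketed [.E12] form. It assumes simple formulas: no string literals, ranges, absolute ($) references or references to other sheets, all of which a real converter would have to handle with a proper formula tokenizer.

```python
import re

# A bare A1-style reference such as E12: one to three capital letters, then digits.
# The lookbehind skips references already inside brackets ("[.E12]") and letters
# or digits belonging to a longer token; the lookahead skips function names that
# end in digits, such as LOG10(, and tokens that continue with more characters.
_BARE_REF = re.compile(r"(?<![A-Za-z0-9_.$\[])([A-Z]{1,3}[0-9]+)(?![A-Za-z0-9_(])")


def to_odf11_formula(formula: str) -> str:
    """Rewrite bare same-sheet references into ODF 1.1 form: E12 -> [.E12].

    Hypothetical helper for illustration. Assumes no string literals, ranges,
    absolute ($) references or references to other sheets.
    """
    return _BARE_REF.sub(lambda m: f"[.{m.group(1)}]", formula)


if __name__ == "__main__":
    print(to_odf11_formula("=E12+C13-D13"))  # =[.E12]+[.C13]-[.D13]
```

Because already-bracketed references are skipped, running the function over the output of the other applications in the list leaves it unchanged.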
Next is the question of the relationship between interoperability and conformance. So that we are not building skyscrapers in the air, let's start with a working definition of interoperability, say that given by ISO/IEC 2382-01, "Information Technology Vocabulary, Fundamental Terms":
The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units
I think we probably have a better sense of what conformance is. Something conforms when it meets the requirements defined by a standard.
So let's explore the relationship between conformance to a standard and interoperability.
First, does interoperability require a standard? No. There have been interoperable systems without formal standards. For example, there is a degree of interoperability among spreadsheet vendors on the basis of the legacy Excel binary file format (XLS), even though the binary format was never standardized and never defines spreadsheet formulas. Another example is the SAX XML parsing API. Widely implemented, but never standardized. We may call them informal or de facto standards.
Additionally, many standards start out as informal technical agreements and specifications that achieve interoperability among a small group of users, who then move it forward to standardization so that a broader audience can benefit. But the interoperability came first and the formal standard came second. See the history of the Atom syndication format for a good example.
Second, is interoperability possible in the presence of non-conformance? Yes. For example, it is well known that the vast majority of web pages (93% by one estimate) on the web today do not conform to the HTML standard. But there is a not unsubstantial degree of interoperability on the web today in spite of this lack of conformance. Generally, interoperability does not require perfection. It requires good faith and hard work. If perfection were required, nothing would work in this world, would it?
Third, if a standard does not define something (like spreadsheet formulas) then I am allowed to do whatever I want, right? This is true. But further, even if ODF 1.1 did define spreadsheet formulas you would still be allowed to do whatever you want. Remember, these are voluntary standards. We can't force you to do anything, whether we define it or not.
So what then is the precise relationship between conformance and interoperability? I'd state it as:
- In general, conformance is neither necessary nor sufficient to achieve interoperability.
- But interoperability is most efficiently achieved by conformance to an open standard where the standard clearly states those requirements which must be met to achieve interoperability.
The inefficiency of other orientations is seen with HTML and Web browsers. Because of the historically low level of HTML conformance by authoring tools and users who hand-edit HTML, browsers today are much more complex than they would otherwise need to be. They need to handle all sorts of malformed HTML documents. This complexity extends to any tool that needs to process HTML. Sure, we have a pretty good grip on this now, with tools like HTML Tidy and other robust parsers, but this has come at a cost. Complexity eats up resources: not only the time of coders and testers, but also runtime resources, memory and processing cycles. More complex code is harder to maintain and secure and tends to have more bugs. Greater conformance would have led to a more efficient relationship between conformance and interoperability.
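The cost of tolerating non-conformance is easy to see with the two markup parsers in Python's standard library. A strict XML parser rejects a small piece of typical hand-written, mis-nested markup outright. The lenient HTML tokenizer accepts it, but only reports the tags it saw, leaving the hard question (where does the unclosed <b> end?) to whatever sits on top of it. That extra layer is the kind of complexity browsers had to grow. The sample markup and helper names here are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Tuple

# Typical hand-written markup: <b> is never closed before </p>.
MALFORMED = "<p>Total: <b>42</p>"


def strict_parse(markup: str) -> bool:
    """A conformance-assuming (XML) parser simply rejects bad input."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagRecorder(HTMLParser):
    """A lenient tokenizer that records the tags it sees and never gives up."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def lenient_parse(markup: str) -> List[Tuple[str, str]]:
    """Tokenize without repairing structure; the </b> is simply missing."""
    recorder = TagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


if __name__ == "__main__":
    print(strict_parse(MALFORMED))   # False
    print(lenient_parse(MALFORMED))  # [('start', 'p'), ('start', 'b'), ('end', 'p')]
```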
Similarly, the many years of non-conformance in browsers, most notably Internet Explorer, to the CSS2 standard has resulted in an inefficiency there. From the perspective of web designers, tool authors and competing browser vendors, the lack of conformance to the standards has increased the cost needed to achieve interoperability, a cost transferred from a dominant vendor who chose not to conform to the standards, to other vendors who did conform.
The efficiency of conformance to open standards in particular is the clarity and freedom it provides around access to the standard and the contingent IP rights needed to implement the standard.
So back to ODF 1.1. What is the relationship between conformance and interoperability there? Clearly, it is not yet at that optimal point (which few standards ever achieve) where interoperability is most efficiently achieved. We're working on it. ODF 1.2 will be better in that regard than ODF 1.1, and the next version will improve on that, and so on.
Does this mean that you cannot create interoperable solutions with ODF? No, it just means that, like most standards in IT today, you need to do some interoperability testing with other vendors' products to make sure your product interoperates, and make conformant adjustments to your product in order to achieve real-world interoperability. Most vendors who don't have a monopoly would do this naturally and in fact have done this, as my chart indicated. Complaining about this is like complaining about gravity or friction or entropy. Sure, it sucks. Deal with it. Although it may not pay as much as being a professional mourner, work as a programmer is more regular. And giving value to customers will always bring more satisfaction than standing there weeping about how code is hard.
In any case, this comes down to why you implement a standard. What are your goals? If your goal is to be interoperable, then you perform interoperability testing and make those adjustments to your product necessary to make it be both conformant and interoperable. But if your goal is to simply fulfill a checkbox requirement without actually providing any tangible customer benefit, then you will do as little as needed. However, if your goal is to destroy a standard, then you will create a non-conformant, non-interoperable implementation, automatically download it to millions of users and sow confusion in the marketplace by flooding it with millions of incompatible documents. It all depends on your goals. Voluntary standards do not force, or prevent, one approach or another.
To wrap this up, I stand by the table of interoperability results in the previous post. SP2 has reduced the level of interoperability among ODF spreadsheets, by failing to produce conforming ODF documents, and failing to take note of the spreadsheet formula conventions that had been adopted by all of the other vendors and which are working their way through OASIS as a standard.
If we note the arguments used by Microsoft in the recent past, they have argued that OOXML must be exactly what it is -- flaws and all -- in order to be compatible with legacy binary Office documents. Then they argued that OOXML cannot be changed in ISO, because that would create incompatibility with the "new legacy" documents in Office 2007 XML format. But when it comes to ODF, they have disregarded all legacy ODF documents created by all other ODF vendors and have taken an aloof stance that looks with disdain on interoperability with other vendors' documents, or even documents produced by their own ODF Add-in. The sacrosanctness of legacy compatibility appears to be reserved, for strategic reasons, for some formats but not others. We'll redefine the Gregorian calendar in ISO to be interoperable with one format if we need to, but we won't deign, won't stoop, won't dirty ourselves to use the code we already have from the ODF Add-in for Microsoft Office, to make SP2 formulas interoperable with the other vendors' products, to benefit our own users who are asking for ODF support in Office. As I said before, this ain't right.
Labels: ODF
Tuesday, May 05, 2009
OpenDocument Format: The Standard for Office Documents
At the same time, I've taken the opportunity to put together a new web page of some of my other publications, workshop and conference presentations. I have a few others that I want to add, once I find them. But this is a start.
Labels: ODF
Sunday, May 03, 2009
Update on ODF Spreadsheet Interoperability
A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then-current ODF implementations, including:
- OpenOffice.org 2.4
- Google Spreadsheets
- KOffice KSpread 1.6.3
- IBM Lotus Symphony 1.1
- Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
- Microsoft Office 2003 with Sun's ODF Plugin
| Read In \ Created In | CleverAge | Google | KSpread | Symphony | OpenOffice | Sun Plugin |
|---|---|---|---|---|---|---|
| CleverAge | OK | OK | Fail | Fail | OK | OK |
| Google | OK | OK | OK | OK | OK | OK |
| KSpread | OK | OK | OK | OK | OK | OK |
| Symphony | OK | OK | OK | OK | OK | OK |
| OpenOffice | OK | OK | OK | OK | OK | OK |
| Sun Plugin | OK | OK | OK | OK | OK | OK |
A lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:
- CleverAge has released version 3.0 of their Add-in.
- OpenOffice 3.01 is now out and in wide use.
- Symphony 1.3 is now in beta.
- The Sun ODF Plugin is now at version 3.0.
- Microsoft Office 2007 SP2 has been released, with integrated ODF support.
- KOffice 2.0 RC 1 is now available.
I'll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:
- Enter the title "Maya's Wedding Planner" in A1 and increase the font size to 14 point.
- Enter the formula =TODAY() in B3 and set a US-style MM/DD/YY date format.
- Enter the date of the wedding as a constant in cell B4, also setting the date format.
- Add simple calculations in cells B6-B8 to calculate days, weeks and months until the wedding.
- A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits - debits) then you can simply copy down the formula to the new row for each new entry.
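The planner's logic is simple enough to sketch outside a spreadsheet. Below is a minimal Python sketch under stated assumptions: the post gives the cell ranges but not the exact formulas, so B6 is assumed to be B4 minus B3, B7 days divided by 7, and B8 days divided by an average month length. The year 2009 and the function names `countdown` and `running_balance` are hypothetical.

```python
from datetime import date

# Assumption: the exact cell formulas are not given in the post, so these
# mirror a plausible layout: B6 = B4 - B3, B7 = B6 / 7, B8 = B6 / 30.4375.
AVG_DAYS_PER_MONTH = 365.25 / 12  # hypothetical choice for "months"


def countdown(today: date, wedding: date) -> tuple[int, float, float]:
    """Days, weeks and months until the wedding (like cells B6-B8)."""
    days = (wedding - today).days
    return days, days / 7, days / AVG_DAYS_PER_MONTH


def running_balance(opening: float, entries: list[tuple[float, float]]) -> list[float]:
    """Column E: balance = previous balance + credits - debits, row by row.

    Each (credit, debit) pair is one ledger row; appending a pair is the
    code equivalent of copying the column E formula down one row.
    """
    balances = []
    balance = opening
    for credit, debit in entries:
        balance = balance + credit - debit
        balances.append(balance)
    return balances


if __name__ == "__main__":
    # TODAY() is dynamic; pass a fixed date to get a repeatable result.
    days, weeks, months = countdown(date(2009, 5, 3), date(2009, 8, 15))
    print(days)  # 104
    print(running_balance(1000, [(0, 250), (500, 0), (0, 125)]))  # [750, 1250, 1125]
```

The point of the ledger is that each new balance is recomputed from the row above; a cell holding only a stored number has nothing to recompute.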

Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.
Here is what I found when I tested the various scenarios:
| Read In \ Created In | Google | KSpread | Symphony | OpenOffice | Sun Plugin | CleverAge | MS Office 2007 SP2 |
|---|---|---|---|---|---|---|---|
| Google | OK | OK | OK | OK | Fail | OK | Fail |
| KSpread | OK | OK | OK | Fail | Fail | OK | Fail |
| Symphony | OK | OK | OK | OK | OK | Fail | Fail |
| OpenOffice | OK | OK | OK | OK | OK | OK | Fail |
| Sun Plugin | OK | OK | OK | OK | OK | OK | Fail |
| CleverAge Plugin | OK | OK | OK | OK | Fail | OK | OK |
| MS Office 2007 SP2 | Fail | Fail | Fail | Fail | Fail | Fail | OK |
So what is happening here?
CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows an awful lot of red. Let's see if we can figure this out.
First, some combinations that worked previously, when I tested two months ago, are now not working:
- Symphony 1.3 beta hangs when attempting to load the spreadsheet created with the CleverAge 3.0 ODF Add-in. Symphony 1.1 also hangs when trying to load that same spreadsheet. However, both versions of Symphony work fine when loading the CleverAge 2.5 spreadsheet from two months ago. The CleverAge document appears to be valid, so my guess is that this is a bug in the Symphony 1.3 beta. I'll pass this document on to the Symphony development team to see what they say.
- KSpread 1.6.3 does not read formulas from OpenOffice 3.01 documents. KSpread had no problems with OO 2.4 documents. The problem appears to be that OpenOffice 3.01, by default, writes out documents according to the ODF 1.2 draft which puts formulas in the OpenFormula namespace. But KSpread is expecting them in the legacy namespace. The result is that spreadsheet formulas are dropped when the document is loaded in KSpread.
- In a similar way, Sun's new ODF Plugin writes out documents according to the ODF 1.2 draft. KOffice is unable to handle these files. This also causes problems for Google Spreadsheets and the Microsoft/CleverAge Plugin for Excel, which report errors "We were unable to upload this document" and "The converter failed to open this file".
This can cause subtle and not so subtle errors and data loss. For example, in the test document I presented above, the current date is encoded using the TODAY() spreadsheet function. If the formulas are stripped, then this cell no longer updates, and will return the wrong value. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells from column E down a row, this will cause incorrect calculations, since there is no longer a formula to copy, so she would just be copying the prior balance. In general, SP2 converts an ODF spreadsheet into a mere "table of numbers" and any calculation logic is lost.
In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas, but places them into an Excel namespace. This namespace is not what OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn't even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. The ones that do not check the namespace try, and fail, to load a formula, since it is syntactically different from what they expected. These applications essentially display a corrupted document that shows neither the formula nor the value correctly. For example, an SP2 document loaded in MS Office using the Sun ODF Plugin looks like this:

Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn't import the document at all.
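The namespace problem can be seen directly in the `table:formula` attributes of a spreadsheet's content.xml, where each formula starts with a namespace prefix. Below is a minimal Python sketch that reports which namespace each formula prefix resolves to. The two known URIs are the ones commonly declared for the `oooc` and `of` prefixes in ODF files; anything else is reported as unrecognized. The function name is hypothetical, the sample uses a placeholder URI rather than the one SP2 actually declares, and for brevity the sketch keeps one flat prefix map instead of tracking namespace scope per element.

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
FORMULA_ATTR = f"{{{TABLE_NS}}}formula"  # Clark notation for table:formula

KNOWN_FORMULA_NS = {
    "http://openoffice.org/2004/formula": "legacy OpenOffice (oooc)",
    "urn:oasis:names:tc:opendocument:xmlns:of:1.2": "ODF 1.2 draft OpenFormula (of)",
}

# A formula value looks like "prefix:=expression"; capture the prefix.
PREFIX_RE = re.compile(r"^([A-Za-z_][\w.-]*):")


def classify_formulas(content_xml: bytes) -> list[tuple[str, str]]:
    """Return (formula, namespace label) for every table:formula attribute."""
    prefixes: dict[str, str] = {}  # simplification: one flat prefix map
    results = []
    for event, item in ET.iterparse(BytesIO(content_xml), events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
            continue
        value = item.get(FORMULA_ATTR)
        if value is None:
            continue
        match = PREFIX_RE.match(value)
        uri = prefixes.get(match.group(1)) if match else None
        if uri is None:
            label = "no recognizable namespace prefix"
        else:
            label = KNOWN_FORMULA_NS.get(uri, f"unrecognized ({uri})")
        results.append((value, label))
    return results


# Hypothetical sample; the "xl" URI is a placeholder, not SP2's real namespace.
SAMPLE = b"""<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:oooc="http://openoffice.org/2004/formula"
  xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
  xmlns:xl="http://example.invalid/placeholder-formula-ns">
  <table:table-cell table:formula="oooc:=TODAY()"/>
  <table:table-cell table:formula="of:=[.A1]+[.A2]"/>
  <table:table-cell table:formula="xl:=A1+A2"/>
</office:document-content>"""

if __name__ == "__main__":
    for formula, label in classify_formulas(SAMPLE):
        print(f"{formula:20} {label}")
```

An application that checks namespaces behaves like the "unrecognized" branch and rejects the formula; one that does not check simply tries to parse the text after the prefix, with the syntax mismatch described above.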
I must admit that I'm disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don't know or care. But I cannot fail to notice that the same application -- Microsoft Excel 2007 -- will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the "improved" integrated code in Office 2007 SP2. This ain't right.
I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:
- Sun should write out formulas in ODF 1.1 format, using the legacy "oooc" namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO's ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I'm not sure if the Sun Plugin has a similar configuration option, but I hope it does.
- In addition to writing out compatible formulas as per the above comments on the Sun Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor's spreadsheet documents. Give the user a warning if you need to, but let them have the choice.
First, we might hear that ODF 1.1 does not define spreadsheet formulas and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This is certainly true if your sole goal is to claim conformance. If your business model requires only conformance and not actually achieving interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive options. An application can be conformant to a standard and also be interoperable, if it uses the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for not also being interoperable, or at least not a valid excuse. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and their workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. The difference between minimal conformance and interoperability is well illustrated in these tests.
Remember, it is not particularly difficult or clever to take an adverse reading of a standard to make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser implementation that renders all default text in 4-point Zapf Dingbats, white text on a white background. It would conform to the standard, but it would be perfectly unusable by anyone. If you try hard enough you can create 100% conformant, but non-interoperable, implementations of almost any standard. Standards are voluntary, written to help coordinate multiple parties in their desires for interoperability. Standards are not written to compel interoperability by parties who do not wish to be interoperable.
(A side point is that SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)
We might also hear concerns that supporting other vendors' ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on that used by Excel, which itself was not fully documented when OpenOffice implemented it. So an argument, by Microsoft, not to support that language because it is not documented is rather hypocritical. Excel supports Lotus 1-2-3 files and formulas, as well as legacy Excel versions (back to Excel 4.0), neither of which has a standardized formula language. Why are these supported? Also, the fact that the Microsoft/CleverAge Add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. The inexplicable thing is why that code never made it into Excel 2007 SP2.
We'll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users and that it is dangerous to try to save Excel formulas into interoperable ODF formulas because there are no guarantees that OpenOffice or any other ODF application will interpret them the same way Excel does. So one might try to claim that Microsoft is protecting their customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully-licensed Microsoft Office users have already been creating legacy documents in ODF format, using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, get corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here, the way in which users of the ODF Add-in are being sacrificed.
If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors' formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This'll teach 'em good.
Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2's poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.
As I mentioned before, this is a step backwards. But it is just one step on the journey. Let's look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact most of the code already exists for this. The only thing we need now is to actually go do it, and neither get too far ahead of, nor lag too far behind, the other implementations. This is more a question of timing and coordination than of hard technical problems.
[2009/05/07 -- For more on this topic, see my "A follow-up on Excel 2007 SP2's ODF Support"]
Labels: ODF
Tuesday, March 24, 2009
Taking Control of Your Documents
The Objective
When you save a document in your word processor, your work is encoded in a particular file format. You often have a choice of formats that you can use, with names like DOC, DOCX, RTF, WPD or ODT. Your choice of format will influence whether others can easily read your document today, whether you yourself will be able to read your document ten years from now, and whether you will be able to migrate painlessly to another word processor or operating system if and when you choose to do so.
Although many users simply click “Save” and give no thought to which format is being used under the covers, this unthinking use of the word processor's default settings is a recipe for vendor lock-in. In fact, several vendors intentionally set their default format to one that works well only with their own software, fostering dependency on that vendor's software and lessening the user's ability to take advantage of other options in the market. The more documents you save and accumulate in a vendor's proprietary format, the harder it will be for you to consider any other choices.
The objective of this paper is to show you, the user, how to extricate yourself from this cycle of dependency and take control of your documents. Specifically, we show how you can, in three easy steps, free yourself from a Microsoft Office dependency. In the end you may, of course, choose to remain on Microsoft Office. You may decide to migrate to an alternative word processor. That, in the end, is your choice. But by following the three steps outlined below, your freedom of action will be preserved, and your choice of word processor will be based on your priorities and your needs, and not forced on you by your current application vendor.
Step 1: Take control of the default format
The older versions of Microsoft Office (Office 97 through Office 2003) by default save documents in a family of binary formats with the extensions DOC (Word), XLS (Excel) and PPT (PowerPoint). Although these formats are proprietary Microsoft formats, over the past decade 3rd party applications have developed the capability to read and write them.
However, starting in Office 2007 Microsoft suddenly switched the default format to something called Office Open XML (OOXML). This format is not widely supported outside of Office 2007. So if you save a document in the OOXML format you make it harder for anyone else to read your document unless they are also using Microsoft Office 2007. In almost all cases, the same document, if saved in the legacy DOC format, will be more interoperable. Staying with the default choice, OOXML, only restricts your choices and makes you more dependent on Microsoft Office. Of course, that is why Microsoft made OOXML the default format.
The first step to liberate yourself from Microsoft Office dependency is to change the default format in Microsoft Office 2007 away from OOXML and back to the early binary formats supported by Office 97-2003, which are widely supported by 3rd party applications. This is a neutral step that preserves the status quo. By making these changes you will still be able to read and edit any OOXML documents that are sent to you, but all new documents you create will be saved in the more widely supported DOC/XLS/PPT formats.
If you are using Microsoft Office 2003 or earlier, then you should skip this Step and move on to Step 2, since OOXML is not the default format in those earlier Office versions.
To change the defaults, you will need to load Word 2007, Excel 2007 and PowerPoint 2007 and make the following changes.
Word 2007
- Click the Office Button (the unlabeled logo button in the upper left of the program).
- Click “Word Options” at the bottom of the dialog.
- Go to the “Save” section.
- For the “Save files in this format” setting, choose “Word 97-2003 Document (*.doc)”.
- Click OK.

Excel 2007
- Click the Office Button (the unlabeled logo button in the upper left of the program).
- Click “Excel Options” at the bottom of the dialog.
- Go to the “Save” section.
- For the “Save files in this format” setting choose “Excel 97-2003 Workbook (*.xls)”.
- Click OK.

PowerPoint 2007
- Click the Office Button (the unlabeled logo button in the upper left of the program).
- Click “PowerPoint Options” at the bottom of the dialog.
- Go to the “Save” section.
- For the “Save files in this format” setting, choose “PowerPoint Presentation 97-2003”.
- Click OK.

Administrators should also note that these settings may be made directly in the Windows Registry, and automatically pushed out to a work group via a login script or group policy. The registry settings corresponding to the above changes are:
HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Word\Options
Add String DefaultFormat=Doc
HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Options
Add DWORD DefaultFormat=38 (Hexadecimal)
HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\PowerPoint\Options
Add DWORD DefaultFormat=0 (Hexadecimal)
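For administrators who prefer a file they can import, the three settings above can also be expressed as a .reg file. The following is a minimal Python sketch that only generates that text, so it runs on any platform; the key paths, value names and data come from the list above, while the helper name `build_reg_file` is hypothetical. Importing the result (for example with `regedit /s` from a login script) is up to you, and it is worth trying on one test machine before pushing it out to a work group.

```python
# Registry settings from the list above: (application, value type, data).
# Excel's 38 and PowerPoint's 0 are hexadecimal DWORD values, as noted there.
BASE_KEY = r"HKEY_CURRENT_USER\Software\Microsoft\Office\12.0"
SETTINGS = [
    ("Word", "string", "Doc"),
    ("Excel", "dword", 0x38),
    ("PowerPoint", "dword", 0x0),
]


def build_reg_file(settings=SETTINGS) -> str:
    """Return the text of a .reg file that sets DefaultFormat for each app."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for app, kind, data in settings:
        lines.append(f"[{BASE_KEY}\\{app}\\Options]")
        if kind == "string":
            lines.append(f'"DefaultFormat"="{data}"')
        else:
            # .reg DWORDs are written as 8 hex digits, e.g. dword:00000038
            lines.append(f'"DefaultFormat"=dword:{data:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

To save the output the way regedit writes its own exports, open the target file with `encoding="utf-16"` and `newline="\r\n"` before writing the returned string.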
Step 2: Enable OpenDocument Format Support
Now that you've taken the first step towards controlling your documents by preventing the lock-in effects of the OOXML default, it is time to take further control. You'll now want to enable OpenDocument Format (ODF=ISO/IEC 26300) support in Microsoft Office, so you can save and exchange documents using the free and open International Standard while remaining in the familiar Microsoft Office interface.
ODF is an XML-based, open document format standard, designed to be platform- and application-neutral and support interoperable use across applications, eliminating vendor lock-in. ODF is supported by many applications, including office suites from Sun, IBM, Novell and Google, as well as open source projects like OpenOffice, KOffice and AbiWord. Additional applications supporting ODF are listed on Wikipedia.
Microsoft Office does not currently support ODF “out of the box”, but you can enable ODF support in Office by installing a “plugin”, sometimes called an “add-in”. A plugin will add additional options or menu items to the Microsoft Office UI, allowing you to open and save documents in ODF format. In some cases you can even set ODF as the default format for new documents.
There are three main choices for adding ODF support to Microsoft Office:
- Sun Microsystems has published an “ODF Plugin for Microsoft Office” which supports Office 2000, XP, 2003 and 2007 SP1.
- Microsoft has sponsored an open source project on SourceForge for an “ODF Add-in for Microsoft Office”, which supports Office 2007, and also Office 2003 and Office XP if the Microsoft Office Compatibility Pack is also installed.
- Microsoft has announced that Office 2007 Service Pack 2 (SP2) will enable ODF support in Office 2007, but this code is not yet available.
Step 2 is to evaluate and adopt a plugin to add ODF support to Microsoft Office. Start using ODF now, saving your documents in the open standard document format. This allows you to remain in Office, for now, while building your familiarity and comfort level with ODF.
Step 3: Exercise your Right to Choose a Native ODF Editor
The plug-in approach is a transitional approach. It allows you to continue working in Microsoft Office while you enable ODF support side-by-side. But at some point you will want to consider your options. Maybe you find that converting back and forth to ODF format in MS Office is slow. Maybe you are using Office 2003 currently, but want to avoid paying for an Office 2007 upgrade when mainstream support for Office 2003 comes to an end on April 14th, 2009. At some point you will want to move to an application that supports ODF natively. You are free at this point and have a wide variety of choices.
- You can stay on Windows or consider moving to Linux or the Mac.
- You can stay with a traditional client editor, or move to a web based editor.
- You can use commercial software, or use open source software.
The important thing is that you have taken control of your documents. You are no longer dependent on Microsoft Office and its file format. You have broken free of the vendor lock-in. You are free to choose an alternative word processor when you want to and if you want to. Until then, be comfortable in knowing that you are keeping your options open while remaining in control of your documents.

Taking Control of Your Documents by Rob Weir is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
This paper is also available in ODF and PDF formats.
Labels: ODF
Monday, March 23, 2009
Introducing Planet ODF
Planet ODF aggregates several blogs, news sources, discussion forums and other online services related to ODF. I've tried to be semi-intelligent so you don't get random stories about the Oregon Department of Forestry or non-ODF blog posts by me. I'll tune the feeds over time, but the hope is to make it 100% ODF-relevant content.
If you have a blog, discussion forum or any other ODF-related content with an Atom or RSS feed and want it included, then please let me know. It doesn't need to be 100% ODF. You can discuss your cats 90% of the time and ODF 10% of the time and I can set up a filter to bring in the relevant content.
Also, I've set up an OpenDocument Format group on the social bookmarking site Diigo. (I abandoned del.icio.us when the Microsoft/Yahoo takeover rumors started.) Even if you don't have a blog or a web site with a feed, you can use Diigo to bookmark any articles you think are relevant to ODF. If you send those links to the OpenDocument Format group, then they will automatically be included in the Planet ODF feed.
Enjoy, and pass on the good news.
Labels: ODF
Wednesday, March 04, 2009
From the Statute of Frauds to WYSIWYS: Document Format Implications
"An Act for prevention of Frauds and Perjuryes" 29 Carol. II (1677), commonly called "The Statute of Frauds", begins:
For prevention of many fraudulent Practices which are commonly endeavoured to be upheld by Perjury and Subornation of Perjury Bee it enacted by the Kings most excellent Majestie by and with the advice and consent of the Lords Spirituall and Temporall and the Commons in this present Parlyament assembled and by the authoritie of the same That from and after the fower and twentyeth day of June which shall be in the yeare of our Lord one thousand six hundred seaventy and seaven All Leases Estates Interests of Freehold or Termes of yeares or any uncertaine Interest of in to or out of any Messuages Mannours Lands Tenements or Hereditaments made or created by Livery and Seisin onely or by Parole and not putt in Writeing and signed by the parties soe makeing or creating the same or their Agents thereunto lawfully authorized by Writeing, shall have the force and effect of Leases or Estates at Will onely and shall not either in Law or Equity be deemed or taken to have any other or greater force or effect, Any consideration for makeing any such Parole Leases or Estates or any former Law or Usage to the contrary notwithstanding.
Or, to loosely paraphrase in modern English: "We've noticed that verbal agreements are being abused. So in certain specific important agreements you better put it in writing and sign it, otherwise don't bother to bring any dispute to court."
A few things to note about the Statute and its context:
- As the preface notes, frauds were being perpetrated, involving oral contracts and perjury. Before this Statute, oral testimony, even without any evidence of a written agreement, could be used to deprive a person of real or personal property.
- The Statute is concerned with private agreements. Although it was already well-established practice by this time for official acts, writs, etc., to be recorded in written form and sealed, literacy, even among tradesmen, was not high, and private agreements were made only orally.
- The imposition of a stamp duty or tax to seal official documents followed this Statute a few years later, ostensibly to raise funds to fight a war against France. But like all forms of taxation, such duties seem to outlive their original intent, and they exist even to the present day, even though England is apparently now at peace with France.
A contract for the sale of goods for the price of $5,000 or more is not enforceable by way of action or defense unless there is some record sufficient to indicate that a contract for sale has been made between the parties and signed by the party against which enforcement is sought or by the party's authorized agent or broker.
I'd like to look a little at what it is about a written agreement that gives it its particular value. Why did they require it to be written? Why not just require witnesses to an oral agreement?
A few salient properties of a written agreement:
- A written agreement states the parties to the agreement, the terms of the agreement and is signed by the parties.
- Once signed, the agreement may not be altered but by mutual consent of the parties. In the judgement of Brett v. Ridgen, Plowd. Comm., 345, Lord Dyer wrote that "...men's deeds and wills, by which they settle their estates, are the laws which private men are allowed to make, and they are not to be altered, even by the King in his court of law or conscience. We must take it as we find it."
- The "mirror image" rule applies. Both parties must agree to the same terms. If party A makes an offer, and party B says they accept but in fact adds to or qualifies the terms of the offer, then this is properly treated as a counter-offer. The agreement is not made until both parties agree to the same terms.
- The underlying mechanics and notation of the agreement are flexible, unless otherwise specified. Whether scribbled with a crayon on a napkin, sent by telegram, teletype, fax or email, these may all be considered written agreements.
- Paper/ink expresses symmetric information. What you see is what I see and is what will be seen in court if we end up there some day.
- There is no invisible ink, no hidden pages. The text of the agreement does not say something different under the fluorescent lights at the courthouse versus the sunlight at the construction site. Although these things in theory could be done, via special inks and papers, the use of these techniques in an agreement would be prima facie evidence of fraud.
- Certainly, if it is poorly written, the terms of the agreement could be ambiguous and subject to various interpretations. Paper/ink cannot make you or your lawyer smarter. It only makes the agreement an accurate and reliable record. If a particular word is smudged or a number is crudely written, I can see this flaw and you can see this flaw and either of us can require the flaw to be fixed before we sign the agreement. If there is text that is unclear in meaning, I can ask my lawyer to explain it. I am able to understand the document perfectly should I take care to do so.
- Paper/ink is accurate and reliable over the time scale of personal and commercial contracts.
- A person's signature or mark on an agreement, absent evidence of fraud or coercion, clearly indicates their assent to the terms of the agreement. We do not commonly write our signature unless we intend to express assent.
Jump ahead to the present day, with the increasing use of electronic documents and digital signatures. Digital signatures offer some of the same affordances we traditionally had with paper/ink. Provided the chain of certificates and keys has not been compromised, the underlying applications have not been compromised, and the act of signing requires an affirmative and unambiguous action by the signer, a digital signature is evidence of:
- What was signed
- Who signed it
- The intention to sign, i.e., to give validity to the agreement
The digital signature guys call out an additional requirement needed for digital signatures to give the same guarantees as paper/ink agreements. It goes by the acronym WYSIWYS, or "What You See is What You Sign".
So what is required for electronic documents to have the same affordances as paper/ink for use as accurate and reliable records? I suggest the following:
- The format used by the electronic document must be specified in an open standard.
- The format standard must define the characteristics of semantically equivalent documents and specify the format sufficiently so that implementations of the standard can display semantically equivalent renderings of the document. Semantic equivalence is not broken by minute differences in layout, so it should be possible to have semantically equivalent renderings on different devices, e.g., a laptop versus a smart phone versus a screen reader.
- The application used to view and sign the electronic document must conform to the standard, specifically to those parts of the standard necessary to render a semantically equivalent document.
- The document must be strictly conformant to the standard, with no extensions. Just as you would not physically sign a paper document that contained interpolated text in a language that you do not understand, you should not sign an electronic document that contains unknown extensions. Otherwise semantic equivalence between the two parties is not guaranteed, and a "mirror image" problem results.
- Semantic equivalence must not rely on graphics. Although graphical content is permissible, such content must be redundant with respect to the text. Otherwise the "mirror image" problem is unresolvable between sighted and blind persons.
For editable formats like ODF, I think this points to the need for a formal content model that describes the semantic content of a document, apart from its formatting and layout: text, lists, tables, headers, footers, footnotes, images, captions, etc. Visual appearance is nice to have as well, but it is less robust when rendered on different devices and different operating systems, and is less likely to be robust when rendered on OpenOffice 10.0 in 2015. But the equivalence of the semantic content of an unextended ODF document should provide the same ability to have an accurate and reliable record in an electronic document as we have traditionally had with paper and ink.
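To make the idea of a content model a little more concrete, here is a minimal Python sketch under loud assumptions: the "content model" is a toy that keeps only ODF headings (`text:h`) and paragraphs (`text:p`) in document order, normalizes whitespace and discards style names; it ignores nesting, lists, tables and everything else. `content_model` and `content_digest` are hypothetical helpers, not part of any ODF toolkit. Two documents that differ only in styling yield the same digest, which is the kind of value one might sign if WYSIWYS were defined over semantic content rather than raw bytes.

```python
import hashlib
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Toy content model (assumption): only headings and paragraphs, in document order.
BLOCKS = {f"{{{TEXT_NS}}}h": "heading", f"{{{TEXT_NS}}}p": "paragraph"}

def content_model(xml_text):
    """Return [(kind, text), ...]; style names and other formatting are dropped."""
    model = []
    for elem in ET.fromstring(xml_text).iter():
        kind = BLOCKS.get(elem.tag)
        if kind:
            text = " ".join("".join(elem.itertext()).split())  # collapse whitespace
            model.append((kind, text))
    return model

def content_digest(xml_text):
    """SHA-256 over the content model only, so restyling does not change it."""
    canonical = "\n".join(f"{kind}\t{text}" for kind, text in content_model(xml_text))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

NS = ('xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
      f'xmlns:text="{TEXT_NS}"')
plain = (f'<office:text {NS}><text:h text:style-name="Heading_1">Terms</text:h>'
         '<text:p text:style-name="Body">Price: $5,000</text:p></office:text>')
restyled = plain.replace("Heading_1", "Big").replace("Body", "Small")

print(content_digest(plain) == content_digest(restyled))  # True
```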
Labels: ODF
Tuesday, March 03, 2009
Low-Fat ODF
Jack Sprat could eat no fat.
His wife could eat no lean.
And so betwixt them both, you see,
They licked the platter clean!
Is dietary fat good? Or is it bad? Without getting into a discussion of saturated versus unsaturated fats, or the virtues of omega-3 oils, let me make a few basic, reasonable observations:
- Individuals differ in their preferences and requirements for fat intake. There is no single answer for all people at all times.
- Experts differ in their recommendations for fat intake.
- Standards exist for how to measure and report the fat contents of food products.
- Standards also exist for the specific conditions under which a vendor may call their food products "low fat" or "light" or "fat-free". For example, "low fat" products must have 3g or less fat per serving.
- The government requires vendors of retail packaged food to label the fat content in accordance with the measurement standards above, and to make only those claims about fat content that conform to the labeling standards above.
But take away the standards, take away the reporting requirements, and the manufacturer has all of the control. Let's imagine a world where there were no such fat content standards. Medical research would still progress and the long-term dangers of high-fat diets would still be known. But the consumer's ability to control their fat intake would be vastly reduced. There would be no informed choice.
Imagine further that Company A, observing the medical research and consumer interest in healthy food, decides to offer a low-fat cheese. But if Company A sells their low-fat cheese, the label "low fat" itself would have no formal meaning. In this hypothetical, there are no standards. Nothing prevents Company B and Company C from also advertising their existing cheeses as "low fat". Without standards there is no differentiation. Since consumers have no effective way to test the fat content of cheese on their own, they are at the mercy of the non-verifiable claims of vendors and the advertising agencies. Because there are no acknowledged standards for fat content, the market for low-fat cheese is stunted. The consumer does not benefit and the innovative Company A does not benefit. No one wins.
This is a general concern for markets where the consumer cannot directly verify the quality of the goods, because they are packaged and inaccessible to inspection, or because the consumer lacks the technical ability to determine the quality themselves. From fat content to auto gas mileage efficiency, this leads to standards for measuring and reporting qualities of interest to consumers.
So back to reality. We do have fat content standards, for measurement and reporting. Suppose that Company A sells its low-fat cheese and it is very popular, because it is what the consumer wants. Company B is envious of the higher margins on low-fat products, but it would take too long for them to revamp their production line to make a cheese with 3g or less fat per serving. They can only get it down to 5g per serving. What can they do? Well, they can hire a lobbyist, go to Washington, DC, and spread some influence around. They could try to get the FDA to change their definition of "low-fat" so it includes their higher-fat products as well. If you can't change your product to meet the standards that consumers want, then dumb down the standards!
Sound far-fetched? This is actually happening all the time with certified organic food in the United States. Non-organic ingredients are routinely being allowed in organic food products based on requests from big food manufacturers. The consumer has very little visibility or voice in this process.
So what does this all have to do with ODF? Fair question. The analogy is to extensions of ODF, a topic currently being hotly debated on the OASIS ODF Technical Committee. Extensions are additions to an ODF document which are not defined by the ODF standard. They may be proprietary vendor extensions, or extensions using other open standards. But regardless, since their use in an ODF document is not defined by the ODF standard, they are difficult or impossible to use in an interoperable fashion, at least by those who do not know the secret details of the extension. However, such extended documents may be immensely useful in some situations.
So are extensions good? Are they bad? Are you more concerned with interoperability? Or with a particular use that requires the extension? There is no single answer for all people at all times. Because of this, it is important to put control firmly in the hands of the consumer of ODF products, so they can make the appropriate choice for themselves.
Similar to the mechanism of food labeling, putting control in the consumer's hands requires that we:
- Have a formal definition of what an extended ODF document is versus an unextended ODF document.
- Have something like a reporting requirement, so it is clear to the consumer whether a particular document is extended or not.
This is a small step and I know it doesn't sound like much, but even this modest step provoked such paroxysms on the ODF TC that you would have thought I was splashing holy water at an exorcism. I suspect this means that I must be doing something right!
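The two requirements listed above are about definitions, not code, but a minimal Python sketch shows how mechanical the "label" could be once "extended" is defined. The sketch assumes one possible definition: a document is extended if any element or attribute comes from a namespace outside an allow-list of ODF namespaces. The allow-list is an illustrative subset, not the full list from the specification, and `foreign_namespaces` and `label` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of ODF 1.x namespaces (assumption: a real checker
# would take the complete list from the specification).
ODF_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
    "http://www.w3.org/1999/xlink",
    "http://purl.org/dc/elements/1.1/",
}

def namespace_of(name):
    """'{uri}local' -> 'uri'; unqualified names -> None (ignored in this sketch)."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else None

def foreign_namespaces(xml_text):
    """Namespaces of elements or attributes not covered by the allow-list."""
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        for name in [elem.tag, *elem.attrib]:
            uri = namespace_of(name)
            if uri is not None and uri not in ODF_NAMESPACES:
                found.add(uri)
    return found

def label(xml_text):
    """The 'reporting requirement': state plainly whether the document is extended."""
    return "extended ODF" if foreign_namespaces(xml_text) else "unextended ODF"

plain = ('<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
         'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
         '<text:p>Hello</text:p></office:text>')
extended = plain.replace("<text:p>", '<text:p><acme:secret xmlns:acme="http://example.com/acme"/>')
print(label(plain), "/", label(extended))  # unextended ODF / extended ODF
```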
Labels: ODF
Sunday, March 01, 2009
ODF Spreadsheet Interoperability: Theory and Practice
As many of you know, neither ODF 1.0 nor ODF 1.1 defines a spreadsheet formula language. They leave it implementation-defined. The specification makes only a few broad statements, such as a recommendation that formula attributes be qualified by namespace, that formulas begin with '=' , that cell addresses be surrounded by '[' and ']' and that formula parameters be delimited by ';'. So in theory, this is a mess. But in practice it has worked out quite well, since implementations have played "follow the leader" and have nearly converged on interoperable spreadsheet formulas. With ODF 1.2, we'll standardize the consensus on spreadsheet formulas, giving even greater certainty.
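To make that syntax concrete, here is a minimal Python sketch of the kind of rewriting a converter to Excel-style A1 notation has to do. It assumes OpenOffice-style same-sheet references such as `[.A1]` and `[.A1:.A3]`, protects only double-quoted string literals, and does not handle references to other sheets; `odf_to_a1` is a hypothetical helper, and any namespace prefix such as `oooc:` is assumed to be stripped before calling it.

```python
import re

# Double-quoted string literal; "" inside a string is an escaped quote.
STRING = re.compile(r'("(?:[^"]|"")*")')
# Same-sheet cell or range reference, e.g. [.A1] or [.$A$1:.B3]
REF = re.compile(r"\[\.([$A-Z0-9]+)(?::\.([$A-Z0-9]+))?\]")

def _ref_to_a1(match):
    start, end = match.group(1), match.group(2)
    return f"{start}:{end}" if end else start

def odf_to_a1(formula):
    """Rewrite '=SUM([.A1:.A3];[.B1])' as '=SUM(A1:A3,B1)'."""
    if not formula.startswith("="):
        raise ValueError("ODF formulas are expected to begin with '='")
    parts = STRING.split(formula)
    # re.split with one capture group alternates: outside, string, outside, ...
    for i in range(0, len(parts), 2):
        parts[i] = REF.sub(_ref_to_a1, parts[i]).replace(";", ",")
    return "".join(parts)

print(odf_to_a1('=IF([.A1]>0;"x;y";[.B2])'))  # =IF(A1>0,"x;y",B2)
```

Note that the ';' inside the string literal survives unchanged, which is why the split on string literals comes first.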
Let's see how this works in practice. I created a simple spreadsheet document in several ODF-supporting applications, including Microsoft Office using the various plugins. Here is what I tested:
- Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
- Google Spreadsheets
- KOffice's KSpread 1.6.3
- Lotus Symphony 1.1
- OpenOffice 2.4
- Microsoft Office 2003 with Sun's ODF Plugin
I used what I had installed on my two machines, Windows and Ubuntu. There may be updates to some of these applications that do even better.
I created the same basic spreadsheet from scratch in each editor and saved it as ODF format. I then looked at each document to see how formulas were being stored in the XML:
- CleverAge stores formulas in the OpenOffice namespace (xmlns:oooc="http://openoffice.org/2004/calc")
- Google also uses the OpenOffice namespace.
- KSpread doesn't use namespace-qualified formula attributes.
- Symphony also doesn't use namespace-qualified formula attributes.
- OpenOffice uses the OpenOffice namespace.
- Sun's Plugin also uses the OpenOffice namespace.
I took each of the 6 spreadsheet documents and opened each one in each of the other 5 applications -- 30 interoperability tests -- to see whether the formulas were loaded and calculated correctly. Here is what I saw:
| Read in (rows) / Created in (columns) | CleverAge | Google | KSpread | Symphony | OpenOffice | Sun Plugin |
|---|---|---|---|---|---|---|
| CleverAge | OK | OK | Fail | Fail | OK | OK |
| Google | OK | OK | OK | OK | OK | OK |
| KSpread | OK | OK | OK | OK | OK | OK |
| Symphony | OK | OK | OK | OK | OK | OK |
| OpenOffice | OK | OK | OK | OK | OK | OK |
| Sun Plugin | OK | OK | OK | OK | OK | OK |
So the formulas came through OK, in almost all instances. The only exception was the CleverAge add-in, which failed to process formulas from KSpread and Symphony. For example, loading the Symphony spreadsheet into Office 2003 results in cells with contents containing errors such as "=#REF!+#REF!-#REF!" which is tantamount to data loss.
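The counting behind the matrix is simple: with N applications, each document is read by the other N - 1 applications, giving N(N - 1) cross-application tests. A small Python sketch, using only the results transcribed from the table (the diagonal, where an application reads its own file, is excluded):

```python
APPS = ["CleverAge", "Google", "KSpread", "Symphony", "OpenOffice", "Sun Plugin"]

# (reader, creator) pairs marked "Fail" in the table; all other pairs were OK.
FAILURES = {("CleverAge", "KSpread"), ("CleverAge", "Symphony")}

def cross_tests(apps):
    """Every ordered (reader, creator) pair in which an app reads another app's file."""
    return [(reader, creator) for reader in apps for creator in apps if reader != creator]

tests = cross_tests(APPS)
passed = [t for t in tests if t not in FAILURES]
print(f"{len(passed)}/{len(tests)} cross-application tests passed")  # 28/30
```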
I think we can do better than this with a few simple changes.
The Robustness Principle, as stated in RFC 1122, is "Be liberal in what you accept, and conservative in what you send." Adapting that principle to ODF spreadsheets, I recommend the following practice for ensuring interoperability using ODF 1.0 and ODF 1.1:
- When writing ODF 1.0 or ODF 1.1 spreadsheet documents, qualify formula attribute values with the OpenOffice namespace, "http://openoffice.org/2004/calc". All ODF spreadsheet applications I have tested accept and correctly process formulas in that namespace. Note that the CleverAge add-in does not check namespaces in an XML-correct fashion. It compares only the text of the prefix, rather than resolving the prefix to a namespace URI and comparing the URIs. So you should also be sure to use "oooc" as the namespace prefix.
- When reading ODF 1.0 or ODF 1.1 spreadsheet documents, be prepared to handle formulas with no namespace qualification as well as those with the OpenOffice namespace.
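The two rules above can be sketched in Python. This is an illustration under assumptions, not any vendor's code: it assumes the ODF 1.0/1.1 convention, used by OpenOffice, of putting the namespace prefix at the start of the `table:formula` attribute *value* (for example `oooc:=[.A1]+[.B1]`); the helper names are hypothetical; and the prefix map ignores XML scoping for brevity. The reader resolves the prefix to a URI, so a document that binds `calc` to the OpenOffice namespace is accepted, which is exactly the case a text-only prefix comparison gets wrong.

```python
import io
import xml.etree.ElementTree as ET

OOOC_NS = "http://openoffice.org/2004/calc"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

def namespace_map(xml_bytes):
    """Collect prefix -> URI declarations (flattened; ignores scoping)."""
    nsmap = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        nsmap[prefix] = uri
    return nsmap

def split_formula(value, nsmap):
    """Return (namespace URI or None, expression) for a table:formula value."""
    if value.startswith("="):
        return None, value                     # unqualified (KSpread/Symphony style)
    prefix, sep, expr = value.partition(":")
    if not sep or prefix not in nsmap:
        raise ValueError(f"cannot resolve formula prefix in {value!r}")
    return nsmap[prefix], expr                 # compare URIs, not prefix text

def read_formula(value, nsmap):
    """Liberal reader: accept unqualified and OpenOffice-namespace formulas."""
    uri, expr = split_formula(value, nsmap)
    if uri not in (None, OOOC_NS):
        raise ValueError(f"unsupported formula namespace {uri!r}")
    return expr

def write_formula(expr):
    """Conservative writer: always use the 'oooc' prefix.
    The document must declare xmlns:oooc="http://openoffice.org/2004/calc"."""
    return "oooc:" + expr

doc = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:calc="http://openoffice.org/2004/calc">
  <table:table-cell table:formula="calc:=[.A1]+[.B1]"/>
</office:document-content>"""
nsmap = namespace_map(doc)
cell = ET.fromstring(doc).find(f"{{{TABLE_NS}}}table-cell")
print(read_formula(cell.get(f"{{{TABLE_NS}}}formula"), nsmap))  # =[.A1]+[.B1]
```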
Now, if you are entirely satisfied with what I have said above, and have no lingering doubts, then you are not thinking enough. It is not enough to merely bring the spreadsheet formulas over intact. Interoperability also requires that we interpret the formulas in the same way.
So let's look at that side of the equation (no pun intended). Fortunately, we are all quite close to what is being defined in ODF 1.2's OpenFormula specification. This is not so surprising, since OpenFormula was based on actual spreadsheet practice, looking at a variety of spreadsheet applications. I did a quick test of the 6 ODF spreadsheet applications to see how well they fared against a test suite of 509 core tests that OpenFormula defines for spreadsheet functions. The results were:
- CleverAge 455/509 = 89%
- Google 457/509 = 90%
- KSpread 472/509 = 93%
- Symphony 487/509 = 96%
- OpenOffice 493/509 = 97%
- Sun Plugin 500/509 = 98%
Looking forward, we'll continue to edit and refine OpenFormula and its test cases. You might look for it when it comes out for public review, hopefully in a couple of months. Unlike other parts of ODF 1.2, OpenFormula is essentially XML-free. It is a mini-expression language, defined by a BNF grammar and accompanied by hundreds of spreadsheet functions from mathematics, finance, engineering, statistics, etc. So review by subject matter experts in these disciplines is especially needed, even if they have zero XML experience. If you want to see the current OpenFormula Working Draft, currently in its 71st revision, take a look. Comments may be submitted to the ODF TC's comment list.
I'm also looking forward to testing Office 2007 SP2's ODF support when it comes out, to see how their ODF support is improving. Anything less than the 500/509 results that Excel 2003 gives with the Sun Plugin will be a disappointment. KOffice has a 2.0 version in beta I should look at. OpenOffice has their 3.0 update. Sun also has an updated ODF Plugin. I'll lean on the Symphony team as well, and see if we can beat 500/509. Game on!
Labels: Interoperability, ODF
Wednesday, February 25, 2009
Whither ODF?
Whether ODF will wither or weather
depends on us as we work together.
The question is where we should go: whither?
The answer is clear at once.
The question of "whither" is not so dense,
and is easy to answer when we start with "whence?".
Of the topic today
I will no longer delay nor dither to say
whether we will whither or weather
but will now give you my 2-cents.
Rob's ODF-Next Rant
- The word processor and spreadsheet, as we have them today, are relics of the 1980s, designed when the web did not exist and collaboration occurred predominantly by exchanging paper documents. If we were designing a document authoring and collaboration system to meet modern circumstances and capabilities, it would likely bear little resemblance to Word. So the question is: how much do we let the sunk costs of yesterday continue to determine our future? How much longer do we paint speed stripes on a horse and pretend that it is a racing car?
- Products like Word and Excel have evolved via the uncritical accretion of functionality over the past decades to a point where the products are overly complex resource gluttons with a knack for having a critical security flaw reported in them every other week.
- Increasingly users are getting work done via email, wikis and blogs rather than using heavy-weight document editing solutions. Why is this so? Why is the modern word processor losing users rather than gaining them?
- WYSIWYG is a fine paradigm if you are doing all of your work targeting printed output. But it is a sub-optimal approach for creating documents for almost any other use.
- The revered Bold, Italics and Underline icons, along with the font selection drop down list, which define the modern editor GUI, should be forcibly removed from the user interface, stripped of rank, and put on trial for crimes against productivity. You are writing a document, not decorating a cake. You need to ask yourself "Why should this text be italics?" Is it a book title, a foreign phrase, a name of a movie, the name of a legal case? Then choose a named style that indicates why that text is special. Let the named style take care of how it is displayed.
- Unless you are designing a poster for a modern art gallery you should stick to the named styles in your template. Power users might define additional named styles. But direct application of random attributes to random text selections should be considered a form of data corruption.
- Few documents today are ever printed. They are born, live and die entirely in digital form. We should be optimizing for the most common cases, not just for what our parents or grandparents did with WordPerfect 1.0.
- The most common sources of reused content are other documents, PDF and HTML. Today's cut & paste mechanisms make a mess of styles. Paste in the content with the styles of the source document? According to the styles of the destination document? Mapping to the nearest local style? All are wrong answers. The only correct answer is to give me the choice.
- PowerPoint is pure evil. It has elevated form over substance and turned every form of business communication into a "pitch".
- I should be able to call spreadsheet functions using named parameters, like PV(rate=1%,periods=12,payment=$1000.00) rather than PV(0.01,12,1000), so my model is self-documenting and avoids errors from incorrect ordering of parameters.
- Security needs to be designed into the document authoring environment, including the format, not patched on as an afterthought.
- I want Greasemonkey for my word processor and my spreadsheet.
- Connections between documents may be as important as the documents themselves.
- The less control the user asserts over the appearance of a document during editing, the more flexibility he or she has over the final published appearance. In today's multi-modal, multi-device world, it is essential that we do not prematurely commit our documents to a particular rendering. We need late binding of presentation to content, not early binding. If we had done this for the past decade, we would have perfect interoperability today between all word processors. If we start doing it now, we will have perfect interoperability among word processors going forward.
- Spreadsheets should have functions that access web-based data stores for common financial, economic, political and scientific data sets. Mathematica does something similar, presumably using local caching.
- Presentation should be a mode of displaying another document, not just a document type of its own. For example, I should be able to take a report and push a button to enter a slide-show mode, where all images are shown as slides, with their captions, and each top-level section header becomes a slide with 2nd-level headers as bullet items. During the presentation I should be able to seamlessly drill down into the real document.
- I want to be able to share data ranges, text ranges and presentation slides with others and to subscribe to theirs via feeds. I rarely write a document from scratch. Reuse, reuse, reuse. But the tools only support this at a scavenger level.
- We lack high-level support for compositing or assembling a document from fragments. Once I cut & paste, my new document has lost all knowledge of the document I copied from. This is great if I am a professional plagiarist. But it is bad if I am a CIA analyst and my report has copied information claiming uranium production in Africa, and I never learn when that information is repudiated, and I pass my flawed report on to the President. Very bad. When I cite an authority for an argument, my argument is only as good as the authority. I owe it to myself and my readers to make it easy to know whether the information I cited is still accurate and vouched for by that authority.
- Current tools are impoverished when it comes to the social side of documents. Review/comment reflects old, hierarchical thinking and doesn't scale to the network. How can I have 100 people comment on my document? What if I want 100 people to jointly author a document? The Wiki knows where Word cannot go...
- Most user woes in modern word processors are caused by our attempts to remain compatible with the design choices made by Microsoft Office developers 15 years ago. It is time to move on and learn from past mistakes, not perpetuate them.
- I want to use the same text editor to edit documents, web pages, emails, blog posts, discussion forums and wikis. Why do I need a different brand hammer for every nail?
- I want a spreadsheet function that can call a web service. It might lookup a book title by ISBN, do currency conversions, or geocode data. There should be thousands of such spreadsheet functions, backed by web services, interoperable based on standard protocols. Some might be free, others fee-based. Some might be both, e.g., 20-minute delayed quotes for free, real-time for a fee.
- Spreadsheet functions express a core analytic function and should be usable in all tables, in word processors and presentations, not just in spreadsheets. They should also be usable in fields in forms and in text passages.
- The inability of word processors to output clean, readable and valid HTML or XHTML should be an embarrassment to all vendors.
- HTML + JS + XHR + HTML DOM = AJAX. ODF + JS + XHR + ODF DOM = ?
- We must define power as in "power user" based on results, on productivity. Power is as much about what a system allows you to ignore as what it allows you to control.
- Today trust is based on digital signatures and classical questions of authentication, integrity and non-repudiation, all backed by a chain of trust traceable back to some well-known certification authority. In some contexts, this hierarchical, binary view of trust is adequate. But the network sees trust based on reputation, rating, scoring, voting, reverse citation counts and other non-hierarchical values. How do we account for these?
- Spreadsheets are unnecessarily dangerous, based on a muddled view of data types which leads to silent errors and inconsistencies. This might have made sense in the memory and processor constrained systems of the 1980's. But today, with our better sense of the errors and the cost of errors, we need a spreadsheet system that is type-safe, aware of measurement units, and which enforces consistency and accuracy. We obviously can't prevent someone from making a stupid spreadsheet model for subprime mortgages, but we can at least ensure that they don't make stupid cut & paste errors when creating that model.
- Spreadsheets should have intrinsic support for image, sound and geographic data. Not just embedded media, but as an intrinsic data type, so a function could take an image as input, or return an audio clip as a result.
- A grid in a spreadsheet provides a logical addressing scheme as well as a visual layout scheme. But what if I want the former without the latter? Why can't I do a spreadsheet calculation in a text document? Why am I always stuck in a grid?
- Spreadsheets should have built-in support for sensitivity and risk analysis, perhaps via Monte Carlo methods. Yes, I know support is available via 3rd-party plugins, but this should be a core feature in the repertoire of every user. We might not be in the global financial mess we're in now if spreadsheet users could all easily "stress test" their models.
- The Holy Trinity of Word, Excel and PowerPoint is only a convention, mainly enforced by Microsoft's definition of their office suite. It is not a law of nature. Other application types, such as project management and mind maps, should be considered part of the core desktop authoring environment.
- Outliners and other pre-draft tools have lagged far behind the core editing functions of a word processor. And what is the equivalent of an outliner for a spreadsheet?
- Microsoft is as much a prisoner of the predominant model of end-user productivity as the user is. Their need to support legacy documents constrains their freedom of action and has contributed to the relative lack of innovation in Microsoft Office over the past decade.
- An editor should allow a user to verify interoperability as easily as it lets them do a print preview.
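Python's keyword-only parameters give a rough feel for the named-parameter idea in the list above. This is a sketch, not a proposal for ODF syntax: the `pv` function and its parameter names are hypothetical, and it uses the standard annuity present-value formula with the usual spreadsheet sign convention (money received positive, money paid out negative, so the present value and the payment have opposite signs).

```python
def pv(*, rate, periods, payment, future_value=0.0, at_start=False):
    """Present value of a series of equal periodic payments.

    The bare '*' makes every argument keyword-only, so each call must name
    its arguments and their order cannot be mixed up.
    """
    if rate == 0:
        return -(payment * periods + future_value)
    growth = (1 + rate) ** periods
    timing = (1 + rate) if at_start else 1.0   # payments at start vs. end of period
    annuity = payment * timing * (growth - 1) / rate
    return -(annuity + future_value) / growth

print(round(pv(rate=0.01, periods=12, payment=1000.00), 2))  # -11255.08

try:
    pv(0.01, 12, 1000)  # a positional call is rejected outright
except TypeError as err:
    print("rejected:", err)
```

The design choice is the point: a positional call fails loudly instead of silently computing with swapped arguments.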
Labels: ODF
Sunday, February 22, 2009
Looking for Good Ideas for ODF-Next
One thing I learned early in my career was how wasteful this kind of project cycle is. The problem is that not everyone is involved in every part of the project. Some only work on planning, some only on execution, and some mainly come in at the end. This leads to suboptimal allocation of resources. People are standing around waiting.
One solution, not necessarily the only one, is to work on multiple versions of a project at once. For example, when working on a software application, you can take 25% of your team and have them start the planning phase of version N+1 while the remaining 75% of the team completes the final QA stage of version N.
We have a similar issue with standards development. Both the OASIS and the JTC1 PAS process involve a lot of standing around waiting: at least two months of public review in OASIS, and 6 months of review in JTC1. And even now, as we complete the editing work on ODF 1.2, the wider ODF community is standing around waiting. It is too late to make feature proposals for ODF 1.2, but too early for a full public review of the ODF 1.2 draft.
What is to be done?
The ODF TC has decided to begin activities on the next version of ODF, called for now "ODF-Next", even before we have ODF 1.2 approved. Although we obviously won't be spending a large amount of time on that effort quite yet, since we really are all busy with ODF 1.2, we have come up with a way to engage the broader community and have you help us gather requirements for ODF-Next now, which we can then consider during the downtime when ODF 1.2 is under review in OASIS and JTC1. The Call for Proposals for ODF-Next went out on Friday.
So put on your thinking cap. ODF 1.1 and ODF 1.2 were incremental releases. Maybe ODF-Next will be bolder, maybe something that shifts the paradigm, pushes the envelope, breaks out of the box. Is the dominant WYSIWYG word processing paradigm the final word in user productivity? Or are we overdue for a change, for a different set of priorities? As Thomas Paine wrote, "We have it in our power to begin the world over again."
Now is the time to start collecting the ideas, big or small, and submit them to the ODF TC according to the instructions in the Call for Proposals linked to above.
We'll be collecting ideas at least until March 31st. The Requirements Subcommittee will then sort through the ideas, categorize and prioritize them, and generally try to make sense of it all, and then write up an ODF-Next Requirements document with their recommendations.
This is a good chance to get your ideas in early and have a real impact on where we go with ODF in the next major release. But please, do not give me ideas via blog comments. We can only accept ideas sent through the above linked OASIS comment submission procedure, which is necessary to ensure that ODF remains an open standard that anyone can implement. IANAL, but I believe an added benefit is that any idea you submit, even if speculative, even if not added to ODF-Next, will be permanently archived in the ODF comment list, and thus will establish prior art which could scuttle attempts to secure patents in this area. So by contributing your ideas publicly in this way, you help to establish an intellectual commons that will benefit free and open source applications in this area.
Please pass along the word. We're hoping to get 100's of ideas for ODF-Next. Bring it on!
Labels: ODF
Saturday, February 21, 2009
Strange corners of the Web
At first I listened only to the big broadcasters like the BBC World Service, Deutsche Welle, Radio Moscow, and then moved on to smaller ones: Tirana, Malta, South Africa, etc. It was a great way to get a global perspective beyond the 2 minutes allocated to international news on a typical US-based evening news program.
Eventually I started writing the broadcasters and received many QSL cards. Some of my letters were read on the air. I'm sure I ended up on some FBI watch list for those letters to Radio Prague and Radio Havana. My subscription to Soviet Life magazine, and a Cambridge address probably didn't help either.
But you don't go far as an SWL before you notice that there are a lot of strange things going on in the aether. Some were easily explained: the Soviet Union jamming broadcasts of Voice of America or Cuba jamming broadcasts of Radio Martí. And then there were the commercial voice broadcasts, ship-to-shore, international aviation, time signals, etc. Then the various data services, radio teletype, weather fax, etc. And then there were the mysterious coded transmissions, which were rumored to be SAC transmissions, "Sky King, Sky King, Do not answer", followed by various authentication codes, which were either recall or go-ahead codes for nuclear attack. It was an eerie feeling, in the hotter days of the Cold War, to lie awake at night, listening to the radio and wondering whether the sun would rise in the morning. Now I just wonder if my 401(k) will still be there.
Stranger yet were the cryptic transmissions of the "numbers stations", which would transmit on a semi-regular schedule and merely read off a large list of numbers for 10 minutes. For months I transcribed one particular woman's transmissions, trying to find out the pattern. I did some computer analysis, but the numbers were random in frequency, with no discernible patterns. Presumably they were encoded against a one-time pad.
And then there were the "pirate" radio stations like "The Voice of the Purple Pumpkin".
Although most people knew about the BBC World Service, I don't think many appreciated that a large portion of the shortwave universe was strange, that the fringe was everywhere.
I'm starting to have a similar view of the web. There are major content providers, minor content providers, even individual content providers like me. And then there is the weirdness, the strange corners of the web, the space between the channels, where you are not even sure you are listening to signal or noise.
Here are a few random examples of web sites with no discernible purpose. They appear to be garbled republications of news stories.
Let's start with the "Wet Paint Body Notes" blog, newly created, with only three posts. One is called "Microsoft Gets Foot in Mass. Office Door". It starts:
In what could be a coup inwardly favour of Microsoft (Nasdaq: MSFT) and a biff to the friendly wellspring league, the stipulate of Massachusetts personal added Microsoft's Office Open XML norm to its document of give your declaration standards it will allow for elected representatives exploit.
This is a strange kind of English. It almost seems like a poor translation, or even a poor machine translation, of a document written in another language. But if you poke around a little, you find that this blog post is an unattributed, garbled derivation of a 2007 article in Linux Insider. Not only was the original article in English, the reposted version truncates the article, posting only the first few paragraphs.
So what's up with that? There are no banner ads or other obvious sources of revenue on the garbled version of the article. It is not a link farm. In fact it has no outgoing links. So why did someone bother?
Another example. The blog "75Software-News48" has a new article "Microsoft shows support for ODF", posted just two weeks ago, with the intro:
Amid organization hassle surrounded by wish of interoperability, Microsoft (Nasdaq: MSFT) protected Thursday announced the discovery of the Open XML Translator Project. The overhang will fry in the air permitted software to allow Word, Excel and PowerPoint to knob documents in contrary technology format.
Again, this reads like it is a poor translation from another language. But look further and you can find that the original article is actually in English, from a 2006 TechNewsWorld article.
Again, no obvious intent here. It isn't a link farm, and there is no evident source of revenue. It isn't informative and it certainly isn't timely. So why did they do it?
One more example, this time a LiveJournal blog called "All Microsoft", again newly created, with a post called "Ecma Approves MS Office Format, IBM Dissents". It opens:
Microsoft's (Nasdaq: MSFT) Open XML bureau software format, broad of via the tech giant to chase near the Open Document Format (ODF), cleared a standards hurdle this week, successful approbation from the Ecma global standards article.
Same modus operandi here. Original source, unattributed, is from a 2006 Linux Insider article.
I have dozens of examples of this kind of thing, all within the last couple of months, mainly articles about Microsoft and ODF. Something new is afoot. But what? Does anyone have any idea what this is and who benefits from it? Is this just a contest between Blogger and LiveJournal to see who can claim the most hosted blogs? Or is it some SEO ploy? It has me stumped.
Tuesday, February 17, 2009
ODF 1.2 Committee Draft 01
A Committee Draft (CD) is the first step toward finalizing ODF 1.2. The TC will likely approve further CD iterations before voting to approve one as a Public Review Draft. The Public Review Draft, as the name suggests, will be what we send out for a public review of at least 60 days. We can then make changes based on review comments and hold additional public reviews if we make non-trivial changes to the Public Review Draft. The ODF TC can then vote to approve the draft as a Committee Specification. We then hold a further vote to send the Committee Specification out for an OASIS-wide ballot (not just the ODF TC, but all OASIS members) on whether to approve ODF 1.2 as an OASIS Standard. Once that is done, we can then start the PAS approval cycle in JTC1.
Although there are a lot of votes and process steps remaining, the major technical work is just about done. What remains is a period of review, perfecting the text, gaining implementation experience and feedback, etc. Some may call this a "death march", but I see this pace as consonant with the importance of our activity and our deliverables. Work in OASIS might not be as fast as Ecma, where you can evidently create a 6,000 page standard in less than a year. Our process calls for a bit more than the IETF's "rough consensus and running code." But neither are we the slowest process in the standards development landscape. We're some place in the middle. And when we're talking about revising an open document format, already adopted and used by governments around the world, I am not ashamed to say that we're working deliberately and carefully.
We also need to socialize and grow consensus around ODF 1.2, not only with implementers, but also with adopters and consumers of ODF. There is still work to be done here. For example, the TC vote on the Committee Draft 01 was not unanimous. We did not have the support of Microsoft or Novell. There are still disagreements over how we define conformance in the standard. We obviously need to continue discussing this topic. Since the final TC vote to request an OASIS Standard ballot requires 2/3 approval of TC members with no more than 25% disapproving, we'll need a high level of consensus in the TC to move forward, including, hopefully, the support of Microsoft and Novell.
Implementation experience is important in OASIS. I know some have criticized OpenOffice for supporting draft ODF 1.2. But this support is a good thing, in my opinion. We need implementers to validate the design decisions we've made in the standard, to ensure that our choices are reasonable, that we haven't missed anything. We're working in an engineering discipline. We're not making abstract standards for the mind alone. Engineers build, test and refine. It is what we do. In fact, OASIS requires that before a Committee Specification can be nominated for an OASIS Standard ballot, the TC must certify that there are three conforming implementations of the Committee Specification. So not only are early implementations a good idea, they are required as part of the process.
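That approval threshold is easy to misread, so here it is stated as code. This is a minimal Python sketch, assuming both fractions are computed against the total number of TC voting members (not just those who cast a ballot); the function name is hypothetical, and exact fractions avoid floating-point surprises right at the 2/3 boundary.

```python
from fractions import Fraction


def passes_standard_ballot_request(yes: int, no: int, voting_members: int) -> bool:
    """Return True if at least 2/3 of voting members approve and at most 25% disapprove."""
    if voting_members <= 0:
        raise ValueError("voting_members must be positive")
    if yes < 0 or no < 0 or yes + no > voting_members:
        raise ValueError("vote counts are inconsistent with membership")
    approve = Fraction(yes, voting_members)
    disapprove = Fraction(no, voting_members)
    # Both conditions must hold: a two-thirds majority alone is not enough.
    return approve >= Fraction(2, 3) and disapprove <= Fraction(1, 4)
```

With 30 voting members, 20 yes and 5 no passes, but 20 yes and 8 no fails: 8/30 is just over 25%, so a determined minority can block the request even when two thirds are in favor.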
If you are asking, "How can I help?", then here are a few ideas:
- If you are an implementor of ODF 1.0 or ODF 1.1, then now is a good time to start looking at what is required to add ODF 1.2 support. Download the CD of ODF 1.2, but also look at this page for a summary of changes. We'll formalize that list of changes and put it into an appendix of the draft, but this wiki page should give you a good feel for what areas have been touched.
- Although we have not yet approved a Public Review Draft, we welcome comments at any time. You can send comments on ODF 1.2 CD 01 according to the instructions on this page. Download the draft, pick a chapter of interest and send us any errors you find.
- We should start thinking ahead to how we can encourage a thorough review of the eventual Public Review Draft. I want to avoid the OOXML fiasco, where Ecma approved and sent to JTC1 a half-baked, deeply-flawed text. What can we do to give ODF 1.2 a really hard scrub in the OASIS review period, so what comes out meets the high standards we should expect from an international standard? I think we've done a good job in drafting ODF 1.2 and I want to encourage scrutiny, not shy from it. But let's have this scrutiny earlier rather than later.
Labels: ODF
Friday, February 06, 2009
The 21st ODF Toolkit Scenario
I'd like to augment that list with a new pattern of use, a clever idea suggested to me by Jomar Silva in an email quite a while ago, but an idea which I just recently warmed up to. I believe this technique could be quite powerful and should take its place as the 21st scenario for any ODF Toolkit.
It goes something like this:
If you have a toolkit written in a language, say Java, and the toolkit has APIs that you can use to both read and write ODF documents, then you can write a program that will read an ODF document and write out the Java code that would be needed to re-create that same ODF document. So it is a code generation pattern: Java code reads ODF and writes the source code for a Java program that can then be compiled to write ODF.
This is very useful in a number of situations. For example, you can design your document in a familiar tool, like your word processor. Get all of the styles and layout correct and then run the code generator to generate the Java source file. Then hand-edit the source code to make changes, such as substitutions, insertions, looping to copy content down a row, etc. You could even adopt a place-holder convention in your original document, to make it easier to find the areas that you wanted to replace. For example, "REPLACE-FNAME" and "REPLACE-LNAME" might be good place-holders.
Of course, this idea is of general applicability, not just limited to ODF. It could be applied, and for all I know has been applied, to HTML, etc.
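The pattern is described here in Java, but it is language-neutral. As an illustration, here is a minimal Python sketch that uses only the standard library's zipfile and xml.etree.ElementTree, not any particular ODF toolkit API. It reads content.xml from an ODF package and emits Python source that rebuilds the same XML tree, with all text, place-holders like REPLACE-FNAME included, written out as string literals ready for hand-editing. It assumes only content.xml matters; a real generator would also need to handle styles.xml, meta.xml and the manifest to write a complete package. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional


def load_content_xml(package) -> ET.Element:
    """Parse content.xml from an ODF package (a path or a binary file object)."""
    with zipfile.ZipFile(package) as z:
        return ET.fromstring(z.read("content.xml"))


def generate_builder_source(root: ET.Element) -> str:
    """Emit Python source that rebuilds `root` with ElementTree calls.

    The generated script binds the rebuilt tree to the name `document`.
    """
    lines = ["import xml.etree.ElementTree as ET", ""]
    counter = 0

    def emit(elem: ET.Element, parent_var: Optional[str]) -> str:
        nonlocal counter
        var = f"e{counter}"
        counter += 1
        attrs = dict(elem.attrib)
        if parent_var is None:
            lines.append(f"{var} = ET.Element({elem.tag!r}, {attrs!r})")
        else:
            lines.append(f"{var} = ET.SubElement({parent_var}, {elem.tag!r}, {attrs!r})")
        # Text is emitted as literals, so place-holders such as REPLACE-FNAME
        # appear verbatim in the generated code, ready for hand-editing.
        if elem.text is not None:
            lines.append(f"{var}.text = {elem.text!r}")
        if elem.tail is not None and parent_var is not None:
            lines.append(f"{var}.tail = {elem.tail!r}")
        for child in elem:
            emit(child, var)
        return var

    root_var = emit(root, None)
    lines.append(f"document = {root_var}")
    return "\n".join(lines) + "\n"
```

Saving `generate_builder_source(load_content_xml("letter.odt"))` to a .py file (where "letter.odt" is just an example name) gives you the program to edit. When serialized, ElementTree uses generated prefixes such as ns0: unless you call ET.register_namespace; XML parsers don't care, but humans reading the output might.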
Labels: ODF
Thursday, February 05, 2009
I love the smell of ODF in the morning
First up, Jomar Silva brings us the happy news that Venezuela now mandates the use of ODF, joining Uruguay, Brazil and 14 other national governments that have adopted the International Standard for office documents.
BrowserShots.org has been part of my web design toolkit for some time now. It allows me to easily test a web page to see how it renders on a wide range of browsers and platforms, without having to personally maintain a dozen different machines and configurations on my desk. You enter a URL and click off which of 50+ different browser versions you want your page rendered on. The system then queues up your requests, farms them out to various machines that render the pages and return screen shot images (PNG format) of the results. You get some results almost immediately, while others might take 30 minutes.
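The queue-and-farm-out design can be sketched in a few lines of Python with concurrent.futures. This illustrates the general idea only, not how BrowserShots is actually built; `render_screenshot` is a hypothetical stand-in for a remote render machine. Because `as_completed` hands back each result as soon as it finishes, quick renders arrive first while slow ones trickle in later.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_screenshot(url: str, browser: str) -> str:
    """Hypothetical worker: a real system would have a remote machine load `url`
    in `browser` and return the PNG. Here we only return a file name."""
    return f"{browser.replace(' ', '_')}.png"


def request_screenshots(url: str, browsers: list, max_workers: int = 4) -> dict:
    """Queue one render job per browser and collect results as they complete."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render_screenshot, url, b): b for b in browsers}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```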
I've recently received news that this same concept is now being applied to ODF documents in a new project called OfficeShots. Funded by the Dutch government and the OpenDoc Society, this project (not quite yet ready for beta) will:
[H]elp you make a better choice by letting you compare the output and other behavior of a wide variety of applications. Does your corporate style - the technical basis for many documents - actually look consistent across the board of applications - from OpenOffice.org 3.0, Adobe Buzzword and Symphony 1.2 to Microsoft Office 2000 with the ODF addin from Microsoft - or the one from Sun Microsystems? And how does it look on Mac OS X in iWork? When you are in an acquisition phase, officeshots.org will help you do a reality check if that fancy new open source suite or that productivity package you can get a bargain deal at - actually does what it says. On the spot.
This is a great idea and I look forward to seeing it in operation.
Finally, if you also have some ODF project ideas, then be sure to note that the NLnet Foundation has named ODF as one of its two focus areas for 2009 and that they are accepting project proposals for funding. So get out that digital pencil and start writing down ideas.
Labels: ODF
Monday, January 26, 2009
The State of ODF in OASIS
It was a good year for ODF in OASIS as well. The ODF TC, which I co-chair, created a new Subcommittee to investigate ODF-Next requirements, and we created a new OASIS TC, to join the existing ODF TC and ODF Adoption TC, to work on "Interoperability and Conformance". We also saw a substantial increase in participation in the ODF activities, spurred by the increased demand for ODF and the increased maturity of ODF implementations.
A few statistics you might find interesting on the level of participation in OASIS TC's related to ODF, based on a tally I did this morning:
- The three ODF TC's have 81 members from 28 corporations/organizations, as well as 9 individual members. This count does not include the even larger number of OASIS members who are "observers" in these TC's.
- Large companies with participants in these TC's include IBM, Google, Sun, Microsoft, Nokia, Oracle, Intel, Red Hat, etc., a virtual "Who's Who" of the tech sector.
- Members reside in 13 different countries.
- 16 TC members are also members of their JTC1 or JTC1/SC34 NB's. A total of 7 NB's currently have members in the ODF TC's.
- The TC's and SC's had 95 meetings in 2008 and their current schedule calls for a combined 10 hours of teleconferences per month.
- The main ODF TC had 439 person-hours of meetings in 2008.
- The mailing lists for the TC's received 2,594 posts in 2008, including 95 agendas and 95 meeting minutes.
- ODF's public comment list received 603 comments in 2008.
Monday, January 12, 2009
ODF 1.0 Errata 01
Liaison Statement from JTC1/SC34 to OASIS ODF TC
Defects have been identified in ISO/IEC 26300 and defect reports will be submitted to the OASIS ODF TC.
SC 34 requests that the OASIS ODF TC respond to these defect reports in a timely fashion and publish errata in accordance with OASIS procedures.
SC 34 requests that the Project Editor of ISO/IEC 26300 submit draft technical corrigenda consistent with OASIS approved errata conforming to ISO requirements for SC 34 ballot.
However, SC34 did not submit a formal defect report (N0942) until seven months later.
I'm pleased to report that the OASIS ODF TC has created and approved a response to this defect report. The official announcement is here.
You won't find any substantive changes to the standard. The document mainly addresses trivial editorial errors. No implementation will need to change because of these errata. Some might argue that it is a complete and utter waste of time to make editorial changes to a standard when they can have no effect on implementations. And this is true, up to a point. But there is always the possibility that a minor grammatical or spelling error might, when the ODF standard is translated into another language, be transformed into a more substantive error. So, perfecting the text of a standard, even 4 years after publication, does serve a minor purpose and deserves proportionate attention.
Since several members of JTC1/SC34 have expressed a strong desire to keep the OASIS and ISO/IEC versions of ODF in sync, I'm sure they will be eager to turn this errata document into technical corrigenda for approval by SC34, now that OASIS has done what was asked of it, i.e., "publish errata in accordance with OASIS procedures." The ball is in their court now.
Labels: ODF
Friday, October 31, 2008
ODF Update
As many of you already know, standards maintenance consists of two main activities:
- Defect removal through the issuance of corrections to published standards (variously called "errata" or "corrigenda", depending on your zodiacal sign)
- Revision, through the issuance of updated (and presumably improved) versions of the standard.
On the maintenance side, Wednesday 29 October saw the start of a 15-day public review for draft 3 of the ODF 1.0 Errata document. The official OASIS announcement has more information on the public review, including links to the errata document itself, as well as how the public may submit comments. JTC1/SC34, through its Secretariat, has also been invited to participate in this review.
Once the public review has concluded, and assuming that no new issues surface in the review, the ODF TC may approve and publish it as "OASIS Approved Errata" and transmit the text to JTC1/SC34 for application to ISO/IEC 26300.
On the revision front, the TC continues to work to complete ODF 1.2. But while finishing that revision, we decided that we also want to initiate a new activity related to the next version of ODF, the one after ODF 1.2. We did not have immediate agreement on what that version would be called (ODF 1.3? ODF 2.0?) so we started calling it "ODF-Next". We voted to create a new Subcommittee of the ODF TC, called the ODF-Next Subcommittee to start preliminary background work on this next version, in parallel with the TC's foreground task of completing ODF 1.2. The charter of the new subcommittee reads:
Statement of purpose
--------------------
As the ODF TC completes its work on ODF 1.2, it is desirable to instantiate a parallel effort to gather requirements and define a vision for the next major revision of the standard.
It is the purpose of the ODF-Next Requirements Subcommittee to gather requirements, to categorize these requirements by theme, to prioritize these requirements, and to submit a report to the ODF TC on a recommended set of work items for the next major version of ODF, which will have the working name of "ODF-Next".
Scope of work
-------------
In accordance with the above Purpose, the ODF-Next Requirements SC would undertake the following activities:
To collect requirements for ODF-Next from TC members, from the OASIS ODF Adoption TC, from implementors, from users, from the public, and from other stakeholders;
To ensure that all requirements collected have been formally submitted as contributions to the ODF TC, either as TC member contributions or via the Feedback License;
To categorize these comments according theme;
To prioritize the themes and the requirements within the themes;
To produce and submit to the ODF TC a report on a recommended set of work items for ODF-Next
Bob Jolliffe, from the Department of Science and Technology, South Africa, has agreed to chair the Subcommittee. We had our first meeting last Tuesday.
I think this is going to be exciting. ODF 1.0 and ODF 1.1 were mainly about encoding, in an open standard, the output of conventional productivity applications. If you are a conventional person, running a conventional business, with conventional ideas looking for a conventional profit, then great, don't let me wake you up. But I think we need to do more than that. Achieving the merely conventional doesn't get me out of bed in the morning. If I wanted to just replicate what others were doing, I'd join the Mono project.
ODF 1.2 starts to break away from that conventional view with its richer view of metadata. But with ODF-Next, we can pull significantly ahead and move into uncharted territory. As Thomas Paine wrote, "We have it in our power to begin the world over again."
As you can tell from reading the charter, our primary initial task will be to collect feedback and feature ideas for the next release of ODF. When we formally put out the call for comments, I expect a huge response. So our initial meeting was mainly spent discussing ways in which we can handle a large volume of public comments: collecting, categorizing and prioritizing them. Once we agree on a tool to use, and set up some infrastructure to handle the load, expect to hear more on this blog, and elsewhere, about how you can submit your ideas, and help define the capabilities of the next version of ODF.
Next, I'd like to note that the OASIS ODF Interoperability and Conformance TC (OIC TC) met for the first time last week (and again this week). We elected Bart Hanssens of Belgium as Chair of the technical committee. Bart works for Fedict, the Belgian federal ICT agency, one of the early adopters of ODF. Companies represented on the TC include IBM, Sun, Novell, Google, Oracle, Red Hat, Sursen, Ars Aperta, and the US Department of Defense. We also have a number of individual members.
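Whatever tool we end up choosing, the charter's collect, categorize and prioritize steps boil down to something like the Python sketch below. The `Requirement` fields and the use of vote counts as a priority signal are assumptions for illustration, not a method the Subcommittee has adopted. Requirements not yet formally contributed are set aside, echoing the charter's insistence that everything be submitted as a TC contribution or via the Feedback License.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Requirement:
    title: str
    theme: str
    votes: int         # hypothetical priority signal, e.g. endorsements received
    contributed: bool  # formally submitted (TC contribution or Feedback License)?


def prioritize(requirements):
    """Group contributed requirements by theme, then rank within and across themes.

    Returns (ranked_themes, pending), where ranked_themes is a list of
    (theme, total_votes, requirements_sorted_by_votes) tuples, highest first,
    and pending lists requirements still awaiting a formal contribution.
    """
    by_theme = defaultdict(list)
    pending = []
    for req in requirements:
        if req.contributed:
            by_theme[req.theme].append(req)
        else:
            pending.append(req)
    ranked = []
    for theme, reqs in by_theme.items():
        reqs.sort(key=lambda r: r.votes, reverse=True)
        ranked.append((theme, sum(r.votes for r in reqs), reqs))
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked, pending
```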
The greatest difficulty in our initial call was determining a schedule for future meetings. With participants spread out from California to Boston, Paris, Hamburg and Beijing, there is no time which is going to be easy for all of us. The best we could come up with was to meet at 1430UTC, corresponding to 0930 EST, 1530 CET, 2230 China, but 0630 PST (ouch).
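Converting a meeting time is simple arithmetic on UTC offsets, but daylight saving time is the usual trap. Here is a minimal Python sketch using fixed standard-time offsets. It assumes US daylight saving time has ended (it ended on 2 November 2008); while it is in effect, the US clocks read one hour later (1030 EDT, 0730 PDT). For real scheduling, Python's zoneinfo module with IANA zone names handles these transitions for you.

```python
from datetime import datetime, timedelta, timezone

# Standard-time UTC offsets in hours (assumption: no daylight saving in effect).
STANDARD_OFFSETS = {"UTC": 0, "CET": 1, "China": 8, "EST": -5, "PST": -8}


def local_times(meeting_utc: datetime) -> dict:
    """Render a timezone-aware meeting time as HHMM in each listed zone."""
    if meeting_utc.tzinfo is None:
        raise ValueError("meeting time must be timezone-aware")
    return {
        zone: meeting_utc.astimezone(timezone(timedelta(hours=h))).strftime("%H%M")
        for zone, h in STANDARD_OFFSETS.items()
    }
```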
In any case, the OIC TC discussions are flowing well, as we start to discuss how we engineer test cases, what data to collect for them, how to encode test metadata, etc. You can follow the discussion in the public archives of the TC's mailing list, or even better, consider joining OASIS ($300 for an individual membership) and participate in this or any other OASIS Technical Committee.
Finally, the ODF Adoption TC has been busily preparing to host a panel discussion and workshop related to ODF interoperability at the OpenOffice.org Conference in Beijing next week. In fact, I should now stop procrastinating and get back to completing my presentations!
If you add it all up, the three ODF-related TC's (ODF TC, ODF Adoption TC, ODF Interoperability and Conformance TC) have a combined 79 members, of which 68 represent 25 different OASIS corporate or organizational entities, and the remaining 11 are individual members.
-Rob
Labels: ODF
Sunday, October 12, 2008
ODF @ OOoCon 2008
Why? Because I'm attending the OpenOffice.org 2008 Conference in Beijing, November 5th-7th. Since I'll miss election day, I'm submitting an absentee ballot, and in fact I've just filled it out. I predict a great increase in personal productivity from being able to sit out the remainder of the minute-by-minute saturation campaign coverage.
This will be my third OOoCon. After Barcelona last year and Lyon in 2006, the organizers this year have a tough act to follow. But from what I can see, this year is shaping up to be the "best ever", with opening ceremonies at the Diaoyutai State Guesthouse (former residence of Madame Mao) and conference sessions at Peking University.
Although the focus of the conference is OpenOffice.org, the program, the developers, the translators, promoters and users, there is also a natural overlapping interest in OpenDocument Format (ODF). Because of this, OOoCon typically is also the largest ODF conference of the year, at least based on number of ODF-related sessions.
In particular I'll draw your attention to the following ODF-related sessions:
- Interoperability -- expectations, promises, problems and solutions (Florian Reuter)
- OpenOffice.org and the ODF Ecosystem (Dieter Loeschky)
- Panel Discussion -- ODF Interoperability Perspectives (with representatives from IBM, Sun, Google, Novell, FEDICT, moderated by Aslam Raffee of DST)
- ODF@WWW -- An ODF Wiki (Kay Ramme)
- OOo and ODF Accessibility (Malte Timmermann)
- The New ODF 1.2 Metadata Framework and its Support in OpenOffice.org 3 (Svante Schubert)
- ODFDOM -- the new open-sourced multi-tiered API for ISO OpenDocument Format (Svante Schubert)
- ODF Accessibility: Perspectives Past & Future (Don Harbison)
- Introduction to SMIL and Implementation in Lotus Symphony (Yan Peng Guo, IBM)
- Transforming an OWL Ontology to an OpenOffice Document Template (Massoud Toussi)
- Improving ODF Applications by sharing ODF tests (Svante Schubert)
- Enabling ODF for Social Collaboration with Composite Applications and Mashups (Santosh Kumar)
- ODFDOM Workshop -- using the new open-sourced multi-tiered API for ODF (Svante Schubert)
- Digital Signatures: A Global Challenge (Joachim Linger)
I hope to see many old and new friends in Beijing. This is a great opportunity to continue spreading the message of open source and open standards around the globe.
Labels: ODF
Thursday, September 25, 2008
Introducing the ODF Interoperability and Conformance TC
Years ago, but not so very long ago, when XML still had that new car smell, two companies, let's call them Red and Blue, decided to make a new XML-based standard. This new standard would be, they claimed, a huge step forward and would increase interoperability, especially in complex heterogeneous environments, with multiple operating systems, multiple vendors and applications, etc. Their activities received much fanfare in the press. Everyone was pleased that Red and Blue were cooperating to make this new standard.
This wonderful new standard was eventually completed, and Red and Blue both went and implemented the standard in two implementations which I'll call RedLib and BlueLib. But when they tried running their RedLib and BlueLib implementations against each other, to demonstrate interoperability, it didn't work. It was a total failure. There was zero interoperability.
So what did Red and Blue do? They realized that interoperability is not guaranteed merely by the existence of a standard. You also need high quality implementations, implementations that accurately and completely implement the standard. For any non-trivial standard, implementation errors will dominate the list of causes of interoperability problems. So Red and Blue worked together, with other vendors, to create an interoperability lab for the new standard, and created test suites to test interoperability, and held interoperability demonstrations at conferences, and tested and iterated on this until the implementations provided a high level of interoperability.
Today billions of dollars are transacted every day using this XML-based standard.
With ODF we find ourselves in a similar, though more complex, situation. There are more vendors involved than just Red and Blue. We are starting with many commercial and open source implementations. In some cases, with some editors, interoperability is quite good. In other cases it is rather poor. But when a user loads a document, which they may have downloaded on the web, or received via email, they have no idea where that document came from, what application, what operating system. And when you create an ODF document, you may not know who will eventually read it. It isn't enough to have good interoperability between some ODF implementations. We need good interoperability among all ODF implementations.
From a technical perspective, this is a goal we all know how to achieve. It has been done over and over again throughout the history of technology standards, especially network standards. You develop test suites, you test your implementations against these test suites, you have interoperability workshops (or plug-fests as they are sometimes called). You iterate until you have a high level of interoperability.
For the past 6 months I've been talking to my peers at a number of ODF vendor companies, to fellow standards professionals in OASIS, to ODF adopters, as well as to people who have gone through interoperability efforts like this before. I've given a few presentations on ODF interoperability at conferences and led a workshop on the topic. I led a 90-day mailing list discussion on ODF interoperability. Generally, I've been trying to find the best place and set of activities needed to bring the interested parties together and achieve the high level of interoperability we all want to see with ODF.
The culmination of these efforts is the creation of a new Technical Committee in OASIS, called the ODF Interoperability and Conformance TC, or OIC TC for short. The official 30-day OASIS Call for Participation went out last Friday. You can read the full charter there, but you can get a good idea by just reading the "Scope of Work":
- Initially and periodically thereafter, to review the current state of conformance and interoperability among a number of ODF implementations; To produce reports on overall trends in conformance and interoperability that note areas of accomplishment as well as areas needing improvement, and to recommend prioritized activities for advancing the state of conformance and interoperability among ODF implementations in general without identifying or commenting on particular implementations;
- To collect the provisions of the ODF standard, and of standards normatively referenced by the ODF standard, and to produce a comprehensive conformity assessment methodology specification which enumerates all collected provisions, as well as specific actions recommended to test each provision, including definition of preconditions, expected results, scoring and reporting;
- To select a corpus of ODF interoperability test documents, such documents to be created by the OIC TC, or received as member or public contributions; To publish the ODF interoperability test corpus and promote its use in interoperability workshops and similar events;
- To define profiles of ODF which will increase interoperability among implementations in the same vertical domain, for example, ODF/A for archiving;
- To define profiles of ODF which will increase interoperability among implementations in the same horizontal domain, for example ODF Mobile for pervasive devices, or ODF Web for browser-based editors.
- To provide feedback, where necessary, to the OASIS Open Document Format for Office Applications (OpenDocument) TC on changes to ODF that might improve interoperability;
- To coordinate, in conjunction with the ODF Adoption TC, Interop Workshops and OASIS InterOp Demonstrations related to ODF;
- To liaise on conformance and interoperability topics with other TC's and bodies whose work is leveraged in present or future ODF specifications, and with committees dealing with conformance and interoperability in general.
- Robert Weir, IBM
- Bart Hanssens, Individual
- Dennis E. Hamilton, Individual
- Zaheda Bhorat, Google
- Charles-H. Schulz, Ars Aperta
- Michael Brauer, Sun Microsystems
- Donald Harbison, IBM
- Alan Clark, Novell
- Jerry Smith, US Department of Defense
- Aslam Raffee, South Africa Department of Science and Technology
The OIC TC will have its first meeting, via teleconference, on October 22nd. At that point members will elect their chairman.
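To make the second item of the Scope of Work more concrete: a conformity assessment methodology pairs each collected provision with a test action, preconditions, an expected result and a score. The record layout below is only a hypothetical Python sketch of what such an entry might hold, not a format the OIC TC has defined, and the provision identifiers in any usage would be invented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_TESTED = "not tested"


@dataclass
class ProvisionTest:
    provision_id: str    # hypothetical identifier for one normative provision
    source: str          # ODF itself or a normatively referenced standard
    action: str          # what the tester does
    expected_result: str
    preconditions: list = field(default_factory=list)
    outcome: Outcome = Outcome.NOT_TESTED


def score(tests):
    """Summarize results: how many provisions were tested and how many passed."""
    tested = [t for t in tests if t.outcome is not Outcome.NOT_TESTED]
    passed = sum(1 for t in tested if t.outcome is Outcome.PASS)
    return {
        "provisions": len(tests),
        "tested": len(tested),
        "passed": passed,
        "pass_rate": passed / len(tested) if tested else 0.0,
    }
```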
I'd like to see broader representation in this TC's important work. In particular, I'd like to see:
- Additional vendors that support ODF, such as Corel and Microsoft (and yes, before you ask, I have already extended a direct and personal invitation to Doug Mahugh at Microsoft)
- A representative from KOffice
- A representative from the OpenDocument Fellowship, which has already done some work on an ODF test suite. Wouldn't it be good to combine our efforts?
- Representatives from non-desktop ODF implementations, e.g., web-based and device-based.
- Broader geographic participation.
- Participants with specialized skills to help define and review test cases in areas such as accessibility, East Asian languages, Bidi text, etc.
- People with an interest in archiving, to help to define an ODF/A profile.
We have a lot of work to do, but now we finally have a place where we can get the work done. This is big. This is important, both for ODF vendors and ODF users. I hope you'll join us as we all work to improve interoperability among ODF implementations!
[Update: On 12 November Doug Mahugh accepted my invite and announced that Microsoft would join the TC.]
Labels: Interoperability, ODF
Sunday, September 21, 2008
ODF: Translations and Errata
Why is translation important? Aside from increasing the number of developers who can read the standard in their native language, translation is a prerequisite in several countries in order to make ODF into a national standard. So translation increases the number of places where ODF support can be an official requirement. So far the ODF 1.0 standard has been translated into Russian, Chinese, Spanish and Portuguese. (There may be others. Let me know if I've missed any.)
(Interesting to note the size advantage of ODF compared to OOXML. I've heard from one reliable source that to translate OOXML would cost $500,000. This will certainly hamper its ability to be adopted in some parts of the world. ODF, by reusing existing standards, is only 1/10 the size.)
Also in progress is a translation of ODF 1.0 into Japanese. From what I understand, a JISC committee has completed an initial pass of the translation and then passed the translation off to a second committee. This second committee is reviewing the translation and raising any issues where the text is unclear. In some cases this may be caused by a faulty translation. But in other cases errors may be found which were present in the original English text.
That's the second ongoing activity related to ODF 1.0: error correction. Although we received most of our comments during the mandated 60-day public review prior to approval as an OASIS Standard, we do continue to get a trickle of comments months and years after publication. Each OASIS TC has its own mailing list for receiving comments. For the ODF TC, the mailing list archives are here. Anyone can subscribe to the comment list and post using the instructions here. The additional complexity in the sign-up procedure compared to your average mailing list is to ensure that all feedback submitted by the public to the list is in accordance with OASIS IPR rules. This helps ensure that ODF remains an open standard, unencumbered by patents.
Although we are only obligated to address comments received during the pre-approval public review period, around a year ago the ODF TC decided to formally record and process all comments received, regardless of when they arrived. So far, from May 2005 to the present, we've received around 250 comments. We note each comment in a spreadsheet, along with what ODF versions it pertains to (ODF 1.0, ODF 1.1 or ODF 1.2 draft), what section number the comment concerns, and whether the comment is reporting an editorial error, a technical error, or proposing a new feature. My estimate is that 50% of the comments are feature proposals, 40% are reporting editorial errors, and 10% reporting technical errors.
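The tracking spreadsheet is essentially a table of records, and the 50/40/10 estimate is a simple tally over one of its columns. Here is a minimal Python sketch of that tally. The field names are assumptions modeled on the columns described above, and any sample rows you feed it are made up, not our actual comment log.

```python
from collections import Counter
from dataclasses import dataclass

KINDS = ("feature", "editorial", "technical")


@dataclass(frozen=True)
class Comment:
    versions: tuple  # ODF versions the comment pertains to, e.g. ("ODF 1.0", "ODF 1.1")
    section: str     # section number the comment concerns
    kind: str        # one of KINDS


def kind_percentages(comments) -> dict:
    """Percentage of comments of each kind; raises on unknown kinds."""
    counts = Counter(c.kind for c in comments)
    unknown = set(counts) - set(KINDS)
    if unknown:
        raise ValueError(f"unknown comment kinds: {sorted(unknown)}")
    total = len(comments)
    return {k: (100.0 * counts[k] / total if total else 0.0) for k in KINDS}
```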
The preeminent source of comments on ODF 1.0 has been Murata Makoto, of the Japanese SC34 mirror committee. Murata-san relays to us the defects found during the Japanese translation of ODF. The vast majority of these are editorial errors, mainly typographical or grammatical. But a handful of more significant issues have been found, and we are especially pleased to receive reports of these.
You may recall the old saying, "Every new class of users finds a new set of defects". Translation of a standard is a laborious process, especially when combined with the additional review step that JISC is engaging in. This has subjected the text of ODF 1.0 to more scrutiny, at a more detailed level, than any typical technical review could provide. So I am appreciative of the detailed comments from JISC, and of the effort made in this translation by them.
My personal aim is to ensure that all of the reported editorial errors are fixed in the ODF 1.2 text, and that any technical flaws are addressed via errata. An errata document (That's what we call it in OASIS. Others, e.g., ISO, call it "corrigenda") allows us to make small changes to the ODF 1.0 text to address defects.
But this goal is certainly debatable. Why not aim to fix every reported error in ODF 1.0 via published errata? Why knowingly leave even the smallest typographical error in the text? What relative priority should be placed on fixing typographical errors (and others) in ODF 1.0 versus completing work on ODF 1.2?
This is entirely at the will of the ODF TC. The combined priorities of the vendors and other interests represented on the committee determine the direction we take. My perception of the expressed interests is that we should address the JISC comments via an errata document, but that the overall priority is on completing the work on ODF 1.2, and not attempting to fix every last instance of subject/verb disagreement or misuse of "A" for "An" in ODF 1.0.
And so our work on the ODF TC follows that priority. I'd estimate that we spend 80% of our time on ODF 1.2 topics and 20% on processing public comments on ODF 1.0/1.1, including those from JISC. We are nearing completion of an official Errata document for ODF 1.0, consisting of fixes to defects reported by JISC. Expect to see a call for public review soon. After that, the TC will continue to review and process public comments from the comment mailing list. If warranted, we are able to issue an updated errata document in the future, to address additional issues as they are reported.
Labels: ODF
Thursday, July 17, 2008
What is Rick smoking?
If you like unintentional humor, you will enjoy reading Rick's over-the-top post.
Rick suggests that organizationally JTC1/SC34 is a more participatory environment for developing standards than OASIS.
JTC1's process, based on National Body voting is both effective ... and more genuinely open, because it is impossible to stack either directly or indirecty.
Let's test that proposition. Let's compare OASIS and JTC1/SC34.
Who can participate? In OASIS, anyone can participate, from any company, organization, government agency or non-profit corporation in the world. Or you can join as an unaffiliated individual, as many have. You don't need your government's permission to join. You just do it. Most join with a nominal membership fee ($300 for individuals), but membership grants are available in some cases, when the fee would be a burden for active individual contributors.
What about participation in JTC1/SC34? First, you must be a member of your NB. How do you become a member of your NB? In the US the price is $1,200 and you must be representing a company or organization. Individuals? Sorry, you are not allowed to participate. In other countries the rules vary. In some cases membership is not available at all at any price. You are essentially wait-listed until an opening becomes available. (Sorry, we don't have enough seats, we heard in Portugal). In some countries, like China, membership is forbidden to native citizens who are employees of foreign subsidiaries in China. In other countries you can't join at all. It is entirely a government decision. So, good luck joining the NB of Syria, where the constitution has been suspended under emergency rule since 1963. (But somehow they managed to make time to vote on the OOXML ballot. Zimbabwe as well, that paragon of open participation.)
Now, it is entirely possible for a standards organization to appear open, but in practice to be inaccessible. So we must look at the complete cost of participation, not just the initial membership fees.
The OASIS ODF TC does its work entirely on an email list, a wiki, and via weekly phone calls, which are toll-free calls for most participants. I don't recall there ever being a face-to-face meeting, certainly not so long as I've been a member. This use of technology lowers the barrier to participation, so anyone can be effective on the TC if they wish. In particular it makes it easier for those who have day jobs and can only contribute to the mailing list during non-work hours.
What about JTC1/SC34? To participate effectively requires attendance at several international meetings each year (plenaries, WG's, ad hocs, BRM's, etc.), as well as participation in NB meetings. Since many of the participants are representatives of large corporations or government agencies, a junket mentality prevails and the meetings are often held in some of the most expensive places in the world: Geneva, Granada, London, Kyoto, Jeju Island, etc.
JTC1 does not allow meeting participation by telephone. Since important votes are held at these meetings, and no provision is made for remote participation, one cannot effectively participate in JTC1/SC34 without a substantial budget for international travel. Attendance at a single meeting, the DIS 29500 BRM, cost me $3687.52, and I flew coach and ate cheap. How many standards meetings like that can you as an individual or your small company afford per year?
Further, note the nature of your membership — what can you actually do? Can you vote? In OASIS, it is one person/one vote. In the TC, your vote as an individual with a $300 membership fee is counted exactly the same as my vote representing an OASIS Foundational Sponsor. At the organizational level, it is one company/one vote, and the smallest OASIS member organization has exactly the same vote as the largest.
In JTC1/SC34 however, you typically can't vote at all. NB's vote, not individuals, not companies. So your opinion and your wishes are subject to the will of your NB. If your opinion varies from your NB's, you may not be accredited to attend an international meeting, and even if you are able to attend you may not be allowed to speak your opinions. This extra level of indirection and censorship means that you, as an individual, can do little. And to the extent your NB's committee is stacked by a single vendor and their partner community, or your NB decides to overrule or ignore its technical committee, or Microsoft calls your head of state to change the NB's vote, or any of the dozens of other documented shenanigans that recently occurred, your entire membership fee and participation will be an entire waste of time, money and effort.
Membership in OASIS is far more open and inclusive. You join. You discuss. You vote. Period. In JTC1/SC34, you are mired in layers of bureaucracy at the national and international level, in a system crafted by and for the big boys to cut back-room deals and manipulate the process to the benefit of large corporations.
(Now that isn't to say that there are not some individual consultants out there who thrive in the JTC1 environment by mastering its dark, dusty, demon-haunted hallways. Even the largest corporations occasionally have need of this expertise, as Rick and others are quite aware. If JTC1/SC34 were truly open and transparent, such skills would not be needed. You certainly don't see anyone selling their services to help companies navigate OASIS, do you?)
What about transparency? As Rick demonstrates, OASIS meeting minutes and agendas are all posted and public. So is our mailing list. So are all of our drafts. So are our member and public comments.
But in JTC1/SC34, most of the documents are private, accessible only to SC34 members by password. And occasionally JTC1 will step in to prevent SC34 from releasing its own work, suppressing documents even from its own SC members. There are no public comments to speak of, and member comments on draft standards are secret.
So when you are back from your "trip", Rick, please let us know again, who wins on openness, participation and transparency?
And for the record, a couple of outright deceptions in Rick's post:
- Rick says that there are 80 NB's and thousands of people participating in JTC1, but only 13 people participating on the ODF TC. This is a particularly inept comparison. Why is he comparing all of JTC1 to a single OASIS TC? If you look at OASIS overall, you will see that OASIS has more than 5,000 participants, representing over 600 organizations and individual members in 100 countries. The ODF TC itself has 53 members, including 7 members of JTC1/SC34.
- Rick picks a "random" ODF TC minutes post from a year ago to attempt to suggest domination by a single company. Not so random a choice, methinks. It was a rare joint meeting of the ODF TC and the Metadata subcommittee, which brought in a far greater number of Sun employees than typically participate in a call.
Tuesday, May 13, 2008
Spreadsheet file format performance
But first, a few details about my setup. All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches. The machine is a Lenovo T60p, with a dual-core Intel 2.16 GHz processor and 2 GB of RAM. I took all the standard precautions: the disk was defragmented, and the test files were confirmed as defragmented using contig. No other applications were running, and background tasks were all shut down.
For test files, I went back to an old favorite, George Ou's (at the time with ZDNet) monster 50MB XLS file from his series of tests back in 2005. This file, although very large, is very simple. There are no formulas, indeed no formatting or styles. It is just text and numbers, treating a spreadsheet like a giant data table. So tests of this file will emphasize the raw throughput of the applications. Real-world spreadsheets will typically fare worse than this, due to the additional overhead of processing styles, formulas, etc.
A test of a single file is not really that interesting. We want to see trends and patterns. So I made a set of variations on George's original file, converting it into ODF, XLS and OOXML formats, as well as making scaled-down versions of it. In total I made 12 different-sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats. I then tested how long it took to load each file in each of the applications. For MS Office, I installed the current versions of the translators for the XML formats: the Compatibility Pack for OOXML and the ODF Add-in for ODF.
I find it convenient to report numbers per 100,000 spreadsheet cells. You could equally well normalize by the original XLS file size, the number of rows of data, or any other correlated variable, but values per 100K cells are simple for anyone to understand.
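To make the normalization concrete, here is a minimal Python sketch. The first function, `per_100k_cells`, simply scales a single measurement.

The second function reflects an assumption about one reasonable way to use all of the file sizes at once. `fitted_rate_per_100k` fits a straight line to (cells, measurement) pairs and reports the slope, so that a fixed start-up cost (the intercept) does not inflate the per-cell figure.

Both function names are hypothetical. Nothing here claims to reproduce how the numbers below were actually computed.

```python
from typing import Sequence, Tuple


def per_100k_cells(measurement: float, cells: int) -> float:
    """Scale one measurement (KB or seconds) to a per-100,000-cell rate."""
    if cells <= 0:
        raise ValueError("cell count must be positive")
    return measurement * 100_000 / cells


def fitted_rate_per_100k(samples: Sequence[Tuple[int, float]]) -> float:
    """Least-squares slope of measurement vs. cell count, scaled to 100K cells.

    Each sample is (cell_count, measurement). Fitting a line separates the
    per-cell cost (the slope) from any fixed overhead (the intercept).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(c for c, _ in samples) / n
    mean_y = sum(m for _, m in samples) / n
    sxx = sum((c - mean_x) ** 2 for c, _ in samples)
    sxy = sum((c - mean_x) * (m - mean_y) for c, m in samples)
    if sxx == 0:
        raise ValueError("samples need at least two distinct cell counts")
    return sxy / sxx * 100_000
```

Consider invented data with 0.5 seconds of fixed overhead plus 1.5 seconds per 100K cells. For that data, `fitted_rate_per_100k([(100_000, 2.0), (200_000, 3.5), (400_000, 6.5)])` gives 1.5, up to floating-point rounding. Dividing any single point would overstate the rate because of the overhead.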
I'll spare you all the pretty pictures. If you want to make some, here is the raw data (CSV format). But I will give some summary observations.
For document sizes, the results are as follows:
- Binary XLS format = 1,503 KB per 100K cells
- OOXML format = 491 KB per 100K cells
- ODF format = 117 KB per 100K cells
Any ideas?
For load time, the times for processing the binary XLS files were:
- Microsoft Office 2003 = 0.03 seconds per 100K cells
- OpenOffice 2.4.0 = 0.4 seconds per 100K cells
So what about the new XML formats? There has been recent talk about the "Angle Bracket Tax" for XML formats. How bad is it?
- Microsoft Office 2003 with OOXML = 1.5 seconds per 100K cells
- OpenOffice 2.4.0 with ODF = 2.7 seconds per 100K cells
OK. So what are we missing? Ah, yes: ODF format in MS Office, using their ODF Add-in.
- Microsoft Office 2003 with ODF, using the ODF Add-in = 74.6 seconds per 100K cells
For the largest file I was able to load with the ODF Add-in, the absolute load times were:
- Microsoft Office 2003 in XLS format = 0.75 seconds
- OpenOffice 2.4.0 in XLS format = 3.03 seconds
- Microsoft Office 2003 in OOXML format = 8.28 seconds
- OpenOffice 2.4.0 in ODF format = 14.09 seconds
- Microsoft Office 2003 in ODF format = 515.60 seconds
(I was not able to test larger files using the ODF Add-in, since they all crashed.)
(Update: Since everyone wants to know, the beta version of OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds, and Sun's ODF Plugin for Microsoft Office loads this file in 30.03 seconds.)
This is one reason why I think file format translation is a poor engineering approach to interoperability. When OpenOffice wants to read a legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file. Instead it simply loads the XLS file, via a file filter, into the internal memory model of OpenOffice.
What is a file filter? It is like half of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate on directly. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way Novell is doing it now, since they discovered that the Microsoft approach is doomed to performance hell.
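To illustrate the idea, here is a minimal Python sketch of the "filter" half for the simplest case. It opens an .ods package, parses content.xml, and maps the cells of the first sheet directly into an in-memory model, with no intermediate file written to disk.

A few caveats apply:
- The `{(row, column): value}` dictionary is a hypothetical stand-in for an application's real memory model.
- The function names are made up for illustration.
- The code bears no relation to the actual OpenOffice or SmartSuite filters.
- It ignores styles, formulas and most value types.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def q(ns: str, local: str) -> str:
    """Build an ElementTree '{namespace}local' qualified name."""
    return f"{{{ns}}}{local}"


CELL_TAGS = {q(TABLE, "table-cell"), q(TABLE, "covered-table-cell")}
NUMERIC_TYPES = {"float", "percentage", "currency"}


def cell_value(cell):
    """Return a float for numeric cells, the paragraph text otherwise, or None if empty."""
    if cell.get(q(OFFICE, "value-type")) in NUMERIC_TYPES:
        return float(cell.get(q(OFFICE, "value")))
    paragraphs = cell.findall(q(TEXT, "p"))
    if not paragraphs:
        return None
    # Simplification: text:s / text:tab spacing elements are not expanded.
    return "\n".join("".join(p.itertext()) for p in paragraphs)


def load_first_sheet(source):
    """Map the first sheet of an .ods package into a {(row, col): value} dict.

    `source` is a path or a binary file object. Empty cells are not stored, so
    long runs of repeated empty rows or columns cost almost nothing.
    """
    with zipfile.ZipFile(source) as package:
        with package.open("content.xml") as stream:
            root = ET.parse(stream).getroot()
    cells = {}
    table = root.find(f".//{q(TABLE, 'table')}")
    if table is None:
        return cells
    row = 0
    for table_row in table.iter(q(TABLE, "table-row")):
        rows_repeated = int(table_row.get(q(TABLE, "number-rows-repeated"), "1"))
        col = 0
        for cell in table_row:
            if cell.tag not in CELL_TAGS:
                continue
            cols_repeated = int(cell.get(q(TABLE, "number-columns-repeated"), "1"))
            value = cell_value(cell)
            if value is not None:
                for r in range(row, row + rows_repeated):
                    for c in range(col, col + cols_repeated):
                        cells[(r, c)] = value
            col += cols_repeated
        row += rows_repeated
    return cells
```

A translator, by contrast, would first write a whole new document in another format, which the application would then have to parse all over again.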
So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure. It may look great on paper, but it is an engineering disaster. What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes 10 minutes to load a file.
The astute engineer will agree with the above, but will also feel some discomfort at the numbers. There is more here than can be explained simply by the use of translators versus import filters. That choice might explain a 2x difference in performance. A particularly poor implementation might explain a 5x difference. But none of this explains why MS Office is almost 40x slower in processing ODF files. Being that much slower is hard to do accidentally. Other forces must be at play.
Any ideas?
Labels: ODF, OOXML, Performance
Wednesday, May 07, 2008
Achieving the impossible

Unadulterated copy of James Clark's Relax NG validator jing. Unadulterated copy of Kohsuke Kawaguchi's Sun Multi-Schema Validator msv. Unadulterated copy of the ODF 1.0 Relax NG schema. Unadulterated copy of the ODF 1.0 Standard, in ODF format.
No errors from either validator.
msv is so good as to tell us "the document is valid". But jing indicates success with only silence. So will I.
Labels: ODF
Monday, May 05, 2008
The Challenge
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p>Dear Alex Brown. Please prove that I am invalid ODF 1.0 (ISO 26300:2006). I do not think that I am. In fact I think that your statement that there are no valid ISO ODF documents in the world, and that there cannot be, is a brash, irresponsible and indefensible piece of bombast that you should retract.</text:p>
<text:p>(Please note that this document contains no ID, IDREF or IDREFS attributes. Nor does it contain custom content.)</text:p>
</office:text>
</office:body>
</office:document-content>
Friday, May 02, 2008
ODF Validation for Dummies
Alex Brown has a problem. He can't figure out how to validate ODF documents. Unfortunately, when he couldn't figure it out, he didn't ask the OASIS ODF TC for help, which would have been the normal thing to do. Indeed, the ODF TC passed a resolution back in February 2007 that said, in part:
That the ODF TC welcomes any questions from ISO/IEC JTC1/SC34 and
member NB's regarding OpenDocument Format, the functionality it
describes, the planned evolution of this standard, and its relationship
to other work on the technical agenda of JTC1/SC34. Questions and
comments can be directed to the TC chair and secretary whose email
addresses are given at
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
or through the comments facility at
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
So it is rather uncollegial of Alex to pass up such an open, transparent way of getting his questions answered. Instead, he just assumed that if he couldn't figure out how to validate ODF, then it simply couldn't be done, and that ODF was to blame. This is presumptuous. Does he think that in the three years since ODF 1.0 became a standard, no one has tried to validate a document?
Alex is so sure of himself that he publicly exults in the claimed significance of his findings:
- For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.
- Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).
I think you will agree that these are bold pronouncements, especially coming from someone so prominent in SC34, the Convenor of the ill-fated OOXML BRM, someone who is currently arguing that SC34 should own the maintenance of OOXML and ODF, indeed someone who would be well served if he could show that all consortia standards are junk, and that only SC34 (and he himself) could make them good.
Of course, I've been known to pontificate as well. There is nothing necessarily wrong with that. The difference here is that Alex Brown is totally wrong.
But let's see if we can help show Alex, or anyone else similarly confused, the correct way to validate an ODF document.
First start with an ODF document. When Alex tested OOXML, he used the Ecma-376 OOXML specification. Let's do the analogous test and validate the ODF 1.0 text. You can download it from the OASIS ODF web site. You'll want this version of the text, ODF 1.0 (second edition), which is the source document for the ISO version of ODF.
You'll also want to download the Relax NG schema files for OASIS ODF 1.0, which you can download in two pieces: the main schema, and the manifest schema.
Next you'll need to get a Relax NG validator. Alex recommends James Clark's jing, so we'll use that. I downloaded jing-20030619.zip, the main distribution for use with the Java Runtime Environment. Unzip that to a directory and we're almost there.
Since jing operates on XML files and knows nothing about the Zip package structure of an ODF file, you'll need to extract the XML contents of the ODF file. There are many ways to do this. My preference, on Windows, is to associate WinZip with the ODF file extensions (ODT, ODS and ODP) so I can right-click on these files and unzip them. When you unzip, you will have the following XML files, along with directories for image files and other non-XML resources you can ignore:
- content.xml
- styles.xml
- meta.xml
- settings.xml
- META-INF/manifest.xml
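If you would rather script this step than right-click in WinZip, Python's standard zipfile module can pull out the same five XML streams. This is a minimal sketch. The function name is hypothetical, and it simply skips any of the listed parts that a given package does not contain.

```python
import zipfile
from pathlib import Path
from typing import List

# The XML streams listed above; everything else in the package can be ignored.
ODF_XML_PARTS = ["content.xml", "styles.xml", "meta.xml", "settings.xml", "META-INF/manifest.xml"]


def extract_xml_parts(odf_package, dest_dir) -> List[Path]:
    """Extract the XML parts of an ODF package (path or binary file object) into dest_dir."""
    extracted = []
    with zipfile.ZipFile(odf_package) as package:
        present = set(package.namelist())
        for part in ODF_XML_PARTS:
            if part in present:  # skip parts this package does not contain
                extracted.append(Path(package.extract(part, dest_dir)))
    return extracted
```

For example, `extract_xml_parts("spec.odt", "odf")` (hypothetical file names) would leave content.xml, styles.xml and the rest under an `odf` directory, ready for jing.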
Then run jing against each of these files. For example, for content.xml:
java -jar c:/jing/bin/jing.jar OpenDocument-schema-v1.0-os.rng content.xml
(Your command may vary, depending on where you put jing, the ODF schema files and the unzipped ODF files)
The result is a whole slew of error messages:
C:\temp\odf\OpenDocument-schema-v1.0-os.rng:17658:18: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
C:\temp\odf\OpenDocument-schema-v1.0-os.rng:10294:22: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
Oh no! Emergency, emergency, everyone to get from street!
I wonder if this is one of the things that tripped Alex up? Take a deep breath. These in fact are not Relax NG (ISO/IEC 19757-2) errors at all, but errors generated by jing's default validation of a different set of constraints, defined in the Relax NG DTD Compatibility specification which has the status of a Committee Specification in OASIS. It is not part of ISO/IEC 19757-2.
Relax NG DTD Compatibility provides three extensions to Relax NG: default attribute values, ID/IDREF constraints and a documentation element. The Relax NG DTD Compatibility specification is quite clear in section 2 that "Conformance is defined separately for each feature. A conformant implementation can support any combination of features." And in fact, ODF 1.0, in section 1.2 does just that: "The schema language used within this specification is Relax-NG (see [RNG]). The attribute default value feature specified in [RNG-Compat] is used to provide attribute default values".
It is best to simply disable the checking of Relax NG DTD Compatibility constraints by using the documented "-i" flag in jing. If you want to validate ID/IDREF cross-references, then you'll need to do that in application code, not by using jing in Relax NG DTD Compatibility mode. Note that jing was not complaining about any actual ID/IDREF problem in the ODF document.
So, false alarm. You can walk safely on the streets now.
(That said, if we can make some simple changes to the ODF schemas that will allow it to work better with the default settings of jing, or other popular tools, then I'm certainly in favor of that. Alex's proposed changes to the schema are reasonable and should be considered.)
So, let's repeat the validation with the -i flag:
java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml
Zero errors, zero warnings.
java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng styles.xml
Zero errors, zero warnings.
java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng meta.xml
Zero errors, zero warnings.
java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng settings.xml
Zero errors, zero warnings.
java -jar c:/jing/bin/jing.jar -i OpenDocument-manifest-schema-v1.0-os.rng META-INF/manifest.xml
Zero errors, zero warnings.
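Typing five commands per document gets tedious. Here is a small Python sketch that builds the same `jing -i` command lines and runs them with subprocess.

The jar location and schema file names are the ones used above and are assumptions about your setup. The function names are hypothetical.

Because jing signals success with silence, the sketch treats a run as clean only if it exits with status 0 and prints nothing. That rule covers the silent-success convention without depending on jing's exact exit codes.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed paths: adjust to wherever jing and the OASIS schema files live.
JING_JAR = "c:/jing/bin/jing.jar"
MAIN_SCHEMA = "OpenDocument-schema-v1.0-os.rng"
MANIFEST_SCHEMA = "OpenDocument-manifest-schema-v1.0-os.rng"
MAIN_PARTS = ["content.xml", "styles.xml", "meta.xml", "settings.xml"]


def jing_commands(unzipped_dir: str) -> List[List[str]]:
    """Build one jing invocation per XML part, with -i to skip DTD Compatibility checks."""
    root = Path(unzipped_dir)
    commands = [
        ["java", "-jar", JING_JAR, "-i", MAIN_SCHEMA, str(root / part)]
        for part in MAIN_PARTS
    ]
    commands.append(
        ["java", "-jar", JING_JAR, "-i", MANIFEST_SCHEMA,
         str(root / "META-INF" / "manifest.xml")]
    )
    return commands


def validate_all(unzipped_dir: str) -> bool:
    """Run every command; print diagnostics and return False unless all runs are clean."""
    all_clean = True
    for command in jing_commands(unzipped_dir):
        result = subprocess.run(command, capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        if result.returncode != 0 or output:
            all_clean = False
            print(" ".join(command))
            print(output)
    return all_clean
```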
So, there you have it, an example that shows that there is at least one document in the universe that is valid to the ODF 1.0 schema, disproving Alex's statement that "there are no XML documents in existence which are valid to ISO ODF."
The directions are complete and should allow anyone to validate the ODF 1.0 specification, or any other ODF 1.0 document. Now that we have the basics down, let's work on some more advanced topics.
First, the reader should note that there are two versions of the ODF schema: the original 1.0 from 2005, and the updated 1.1 from 2007. (There is also a third version underway, ODF 1.2, but that needn't concern us here.)
An application, when it creates an ODF document, indicates which version of the ODF standard it is targeting. You can find this indication if you look at the office:version attribute on the root element of any ODF XML file. The only values I would expect to see in use today would be "1.0" and "1.1". Eventually we'll also see "1.2".
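Here is a minimal Python sketch for checking that attribute without loading the whole file. ElementTree's `iterparse` reports the root element first, so we can read `office:version` and stop there. The function name is hypothetical, and it returns None if the attribute is absent.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_version(stream: BinaryIO) -> Optional[str]:
    """Return office:version from the root element of an ODF XML stream, or None."""
    for _event, root in ET.iterparse(stream, events=("start",)):
        # The first "start" event is the root element; nothing else is needed.
        return root.get(f"{{{OFFICE_NS}}}version")
    return None
```

For example, `with open("content.xml", "rb") as f: print(odf_version(f))` would print "1.0" or "1.1" for the documents discussed here.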
It is important to use the appropriate version of the ODF schema to validate a particular document. Our goal, as we evolve ODF, is that an application that knows only about ODF 1.0 should be able to adapt and "degrade gracefully" when given an ODF 1.1 document, by ignoring the features it does not understand. But an application written to understand ODF 1.1 should be able to fully understand ODF 1.0 documents without any additional accommodation.
Put differently, from the document perspective, a document that conforms to ODF 1.0 should also conform to ODF 1.1. But the reverse direction is not true.
To accomplish this, as we evolve ODF, within the 1.x family of revisions, we try to limit ourselves to changes that widen the schema constraints, by adding new optional elements, or new attribute values, or expanding the range of values permitted. Constraint changes that are logically narrowing, like removing elements, making optional elements mandatory, or reducing the range of allowed values, would break this kind of document compatibility.
Now of course, at some point we may want to make bolder changes to the schema, but this would be in a major release, like a 2.0 version. But within the ODF 1.x family we want this kind of compatibility.
The net of this is, an ODF 1.1 document should only be expected to be valid to the ODF 1.1 schema, but an ODF 1.0 document should be valid to the ODF 1.0 and the ODF 1.1 schemas.
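The rule in the previous paragraph is mechanical enough to write down. This hypothetical Python helper lists which published 1.x schemas a document with a given `office:version` should be expected to validate against. It assumes that only 1.0 and 1.1 are published, with ODF 1.2 still a draft.

```python
from typing import List, Sequence, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a "major.minor" string such as "1.1" into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def expected_schemas(doc_version: str, published: Sequence[str] = ("1.0", "1.1")) -> List[str]:
    """Schemas a document should validate against: same major version, same or later minor.

    Within ODF 1.x, schema changes only widen constraints, so a version N document
    should also be valid to version N+1, but not the reverse.
    """
    major, minor = parse_version(doc_version)
    return [
        v for v in published
        if parse_version(v)[0] == major and parse_version(v)[1] >= minor
    ]
```

So `expected_schemas("1.0")` gives `["1.0", "1.1"]`, while `expected_schemas("1.1")` gives only `["1.1"]`.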
That's enough theory! Let's take a look now at the test that Alex actually ran. It is a rather curious, strangely biased kind of test, but the bad thinking is interesting enough to devote some time to examine in some detail.
When he earlier tested OOXML, Alex used the OOXML standard itself, a text on which Microsoft engineers had lavished many person-years of attention for the past 18 months, and he validated it with the current version of the OOXML schema. That is pretty much the best case, testing a document that has never been out of Microsoft's sight for 18 months and testing it with the current version of the schema. I would expect that this document would have been a regular test case for Microsoft internally, and that its validity has been repeatedly and exhaustively tested over the past 18 months. I know that I personally tested it when Ecma-376 was first released, since it was the only significant OOXML document around. So, essentially Alex gave OOXML the softest of all soft pitches.
I think Microsoft's response, that the validity errors detected by Alex are due to changes made to the schema at the BRM, is a reasonable and accurate explanation. The real story on OOXML standardization is not how many changes were made that were incompatible with Office 2007, but how few. It appears that very few changes, perhaps only one, will be required to make Office 2007's output be valid OOXML.
So when testing ODF, what did Alex do? Did he use the ODF 1.0 specification as a test case, a document that the OASIS TC might have had the opportunity to give a similar level of attention to? No, he did not, although that would have validated perfectly, as I've demonstrated above. Instead, Alex uses the OOXML specification, a document which by his own testing is not valid OOXML, then converts it into the proprietary .DOC binary format, then translates that binary format into ODF and then tries to validate the results with the ODF 1.0 schema (i.e., the wrong version of the ODF schema since OpenOffice 2.4.0's output is clearly declared as ODF 1.1), and then applies a non-applicable, non-standard DTD Compatibility constraint test during the Relax NG validation.
Does anyone see something else wrong with this testing methodology?
Aside from the obvious bias of using an input document that Microsoft has spent 18 months perfecting, and using the wrong schemas and validator settings, there is another, more subtle problem.
Alex's tests of OOXML and ODF test entirely different things. With OOXML, he took a version N (Ecma-376) OOXML document and tried to validate it with a version N+1 (ISO/IEC 29500) version of the OOXML schema.
But what he did with ODF was take a version N+1 (ODF 1.1) document and try to validate it with a version N (ODF 1.0) of the ODF schema.
These are entirely different operations. One test is testing the backwards compatibility of the schema, the other is testing the backwards compatibility of document instances. It takes no genius to figure out that if ODF 1.1 adds new elements, then an ODF 1.1 document instance will not validate with the ODF 1.0 schema. We don't ordinarily expect backwardly compatible validity of document instances. Again, Alex's tests are biased in OOXML's favor, giving ODF a much more difficult, even impossible, task compared to the one he ran for OOXML.
If we want to compare apples to apples, it is quite easy to perform the equivalent test with ODF. I gave it a try, taking a version N document (the ODF 1.0 standard itself, per above) and validated it with the version N+1 schema (ODF 1.1 in this case). It worked perfectly. No warnings, no errors.
In any case, in his backwards test Alex reports 7,525 errors, "mostly of the same type (use of an undeclared soft-page-break element)" when validating the OOXML text with ODF 1.0 schema. Indeed, all but 39 of these errors are reports of soft-page-break.
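With thousands of messages, the first thing to do is group them. Here is a minimal Python sketch that tallies jing output by message text. It assumes the `path:line:column: error: message` layout seen in the jing output earlier.

In case a message carries extra detail after a semicolon, the sketch keys only on the text before it. That grouping rule is an assumption, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout, matching the jing output shown earlier:
#   path:line:column: error: message
JING_LINE = re.compile(
    r"^(?P<path>.*?):(?P<line>\d+):(?P<column>\d+): (?P<level>error|warning): (?P<message>.*)$"
)


def tally_jing_messages(lines: Iterable[str]) -> Counter:
    """Count jing diagnostics by message, ignoring file positions."""
    counts: Counter = Counter()
    for raw in lines:
        match = JING_LINE.match(raw.strip())
        if match:
            # Group on the text before any ';' so trailing detail does not split categories.
            counts[match.group("message").split(";", 1)[0].strip()] += 1
    return counts
```

Saving jing's output to a file and printing `tally_jing_messages(open(path)).most_common()` would show at a glance how much of a long error list is a single repeated problem.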
Soft page breaks are a new feature introduced in ODF 1.1. It has two primary advantages for accessibility. First it allows easier collaboration between people using different technologies to read a document. Not all documents are deeply structured, with formal divisions like section 3.2.1, etc. Most business documents are loosely structured, and collaboration occurs by referring to "2nd paragraph on page 23" or "the bottom of page 18". But when using different assistive technologies, from larger fonts, to braille, to audio renderings, the page breaks (if the assistive technology even has the concept of a page break) are usually located differently from the page breaks in the original authoring tool. This makes collaboration difficult. So, ODF 1.1 added the ability for applications to write out "soft" page breaks, indicating where the page breaks occurred when the original source document was saved.
Although this feature was added for accessibility reasons, like curb cuts, its likely future applications are more general. We will all benefit. For example, a convertor for translating from ODF to HTML would ordinarily only be able to calculate the original page breaks by undertaking complex layout calculations. But with soft page breaks recorded, even a simple XSLT script can use this information to insert indications of page breaks, or to generate accurate page numbering, etc. Although the addition of this feature hinders Alex's idiosyncratic attempt to validate ODF 1.1 documents with the ODF 1.0 schema, I think the fact that this feature helps blind and visually impaired users, and generally improves collaboration makes it a fair trade-off.
Wouldn't you agree?
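As an illustration of the HTML-conversion point, here is the same idea in Python rather than XSLT; the substitution of language is deliberate, for brevity. The sketch walks content.xml in document order and increments a page counter at each `text:soft-page-break`. That yields the page on which each paragraph or heading started when the document was last saved.

The function name is hypothetical. It is only a sketch and treats every `text:p` and `text:h` the same way, wherever it occurs.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SOFT_PAGE_BREAK = f"{{{TEXT_NS}}}soft-page-break"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraph_start_pages(content_xml: bytes) -> List[Tuple[int, str]]:
    """Return (page_number, text) for each paragraph/heading, using recorded soft page breaks."""
    root = ET.fromstring(content_xml)
    page = 1
    result = []
    # iter() visits elements in document order, so a paragraph is seen before
    # any soft page break nested inside it: we record the page where it starts.
    for element in root.iter():
        if element.tag == SOFT_PAGE_BREAK:
            page += 1
        elif element.tag in BLOCK_TAGS:
            result.append((page, "".join(element.itertext())))
    return result
```

No layout engine is involved. The page numbers come entirely from the breaks the authoring application recorded.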
That leaves 39 validation errors in Alex's test. 12 of them are reports of invalid values in an xlink:href attribute. This appears to be an error in the original DOCX file: garbage in, garbage out. For example, in one case the original document has a HYPERLINK field that contains a link to content in Microsoft's proprietary CHM format (Compiled HTML). The link provided in the original document does not match the syntax rules required for an XML Schema anyURI (the URL ends with "##" rather than "#"). Maybe it is correct for markup like this, with non-standard, non-interoperable URI's, to give validation errors. This is not the first time that OOXML has been found polluting XML with proprietary extensions. But realize that OpenOffice 2.4.0 did not create this error. OpenOffice is just passing the error along, as Office 2007 saved it. It is interesting to note that this error was not caught in MS Office, and indeed is undetectable with OOXML's lax schema. But the error was caught with the ODF schema. This is a good thing, yes? It might be a good idea for OpenOffice to add an optional validation step after importing Microsoft Office documents, to filter out such data pollution.
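An optional validation step of that kind could start as simply as this Python sketch. It flags `xlink:href` values containing more than one '#'.

Under RFC 3986, a URI reference may contain at most one unescaped '#', the one that begins the fragment. So a value containing "##" cannot be a well-formed URI.

The sketch checks only that one defect, not full anyURI syntax. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def has_extra_hash(uri: str) -> bool:
    """True if the URI has more than one '#', which RFC 3986 does not allow unescaped."""
    return uri.count("#") > 1


def suspicious_hrefs(content_xml: bytes) -> List[str]:
    """Collect xlink:href values in an ODF XML stream that fail the '#' check."""
    root = ET.fromstring(content_xml)
    hrefs = (element.get(XLINK_HREF) for element in root.iter())
    return [href for href in hrefs if href is not None and has_extra_hash(href)]
```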
The remaining 27 validation errors are all instances of style:with-tab. Honestly, I have no explanation for this. This attribute does not exist in ODF 1.0 or ODF 1.1. That it is written out appears to be a bug in OpenOffice. Maybe someone there can tell us what the story is on this? But I don't see this problem in all documents, or even most documents.
For fun I tried processing this OOXML document another way. Instead of the multi-hop OOXML-to-DOC-to-ODF conversion Alex did, why not go directly from OOXML to ODF in one step, using the convertor that Microsoft/CleverAge created? This should be much cleaner, since it doesn't have all the legacy code or messiness of the binary formats or legacy application code. It is just a mapping from one markup to another markup, written from scratch. Getting the output to be valid should be trivial.
So I downloaded the "OpenXML/ODF Translator Command Line Tools" from SourceForge. According to their web page, this tool targets ODF 1.0, so we'll be validating against the ODF 1.0 schemas.
This tool is very easy to use once you have the .NET prerequisites installed. The command line was:
odfconvertor /I "Office Open XML Part 4 - Markup Language Reference.docx"
The convertor then chugs along for a long, long, long time. I mean a long time. The conversion from OOXML to ODF eventually finished, after 11 hours, 10 minutes and 41 seconds! And this was on a Thinkpad T60p with a dual-core Intel 2.16 GHz processor and 2.0 GB of RAM.
I then ran jing, using the validation command lines from above. It reported 376 validation errors, which fell into several categories:
- text:s element not allowed in this context
- bad value for text:style-name
- bad value for text:outline-level
- bad value for svg:x
- bad value for svg:y
- element text:tracked-changes not allowed in this context
- "text not allowed here"
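Grouping a long validator report into categories like these is easy to automate. A minimal sketch, assuming the report was saved to a text file and that each error line has the form path:line:column: error: message (check the output of your own jing version before relying on this); it just counts identical messages. The function name is hypothetical.

```python
from collections import Counter

ERROR_MARKER = ": error: "


def tally_errors(report_lines):
    """Count identical error messages in a validator report.

    Assumes each error line looks like 'path:line:col: error: message';
    lines without that marker (warnings, blank lines) are ignored.
    """
    counts = Counter()
    for line in report_lines:
        if ERROR_MARKER in line:
            message = line.split(ERROR_MARKER, 1)[1].strip()
            counts[message] += 1
    return counts
```

Usage would be along the lines of `for message, n in tally_errors(open("report.txt")).most_common(): print(n, message)`. Messages that embed a specific bad value will be counted separately and may need further normalization before they collapse into categories.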
In the end we should put this in perspective. Can OpenOffice produce valid ODF documents? Yes, it can, and I have given an example. Can OpenOffice produce invalid documents? Yes, of course. For example when it writes out a .DOC binary file, it is not even well-formed XML. And we've seen one example, where via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation. But conformance for an application does not require that it is incapable of writing out an invalid document. Conformance requires that it is capable of writing out a valid document. And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.
It is interesting to recall the study that Dagfinn Parnas did a few years ago. He analyzed 2.5 million web pages. He found that only 0.7% of them were valid markup. Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems. Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards. In fact I promise you that the page you are reading now is not valid, and neither is Alex Brown's, nor SC34's, nor JTC1's, nor Ecma's, nor ISO's, nor the IEC's.
So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement. In any case, Alex Brown's dire pronouncements on ODF validity have been weighed in the balance and found wanting.
4 May 2008
Alex has responded on his blog with "ODF validation for cognoscenti". He deals purely with the ID/IDREF/IDREFS questions in XML. He does not justify his biased and faulty testing methodology, nor does he reiterate his bold claims that there are no valid ODF 1.0 documents in existence.
Since Alex's blog does not seem to be allowing me to comment, I'll put here what I would have put there. I'll be brief because I have other fish to fry today.
Alex, no one doubts that ID/IDREF/IDREFS constraints must be respected by valid ODF document instances. I never suggested otherwise. But what I do state is that this is not a concern of a Relax NG validator. You can read James Clark saying the same thing in his 2001 "Guidelines for using W3C XML Schema Datatypes with RELAX NG", which says in part:
The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes].
Validation of ID/IDREF/IDREFS cross-reference semantics is not the job of Relax NG, and you are incorrect to suggest otherwise. Your logic is also deficient when you take my statement of that fact and derive the false statement that I believe that ID/IDREF semantics do not apply to ODF. One does not follow from the other.
You know, as much as anyone, that conformance is a complex topic. One does not ordinarily expect, except in trivial XML formats, that the complete set of conformance constraints will be expressed in the schema. Typically a multi-layered approach is used, with some syntax and structural constraints expressed in XML Schema or Relax NG, some business constraints in Schematron, and maybe even some deeper semantic constraints that are expressed only in the text of the standard and can only be tested by application logic.
For example, a document that defines a cryptographic algorithm might need to store a prime number. The schema might define this as an integer. The fact that the schema does not state or guarantee that it is a prime number is not the fault of the schema. And the inability of a Relax NG validator to test primality is not a defect in Relax NG. The primality test would simply need to be carried out at another level, with application logic. But the requirement for primality in document instances can still be a conformance requirement and it is still testable, albeit with some computational effort, in application logic.
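The layering in this example can be sketched in a few lines. This is a hypothetical illustration, not drawn from any real schema: the first check stands in for roughly what a schema's integer datatype enforces, and the second is the primality requirement that only application logic can test. Trial division is used for simplicity; real cryptographic code would use a probabilistic test such as Miller-Rabin.

```python
import re

# Roughly the lexical form of xsd:integer: optional sign, then digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_prime(n: int) -> bool:
    """Trial division; fine for a sketch, far too slow for real key sizes."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def check_prime_field(value: str) -> list[str]:
    """Layered check for a hypothetical field that must hold a prime."""
    # Layer 1: what a schema datatype can express (is it an integer?).
    if not INTEGER_LEXICAL.fullmatch(value):
        return [f"{value!r} is not a lexically valid integer"]
    # Layer 2: the semantic requirement, checked in application logic.
    n = int(value)
    if not is_prime(n):
        return [f"{n} is not prime"]
    return []
```

A value like "91" passes the first layer and fails the second, which is exactly the case a schema validator cannot catch.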
I believe that is the source of your confusion. The initial errors you saw when running jing with the Relax NG DTD Compatibility flag enabled were not errors in the ODF document instances. What you saw was jing reporting that it could not apply the Relax NG DTD Compatibility ID/IDREF/IDREFS constraint checks using the ODF 1.0 schema. That in no way means that the constraints defined in XML 1.0 are not required on ODF document instances. It simply indicates that you would need to verify these constraints using means other than Relax NG DTD Compatibility.
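As an illustration of verifying these constraints by other means, here is a minimal sketch of an application-level check for two of the XML 1.0 validity constraints: ID values must be unique within a document, and every IDREF value (and every token of an IDREFS value) must match some ID. Which attributes are of type ID or IDREF is determined by the schema, not the instance, so the sketch takes the attribute names as parameters. The function name and the attribute names in any example (id, ref, refs) are hypothetical, not ODF's.

```python
import xml.etree.ElementTree as ET


def check_id_idref(xml_text, id_attrs, idref_attrs=(), idrefs_attrs=()):
    """Check ID uniqueness and IDREF/IDREFS targets outside any schema validator.

    Attribute names use ElementTree's form, e.g. '{namespace}local'.
    Returns a list of problems; an empty list means no violations were found.
    """
    root = ET.fromstring(xml_text)
    ids = set()
    refs = []
    problems = []
    for element in root.iter():
        for name in id_attrs:
            value = element.get(name)
            if value is None:
                continue
            if value in ids:
                problems.append(f"duplicate ID {value!r}")
            ids.add(value)
        for name in idref_attrs:
            value = element.get(name)
            if value is not None:
                refs.append(value)
        for name in idrefs_attrs:
            value = element.get(name)
            if value is not None:
                refs.extend(value.split())  # IDREFS is whitespace-separated
    # References may point forward, so resolve them only after the full pass.
    for ref in refs:
        if ref not in ids:
            problems.append(f"dangling reference {ref!r}")
    return problems
```

Run over an actual document, an empty result for the relevant attributes would be evidence that its ID/IDREF usage is sound, independent of what a Relax NG validator reports.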
So I wonder, have you actually found ODF document instances, say written from OpenOffice 2.4.0, which have ID/IDREF/IDREFS usage which violates the constraints expressed in ODF 1.0?
Finally, in your professional judgment, do you maintain that this is an accurate statement: "For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF."
Labels: ODF
Wednesday, April 16, 2008
Suggesting ODF Enhancements
The ODF TC receives ideas for new features from many places. Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.
Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or individual members of the public.
Contributions from OASIS TC members are already covered by the OASIS IPR Policy. The TC member who contributes written proposals to the TC is obliged from the time of contribution. And other TC members are obliged if they have been TC members for at least 60 days and remain a member 7 days after approval of any Committee Draft. You can see the participation status of TC members here.
For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our comment mailing list. But before you can post to the comment list you must first accept the terms of the Feedback License.
Is this extra step annoying? Yes, it is. But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF. It is part of the price we pay for open standards. Free does not mean free from vigilance.
One of my responsibilities on the ODF TC is to monitor and process the public comments we receive. Regrettably, this is a duty which I've neglected for too long. So I spent some time this week getting caught up on the comments, entering them all into a tracking spreadsheet. We have a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.
The largest single source of comments is from the Japanese JTC1/SC34 mirror committee, where they have been translating the ODF 1.0 standard into Japanese. As you know, you will get no closer reading of a text than when attempting translation, so we're glad to receive this scrutiny. I look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.
For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing. For comments that are in the nature of a new feature proposal, we will discuss on a TC call, and decide whether or not to include it in ODF 1.2.
Some of the feature proposals from the comment list are:
- A request to support embedded fonts in ODF documents
- A request to support multiple versions of the same document in the same file
- A request to allow vertical text justification
- A proposal for enhanced string processing spreadsheet functions
- A proposal for spreadsheet values to allow units, which would help prevent calculation errors due to mixing units, e.g., adding mm to kg would be flagged as an error.
- A proposal for allowing spreadsheet named ranges to have namespaces, with each sheet in a workbook having its own namespace.
- A proposal to allow a document to have a "portable" flag to allow it to self-identify that it contains only portable ODF content with no proprietary extensions.
- A proposal for adding FFT support to spreadsheets
- A proposal for adding an overline text attribute
Of course, general comments are always welcome on this blog.
Labels: ODF
Saturday, February 16, 2008
Fast Track versus PAS
So when I hear people lump the Fast Track and PAS processes in JTC1 together, I roll my eyes and think... If only they knew how different they really are.
Let's give it a try, starting with PAS.
PAS stands for "Publicly Available Specification" and the PAS process in JTC1 allows an existing standard from outside of JTC1 to be submitted, reviewed and approved in an accelerated review cycle. An organization that wishes to make a PAS submission (typically a standards consortium) must first seek recognition as a PAS Submitter. This requires that they submit to JTC1 for approval a list of standards they wish to submit, as well as documentation that explains their organizational qualifications. The long list of organizational acceptance criteria is outlined in JTC1 Directives, Annex M, excerpted below.
Once this documentation is provided, a three-month JTC1 ballot is held on the question of whether to approve the applicant as a Recognized PAS Submitter. If approved, this status lasts for 2 years, but may be renewed by reapplying with updated organizational documentation. Renewals must also be approved by a 3-month letter ballot.
M7.3 Organisation Acceptance Criteria
M7.3.1 Co-operative Stance (M)
There should be evidence of a co-operative attitude toward open dialogue, and a stated objective of pursuing standardisation in the JTC 1 arena. The JTC 1 community will reciprocate in similar ways, and in addition, will recognise the organisation's contribution to international standards.
It is JTC 1's intention to avoid any divergence between the JTC 1 revision of a transposed PAS and a version published by the originator. Therefore, JTC 1 invites the submitter to work closely with JTC 1 in revising or amending a transposed PAS.
There should be acceptable proposals covering the following categories and topics.
M.7.3.1.1 Commitment to Working Agreement(s)
- What working agreements have been provided, how comprehensive are they?
- How manageable are the proposed working agreements (e.g. understandable, simple, direct, devoid of legalistic language except where necessary)?
- What is the attitude toward creating and using working agreements?
M.7.3.1.2 Ongoing Maintenance
- What is the willingness and resource availability to conduct ongoing maintenance, interpretation, and 5 year revision cycles following JTC 1 approval (see also M6.1.5)?
- What level of willingness and resources are available to facilitate specification progression during the transposition process (e.g. technical clarification and normal document editing)?
M.7.3.1.3 Changes during transposition
- What are the expectations of the proposer toward technical and editorial changes to the specification during the transposition process?
- How flexible is the proposing organisation toward using only portions of the proposed specification or adding supplemental material to it?
M.7.3.1.4 Future Plans
- What are the intentions of the proposing organisation toward future additions, extensions, deletions or modifications to the specification? Under what conditions? When? Rationale?
- What willingness exists to work with JTC 1 on future versions in order to avoid divergence? Note that the answer to this question is particularly relevant in cases where doubts may exist about the openness of the submitter organisation.
- What is the scope of the organisation activities relative to specifications similar to but beyond that being proposed?
M7.3.2 Characteristics of the Organisation (M)
The PAS should have originated in a stable body that uses reasonable processes for achieving broad consensus among many parties. The PAS owner should demonstrate the openness and non-discrimination of the process which is used to establish consensus, and it should declare any ongoing commercial interest in the specification either as an organisation in its own right or by supporting organisations such as revenue from sales or royalties.
M.7.3.2.1 Process and Consensus:
- What processes and procedures are used to achieve consensus, by small groups and by the organisation in its entirety?
- How easy or difficult is it for interested parties, e.g. business entities, individuals, or government representatives to participate?
- What criteria are used to determine "voting" rights in the process of achieving consensus?
M.7.3.2.2 Credibility and Longevity:
- What is the extent of and support from (technical commitment) active members of the organisation?
- How well is the organisation recognised by the interested/affected industry?
- How long has the organisation been functional (beyond the initial establishment period) and what are the future expectations for continued existence?
- What sort of legal business entity is the organisation operating under?
M7.3.3 Intellectual Property Rights: (M)
The organisation is requested to make known its position on the items listed below. In particular, there shall be a written statement of willingness of the organisation and its members, if applicable, to comply with the ISO/IEC patent policy in reference to the PAS under consideration.
Note: Each JTC 1 National Body should investigate and report the legal implications of this section.
M.7.3.3.1 Patents:
- How willing are the organisation and its members to meet the ISO/IEC policy on these matters?
- What patent rights, covering any item of the proposal, is the PAS owner aware of?
M.7.3.3.2 Copyrights:
- What copyrights have been granted relevant to the subject specification(s)?
- What copyrights, including those on implementable code in the specification, is the PAS originator willing to grant?
- What conditions, if any, apply (e.g. copyright statements, electronic labels, logos)?
M.7.3.3.3 Distribution Rights:
- What distribution rights exist and what are the terms of use?
- What degree of flexibility exists relative to modifying distribution rights; before the transposition process is complete, after transposition completion?
- Is dual/multiple publication and/or distribution envisaged, and if so, by whom?
M.7.3.3.4 Trademark Rights:
- What trademarks apply to the subject specification?
- What are the conditions for use and are they to be transferred to ISO/IEC in part or in their entirety?
M.7.3.3.5 Original Contributions:
- What original contributions (outside the above IPR categories) (e.g. documents, plans, research papers, tests, proposals) need consideration in terms of ownership and recognition?
- What financial considerations are there?
- What legal considerations are there?
Once an organization has Recognized PAS Submitter status, it may now propose a PAS submission. Such a submission must be within scope of the Submitter's original application, and must be accompanied by an Explanatory Report that speaks to JTC1's strategic interests in Interoperability, Cultural and Linguistic Adaptability, as well as the following document-related acceptance criteria:
M7.4 Document Related Criteria
M7.4.1 Quality
M.7.4.1.1 Completeness (M):
Within its scope the specification shall completely describe the functionality (in terms of interfaces, protocols, formats, etc) necessary for an implementation of the PAS. If it is based on a product, it shall include all the functionality necessary to achieve the stated level of compatibility or interoperability in a product independent manner.
- How well are all interfaces specified?
- How easily can implementation take place without need of additional descriptions?
- What proof exists for successful implementations (e.g. availability of test results for media standards)?
M.7.4.1.2 Clarity:
- What means are used to provide definitive descriptions beyond straight text?
- What tables, figures, and reference materials are used to remove ambiguity?
- What contextual material is provided to educate the reader?
M.7.4.1.3 Testability (M)
The extent, use and availability of conformance/interoperability tests or means of implementation verification (e.g. availability of reference material for magnetic media) shall be described, as well as the provisions the specification has for testability.
M.7.4.1.4 Stability (M):
The specification shall have had sufficient review over an extended time period to characterise it as being stable.
- How long has the specification existed, unchanged, since some form of verification (e.g. prototype testing, paper analysis, full interoperability tests) has been achieved?
- To what extent and for how long have products been implemented using the specification?
- What mechanisms are in place to track versions, fixes, and addenda?
M.7.4.1.5 Availability (M):
- Where is the specification available (e.g. one source, multinational locations, what types of distributors)?
- How long has the specification been available?
- Has the distribution been widespread or restricted? (describe situation)
- What are the costs associated with specification availability?
M7.4.2 Consensus (M)
The accompanying report shall describe the extent of (inter)national consensus that the document has already achieved.
M.7.4.2.1 Development Consensus:
- Describe the process by which the specification was developed.
- Describe the process by which the specification was approved.
- What "levels" of approval have been obtained?
M.7.4.2.2 Response to User Requirements:
- How and when were user requirements considered and utilised?
- To what extent have users demonstrated satisfaction?
M.7.4.2.3 Market Acceptance:
- How widespread is the market acceptance today? Anticipated?
- What evidence is there of market acceptance in the literature?
M.7.4.2.4 Credibility:
- What is the extent and use of conformance tests or means of implementation verification?
- What provisions does the specification have for testability?
M7.4.3 Alignment
The specification should be aligned with existing JTC 1 standards or ongoing work and thus complement existing standards, architectures and style guides. Any conflicts with existing standards, architectures and style guides should be made clear and justified.
M.7.4.3.1 Relationship to Existing Standards:
- What international standards are closely related to the specification and how?
- To what international standards is the proposed specification a natural extension?
- How is the specification related to emerging and ongoing JTC 1 projects?
M.7.4.3.2 Adaptability and Migration:
- What adaptations (migrations) of either the specification or international standards would improve the relationship between the specification and international standards?
- How much flexibility do the proponents of the specification have?
- What are the longer-range plans for new/evolving specifications?
M.7.4.3.3 Substitution and Replacement:
- What needs exist, if any, to replace an existing international standard? Rationale?
- What is the need and feasibility of using only a portion of the specification as an international standard?
- What portions, if any, of the specification do not belong in an international standard (e.g. too implementation specific)?
M.7.4.3.4 Document Format and Style
- What plans, if any, exist to conform to JTC 1 document styles?
The Explanatory Report also sets the maintenance regime for the submission, if approved.
The proposed standard, along with the Explanatory Report, is then distributed to JTC1 NBs for a 6-month ballot. The approval criteria are 2/3 approval among voting P-members, and no more than 25% disapproval in total. At the end of the ballot a Ballot Resolution Meeting may be held if needed.
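The two approval conditions can be written as a small function. This is a sketch under stated assumptions: abstentions are not counted, and "in total" is read as all votes cast by National Bodies, P-members and others alike. The function and parameter names are illustrative, and the vote counts in any example are invented.

```python
def ballot_passes(p_yes: int, p_no: int, o_yes: int = 0, o_no: int = 0) -> bool:
    """Check the two ballot conditions, ignoring abstentions.

    p_yes/p_no: approve/disapprove votes cast by P-members.
    o_yes/o_no: approve/disapprove votes cast by other National Bodies.
    Integer arithmetic avoids floating-point trouble at exactly 2/3 or 25%.
    """
    p_cast = p_yes + p_no
    all_cast = p_cast + o_yes + o_no
    if p_cast == 0:
        return False
    # Condition 1: at least two-thirds of voting P-members approve.
    p_members_approve = 3 * p_yes >= 2 * p_cast
    # Condition 2: no more than one quarter of all votes cast disapprove.
    few_enough_disapprovals = 4 * (p_no + o_no) <= all_cast
    return p_members_approve and few_enough_disapprovals
```

The second condition matters on its own: 17 approvals to 8 disapprovals among P-members clears the 2/3 bar (68%) but still fails, since 8 of 25 votes (32%) are disapprovals.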
So, that is the PAS process, in brief. The PAS process is how ODF was approved back in 2006, with OASIS as the Recognized PAS Submitter.
The Fast Track process is almost the same from the time the ballot is issued. The six-month period is split into a 30-day "contradiction period" and a 5-month ballot. (That is an odd difference, with no clear reason.) But the voting criteria, the BRM process, etc., are all the same between the two. What is different (and there are critical differences) is everything that happens before the ballot.
Who can submit a Fast Track? Any JTC1 P-member, or any Class A Liaison can propose a Fast Track.
We all know about P-members. They are NB's, typically the highest standardization committee in any country. A P-member used to also mean that you had a broad interest in many or most JTC1 matters. But now it may mean merely that Microsoft asked you to join as a P-member.
Class A Liaisons are "Organisations which make an effective contribution to and participate actively in the work of JTC 1 or its SCs for most of the questions dealt with by the committee". Any organization can apply to be a Class A Liaison and be voted in via a letter ballot or at a meeting. There are no formal organization qualifications, no requirement to state an interest in eventually making Fast Tracks, or to answer any of the types of questions that PAS Submitters must answer.
Further, once approved as a Class A Liaison, the status lasts forever. There is no requirement to renew or reapply. In fact JTC1 Directives even lack a documented procedure for removing a Class A Liaison.
So what about the proposals for Fast Track submission? What is required of them? No Explanatory Report is required. No checklist of document-related criteria must be answered. JTC1 Directives say merely "The criteria for proposing an existing standard for the fast-track procedure is a matter for each proposer to decide." That's it. It is at the sole discretion of the Class A Liaison.
So you can see what great power Ecma has over JTC1 -- they can submit any standard they want for Fast Track, and no one in JTC1 can stop them, or even remove their right to submit more Fast Tracks.
This may explain why Ecma is able to command such high membership fees. A full voting membership in OASIS, which would allow a company to help produce an OASIS Standard for later submission to JTC1 under the arduous PAS process, costs $1,100 for a small company. Joining the US NB, to be able to lobby for a Fast Track submission from the US, will cost you $9,500. But joining Ecma as a voting member (what they call an "Ordinary Member") will cost you 70,000 Swiss Francs, or $64,000. That is what no-questions-asked Fast Track service is worth. I think that, from Microsoft's perspective, the extra $62,900 is money well spent. But what about from JTC1's perspective? They don't get this extra money. So what's their excuse for having these permissive Fast Track procedures that give Ecma so much control?
In any case, that is why I roll my eyes when people lump PAS and Fast Track together, and say that they are essentially the same process. They clearly aren't. PAS Submitters like OASIS are given intense scrutiny, and are required to document in great detail how their organization and their proposals meet JTC1 criteria. The scrutiny never ends, as a new Explanatory Report is required for every submission, and their status as Recognized PAS Submitter only lasts for a few years before requiring re-approval.
Fast Track submitters, as Class A Liaisons, on the other hand, are the monarchs of JTC1. They serve for life and are answerable to no one. They can submit a Fast Track on any subject they want, at any time. So a standards consortium like Ecma, with primary expertise in optical disk standards, but never having produced an XML standard before, can rubber stamp the world's largest XML standard and submit it for Fast Track processing to JTC1. And no one can do a thing about it.
Tuesday, February 12, 2008
Punct Contrapunct
We’ve made the overview available for free (I must admit I'm not sure for how long), as we believe this topic warrants expanded industry debate before a February, 2008 ISO ballot on OOXML, and we want to help catalyze and advance the debate.
The degree of expanded debate achieved may be estimated by noting that Microsoft is sending this report to every JTC1 national body involved in the OOXML ballot, from Pakistan to Ecuador, and has invited Peter O'Kelly to speak on this paper both at the recent OOXML press event in Washington as well as this week's Office Developers Conference.
Much could be said of this report, but I'll limit myself to commenting on a single passage:
[S]everal vendors interviewed for this overview indicated that it's essentially impossible to get ODF proposals approved if they're not also supported in OpenOffice.org, and further noted that Sun closely controls OpenOffice.org (much as it also holds control over Java).
It should be noted that, before making this statement, the authors contacted neither OASIS nor the OASIS ODF TC in order to check their facts.
The ODF Alliance published a rebuttal of this report, and in particular took umbrage at that passage, saying:
This is demonstrably false, and the use of unnamed “vendors” as sources does not eliminate the need for doing basic fact checking on such claims. Rumors and innuendo do not objective analysis make.
First, on the control aspect, note that ODF 1.0, the standard, is owned and controlled by OASIS, a standards consortium of over 600 member organizations. Sun is just one company among many members. Indeed, for most of the development of ODF, Microsoft was on the Board of Directors of OASIS.
Second, OASIS is a corporation. It is legally bound to its Bylaws. There is no arbitrary control by member corporations.
The ODF TC is co-chaired by an IBM employee and a Sun employee, and is regulated by the OASIS TC Process document, which is publicly readable by all and has clear rules of procedure and appeal.
The ODF TC has three subcommittees. The Accessibility SC is co-chaired by IBM and Sun, while the Formula Subcommittee and the Metadata Subcommittee are each chaired by individual members of OASIS who are not affiliated with any large corporations.
Voting rights in the ODF TC, for accepting or rejecting features, is currently as follows:
- Sun – 3 voting members
- IBM – 4 voting members
- Individuals – 3 voting members
This can easily be verified at the OASIS ODF TC website.
Is sharing the chair position on the TC and on 1 of 3 subcommittees considered “closely controlling”? Is having 30% of the votes considered “closely controlling”?
As for proposals being accepted into ODF, we note that all three major features for ODF 1.2, RDF metadata, OpenFormula, and enhanced accessibility, are new proposals which have not been yet implemented in OpenOffice. Moreover, the ODF TC is currently processing a set of features requested by the KOffice open source project. So the assertion that it is “essentially impossible” to get new features into ODF if they are not already supported by OpenOffice is not true. This error is unfortunate and needs correcting through rigorous fact checking, as do the others, in our opinion.
Oddly enough, this particular error occurs in several places. A search of the report for the word “control” shows it used six times, once in reference to “Chinese communists” and five times in reference to Sun Microsystems. Note, however, that no mention is ever made of the strong direct control Microsoft asserts over OOXML, its having sole chairmanship of the Ecma TC45, and its having secured a committee charter that prevents any changes to OOXML that are not compatible with Microsoft Office.
Again, we're puzzled by the inaccuracy on one hand and the lack of balance on the other.
Now, back to the Burton Group, where Guy Creese responds on the Burton Group blog:
We were not expecting to be told that Sun had significant sway over the standard, but several people told us that (spread across more than one ODF-oriented vendor), which is why we noted it in the report. As the ODF Alliance notes, IBM and Sun—two of Microsoft’s most powerful productivity application archrivals today (as well as partners to Microsoft in myriad other domains, e.g., Web services-related standards initiatives)—collectively control 70% of the votes in the ODF TC which determines if proposals will be accepted or rejected. This suggests there is ample opportunity for conflicts of interest.
Guy, excuse me, did you say "conflicts of interest"? Please explain. Or maybe when Peter O'Kelly comes back from speaking at Microsoft's Office Developers Conference he can explain it for us?
In any case, the factual errors in your report with respect to the control of ODF have been clearly demonstrated, but instead of simply admitting and correcting the error, you hide behind anonymous sources and further impugn OASIS by charging some sort of "conflict of interest".
To follow your logic further demonstrates the absurdity of it. If you believe that the fact that IBM and Sun "collectively control 70% of the votes in the ODF TC" lends weight to your argument, then what is shown by the equally true mathematical fact that IBM plus independent members also control 70% of the votes? Why is this equally true fact not mentioned? This is the nature of plurality, that there are many different combinations of votes that could make a majority position. Further, note that these groups in practice do not always vote as a bloc. We've had votes where the independent members split their vote, and we even had a vote where the IBM members did not all vote alike. So much for your simplistic control theory.
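The arithmetic is easy to check. A minimal sketch using the seat counts listed above (Sun 3, IBM 4, individuals 3), assuming, purely for illustration, that each group votes as a bloc and that a simple majority of the 10 votes decides; as just noted, the first assumption does not hold in practice. It lists every combination of groups that would form a majority.

```python
from itertools import combinations

# Voting members per group in the ODF TC, as listed in the post.
SEATS = {"Sun": 3, "IBM": 4, "Individuals": 3}


def majority_coalitions(seats):
    """Return (groups, votes) for every combination holding a simple majority.

    Assumes bloc voting, which the post points out does not hold in practice.
    """
    total = sum(seats.values())
    names = sorted(seats)
    result = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            votes = sum(seats[name] for name in combo)
            if 2 * votes > total:
                result.append((combo, votes))
    return result


total_votes = sum(SEATS.values())
for groups, votes in majority_coalitions(SEATS):
    print(" + ".join(groups), f"{votes}/{total_votes}")
```

No single group reaches a majority, while every pair does; IBM with Sun and IBM with the independent members each hold exactly 7 of 10 votes.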
I will not question whether your anonymous sources indeed misled you. For the sake of argument, I will accept unquestioningly that you indeed had sources and that they said exactly what you claim they said. However, having sources does not excuse you, as an analyst, from doing basic fact checking. The rules of OASIS and the voting composition of the ODF TC are facts, not opinions, and the correct information was sitting there, on public web sites, for you to check. It is not your fault that you were misled by sources, but it is your fault that you did not verify their claims. Publishing controversial statements based on anonymous sources, without fact checking, does not represent the Burton Group's finest work.
The Burton Group has denigrated the work and the members of the OASIS Open Document Format Technical Committee (of which I am Co-Chair) with published statements that have been shown to be false. The Burton Group owes us an apology and an immediate retraction.
Waiting until after February, after the DIS 29500 process concludes, to make corrections is unacceptable. Since your stated purpose in making this report public was to "advance the debate" in the current OOXML ISO process, withholding factual corrections until after that process concludes would imply that you and the Burton Group see no problem with knowingly continuing to influence an ISO ballot with false information published under the Burton Group name. I don't believe that is the image that the Burton Group would want to project. So I urge you to publish a correction now.
Thursday, January 31, 2008
The Case for Harmonization
First note that many JTC1 NB's raised the issue of harmonization in their DIS 29500 ballot comments last September. Some merely requested harmonization, such as Korea, South Africa, Belgium, Peru, Switzerland, or the Czech Republic, while others in addition outlined ways to achieve harmonization. For example, AFNOR, the French NB stated:
After 5 months of extensive discussions between stakeholders in the field of revisable document formats, AFNOR, in the aim to obtain a single standard for XML office document formats within 3 years, makes the following proposal:
- Split the current ECMA 376 standard in 2 parts in order to differentiate the essential OOXML core functions necessary for easy implementation from those functionalities that are needed for the exchange of legacy office file formats;
- Incorporate the technical comments below and those in the attached comment table submitted to the Fast Track;
- Attribute the status of Technical Specification to both parts;
- Establish a process of convergence between ODF (already standardized as ISO/IEC 26300) and the above mentioned OOXML core. ISO/IEC shall invite parties involved to commit themselves to initiate simultaneously the revisions of the existing ODF v1.0 and the OOXML core in order to obtain at the end of the revision process a standard as universal as possible.
(Note that a Technical Specification, in the ISO process, is for proposals which lack sufficient support for approval as an International Standard, but for which publication is still desired. This may be appropriate for OOXML.)
New Zealand's proposal was similar:
- OOXML should be considered by JTC 1 for publication as a Type 2 Technical Report.
- Seek to harmonize with the existing ODF standard to reduce the cost of interoperability, cost of having two standards, and cost of support/maintenance.
- to have more than 63 columns in a table
- to have background images in tables
- to have font weights beyond “normal” and “bold”.
Ecma rejected every single one of these requests. They did not argue that the requested features were unreasonable. They did not argue that the requested features were not needed. Their argument was that harmonization of the formats was not necessary because there exist tools that will translate between OOXML and ODF. In other words, they rejected these requests merely because they were pro-harmonization, regardless of the underlying merit or need of the feature. Ironically, Microsoft's conversion tools are restricted in their fidelity because of the lack of these very features.
On the question of harmonization, we are either moving toward it, or we are moving away. There is no time better than the present to harmonize. Waiting will only make matters worse, as we will then need to consider legacy OOXML documents as well as legacy binary and legacy ODF documents. The Ecma response does not move us toward harmonization, but starts down the road toward further divergence, a long and costly divergence.
Tim Bray made the critical observation back in 2005, “The world does not need two ways to say 'This paragraph is in 12-point Arial with 1.2em leading and ragged-right justification'.”
Microsoft likes to claim that harmonization is impossible, that slapping together the features of both standards would lead to an impenetrable mess. Of course it would, but only an idiot would suggest that as an approach to harmonization. So why do they always bring that up as their strawman?
A look at OpenOffice and Microsoft Office shows a huge degree of functional overlap. Harmonization starts by looking at this functional overlap (and there is a significant area, perhaps 90% or more, where they do overlap) and expressing it identically, using the same XML schema. In other words, harmonization identifies the commonalities at the functional level and finds a common representation for that commonality.
It would also be expected that the common functionality between ODF and OOXML would include a common extensibility mechanism, a way for a vendor to express application-specific features that are outside of the harmonized standard.
The remaining 10% of the functionality would be the focus of the harmonization work, the area that requires the most attention. Some portion of that 10% will represent general-purpose features that we can imagine multiple applications supporting. We take those features and add them to ODF. The rest of that 10%, which only serves one vendor's needs, such as flags for deprecated legacy formatting options, could be represented using the common extensibility mechanism.
Does this sound impossible? That's not what Microsoft says. Gray Knowlton, Group Product Manager for Microsoft Office, was candid to PC World a couple of weeks ago:
Also, if individual governments mandate the use of ODF instead of Open XML, Microsoft would adapt, Knowlton said. The company would then implement the missing functionality that ODF doesn't support. However, those extensions would be custom-designed and outside of the standard, which is counter to the idea of an open document standard, Knowlton said.
So we've agreed that this approach is technically feasible. We're also agreed that extending ODF outside of the standards process is not a good idea. So the obvious solution is to extend ODF within the standards process. So, let's do it! What are we waiting for?
There is no reason why, by a harmonization process, all of the functionality of Microsoft Office cannot be represented on a base of ISO 26300 OpenDocument Format. I personally, as Co-Chair of the OASIS ODF TC, stand ready and willing to sponsor such a harmonization effort in OASIS. So let's start harmonization now, and avoid further divergence.
My read of NB comments indicates that there is a sizable bloc, perhaps even a decisive bloc, of NB's who are in favor of harmonization. Let's push on this and articulate a roadmap, along the lines of the proposals by France and New Zealand, that accomplishes this.
Wednesday, November 21, 2007
PDF, The Waste Land, and Monica's Blue Dress
In a more recent post, Archiving Documents, James wonders aloud why anyone would use ODF or OOXML for archiving, compared to PDF or PDF/A, saying "After all, archiving means preserving things, and usually you want to preserve the total look of a document. PDF/A does that."
I recommend reading the Archiving Documents post in full, and then return here for an alternate point of view.
.
.
.
We say the word "archive" quite easily and cover a large number of activities by that name, and in doing so risk blurring a number of different activities into one over-generalization. Before you accept that format X or format Y is best for archiving, it is fair to ask what is meant by "archiving", and to ask who does the archiving, for what purpose and under what constraints.
In some cases what must be preserved, and for how long, is spelled out in detail for you, by statute, regulation or court order. Or, a company, in anticipation of such requests may require preservation as part of a corporate-wide records retention policy for certain categories of employees or certain categories of documents.
An example of the range of materials that may be included can be seen in this preservation order:
"Documents, data, and tangible things" is to be interpreted broadly to include writings; records; files; correspondence; reports; memoranda; calendars; diaries; minutes; electronic messages; voicemail; E-mail; telephone message records or logs; computer and network activity logs; hard drives; backup data; removable computer storage media such as tapes, disks, and cards; printouts; document image files; Web pages; databases; spreadsheets; software; books; ledgers; journals; orders; invoices; bills; vouchers; checks; statements; worksheets; summaries; compilations; computations; charts; diagrams; graphic presentations; drawings; films; charts; digital or chemical process photographs; video; phonographic tape; or digital recordings or transcripts thereof; drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata, is also included in this definition.
--Pueblo of Laguna v. U.S., 60 Fed. Cl. 133 (Fed. Cl. 2004).
I would pay particular attention to the part at the end, "...drafts; jottings; and notes. Information that serves to identify, locate, or link such material, such as file inventories, file folders, indices, and metadata".
Similarly, consider government and academic archives, which preserve documents for the long term. The archivist tries to anticipate what questions future researchers will have, and then tries to preserve the document in such a way that it can best answer those questions.
A PDF version of a document answers a single question, and answers it quite well: "What did this document look like when printed?" But this is not the only question that one might have of a document. Some other questions that may be asked include:
- What was the nature of the collaboration that led to this document? How many people worked on it? Who contributed what?
- How did the document evolve from revision to revision?
- In the case of a spreadsheet, what was the underlying model and assumptions? In other words, what are the formulas behind the cells?
- In the case of a presentation, how did the document interact with embedded media such as audio, animation, video?
- How was technology used to create this document? In what way did the technology help or impede the author's expression? (Note that researchers in the future may be as interested in the technology behind the document as the contents of the document itself.)
Let's take an analogous case. T.S. Eliot's 1922 poem The Waste Land is a landmark of 20th century literature. Not only is it important from an artistic and critical perspective, but it is also important from a technology perspective -- it is perhaps the first major poem to have been composed at the typewriter. What was published was, like a PDF, what the author intended, what he wanted the world to see. That is all the world knew until around 1970, after the poet's death, when the rest of the story emerged in the form of typewritten draft versions of the poem, with handwritten comments by Ezra Pound.

This provided pages and pages of marked up text that showed the nature and degree of the collaboration between Eliot and Pound far more than had been previously known. This is what researchers want to read. The final publication is great, but the working copy tells us so much more about the process. History is so much more than asking "What?". It continues by asking "How?" and eventually asking "Why?" -- this is where the real insight occurs, going beyond the mere collection of facts and moving on to interpretation. PDF answers the "What?" question admirably. I'm glad we have PDF as a tool for this purpose. But we need to make sure that when archiving documents we allow future research to ask and receive answers to the other questions as well.
Flash forward to the technology of today. We're not all writing great poetry, but we are collaborating on authoring and reviewing and commenting on documents. But instead of doing it via handwritten notes, we're doing it via review & comment features of our word processors. Although the final resulting document may be easily exportable as a PDF document, that is really just a snapshot of what the document looks like today. It loses the record of the collaboration. I don't think that is what we want to archive, or at least not exclusively. If you archive PDF, then you've lost the collaborative record.
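To make the point concrete, here is a minimal Python sketch of what survives in the authoring format but not in a PDF: it lists the tracked changes recorded in an ODF text document's content.xml. The element names (text:tracked-changes, text:changed-region, office:change-info, dc:creator, dc:date) are assumed to follow the ODF specification; the sample XML, author names and dates are invented, and a real document would first have to be unzipped to get at content.xml.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF for these elements.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented fragment of a content.xml stream; names and dates are made up.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:body>
    <office:text>
      <text:tracked-changes>
        <text:changed-region text:id="ct1">
          <text:deletion>
            <office:change-info>
              <dc:creator>Reviewer A</dc:creator>
              <dc:date>2007-11-01T10:00:00</dc:date>
            </office:change-info>
            <text:p>A sentence the reviewer struck out.</text:p>
          </text:deletion>
        </text:changed-region>
        <text:changed-region text:id="ct2">
          <text:insertion>
            <office:change-info>
              <dc:creator>Author B</dc:creator>
              <dc:date>2007-11-02T09:30:00</dc:date>
            </office:change-info>
          </text:insertion>
        </text:changed-region>
      </text:tracked-changes>
      <text:p>The final text, which is all a PDF export would show.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def list_changes(xml_text):
    """Return (kind, creator, date, removed text) for each tracked change."""
    root = ET.fromstring(xml_text)
    changes = []
    for region in root.iter(f"{{{NS['text']}}}changed-region"):
        for kind in ("insertion", "deletion", "format-change"):
            change = region.find(f"text:{kind}", NS)
            if change is None:
                continue
            info = "office:change-info/"
            creator = change.findtext(info + "dc:creator", default="", namespaces=NS)
            date = change.findtext(info + "dc:date", default="", namespaces=NS)
            # A deletion keeps the removed paragraphs inside the element itself.
            removed = " ".join(p.text or "" for p in change.findall("text:p", NS))
            changes.append((kind, creator, date, removed))
    return changes


for change in list_changes(SAMPLE):
    print(change)
```

Who changed what, and when, is right there in the XML; none of it appears in a PDF rendering of the final text.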
Another example: take a spreadsheet. You have cells with formulas, and these formulas calculate results which are then displayed. When you make a PDF version of the spreadsheet you have a record of what it "looked like", but this isn't the same as "what it is". You cannot look at the formulas in the PDF. They don't exist. Future researchers may want to check your spreadsheet's assumptions, the underlying model. There may also be the question of whether your spreadsheet had errors, whether from a mis-copied formula or from an underlying bug in the application. If you archive exclusively as PDF, no one will ever be able to answer these questions.
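A small Python sketch shows the difference between "what it looked like" and "what it is". It reads the cells of an invented one-row ODF spreadsheet fragment and prints, for each cell, the cached value (what a PDF would show) next to the formula (what a PDF drops). The table:formula and office:value attributes are assumed from the ODF specification; the "of:" formula prefix is the ODF 1.2 OpenFormula convention, and files from older applications may use a different prefix.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Invented one-row sheet: A1 and B1 are inputs, C1 is computed from them.
SAMPLE = f"""<office:document-content xmlns:office="{OFFICE_NS}" xmlns:table="{TABLE_NS}">
  <office:body>
    <office:spreadsheet>
      <table:table table:name="Model">
        <table:table-row>
          <table:table-cell office:value-type="float" office:value="100"/>
          <table:table-cell office:value-type="float" office:value="0.05"/>
          <table:table-cell table:formula="of:=[.A1]*(1+[.B1])"
                            office:value-type="float" office:value="105"/>
        </table:table-row>
      </table:table>
    </office:spreadsheet>
  </office:body>
</office:document-content>"""


def read_cells(xml_text):
    """Return (cached value, formula or None) for every cell, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (cell.get(f"{{{OFFICE_NS}}}value"), cell.get(f"{{{TABLE_NS}}}formula"))
        for cell in root.iter(f"{{{TABLE_NS}}}table-cell")
    ]


for value, formula in read_cells(SAMPLE):
    # A PDF export would keep only the value; the formula lives only in the source format.
    print(value, formula)
```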
One more example, going back to 1998 and the Clinton/Lewinsky scandal. Kenneth Starr's report on the case was written in WordPerfect format, distributed to the House, which converted it to HTML form and released it on the web. But due to a glitch in the HTML translation process, footnotes that had been marked as deleted in the WordPerfect file reappeared in the HTML version. So we ended up with an official published Starr Report, as well as an unofficial HTML version which had additional footnotes.
Imagine you are an archivist responsible for the Starr Report. What do you do? Which version(s) do you preserve? Is your job to record the official version, as-published? Or is your job to preserve the record for future researchers? Depending on your job description, this might have a clear-cut answer. But if I were a future historian, I would sure hope that someone someplace had the foresight to archive the original WordPerfect version. It answers more questions than the published version does.
So, to sum it up: What you archive determines what questions you can later ask of a document. If you archive as PDF, you have a high-fidelity version of what the final document looked like. This can answer many, but not all, questions. But for the fullest flexibility in what information you can later extract from the document, you really have no choice but to archive the document in its original authoring format.
An intriguing question is whether we can have it both ways. Suppose you are in an ODF editor and you have a "Save for archiving..." option that would save your ODF document as normal, but also generate a PDF version of it and store it in the zip archive along with ODF's XML streams. Then digitally sign the archive along with a time stamp to make it tamper-evident. You would need to define some additional access conventions, but you could end up with a single document that could be loaded in an ODF editor (in read-only mode) to allow examination of the details of spreadsheet formulas, etc., as well as loaded in a PDF reader to show exactly how it was formatted.
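Here is one way the "Save for archiving..." idea might look, as a minimal Python sketch using only the standard library. Everything specific is an assumption: the Archive/rendition.pdf path is hypothetical (ODF defines no such location), the package is pared down to content.xml and a manifest, and the SHA-256 fingerprint is only a stand-in for a real digital signature and trusted time stamp, which would need a proper signing library.

```python
import hashlib
import io
import zipfile

ODT_MIME = "application/vnd.oasis.opendocument.text"
# Hypothetical location for the PDF rendition; ODF defines no such convention.
PDF_PATH = "Archive/rendition.pdf"

MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIME}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="{PDF_PATH}" manifest:media-type="application/pdf"/>
</manifest:manifest>
"""


def save_for_archiving(content_xml: bytes, pdf_bytes: bytes) -> tuple[bytes, str]:
    """Package the ODF XML and a PDF rendition together in one zip file.

    Returns the package bytes plus a SHA-256 fingerprint of its members.
    The fingerprint only stands in for the signed time stamp discussed in
    the text; it is not a digital signature."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF expects "mimetype" as the first entry, stored uncompressed.
        zf.writestr("mimetype", ODT_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml)
        zf.writestr("META-INF/manifest.xml", MANIFEST)
        zf.writestr(PDF_PATH, pdf_bytes)
    package = buf.getvalue()

    # Hash member names and contents in a fixed order so the result is repeatable.
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in sorted(zf.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(zf.read(name))
    return package, digest.hexdigest()


package, fingerprint = save_for_archiving(b"placeholder content.xml", b"%PDF-1.4 placeholder")
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())
print(fingerprint)
```

Because the fingerprint covers member names and contents rather than zip timestamps, the same inputs always produce the same value, which is what a later integrity check would rely on.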
Sunday, November 18, 2007
Document Format FUD: A Guide for the Perplexed
This inaugural edition is dedicated to the fallout from the recent supernova we know as the OpenDocument Foundation, that in one final act of self-immolation swelled from obscurity to overwhelming brilliance, but then slowly faded away, ever fainter and more erratic, little more than hot gas, the dimming embers no longer sustainable.
Q: Now that the originator and primary supporter of OpenDocument Format has ended its support for ODF, does this mean the end for the ODF standard? (18 Nov 2007)
A: This question is based on a mistaken premise, namely that the OpenDocument Foundation was the originator or steward of the ODF standard. This is an erroneous notion.
The ODF standard is owned by the OASIS standards consortium, with over 600 member organizations and individual members. The committee in OASIS that does the technical work of maintaining the ODF standard is called the OpenDocument TC. It has 15 organization members as well as 7 individual members. Until recently the OpenDocument Foundation was a member of the ODF TC, one voice among many.
The adoption of the ODF standard is promoted by several organizations, most prominently the ODF Alliance (with over 400 organizational members in 52 countries), the OpenDocument Fellowship (around 100 individual members) and the OpenDoc Society (a new group with a Northern European focus, with around 50 organizational members). To put this in perspective, the OpenDocument Foundation, before it changed its mission and dissolved, had only 3 members.
When you consider the range of ODF adoption, especially in Europe and Asia, the strong continuing work on ODF 1.2 in OASIS, and the strong corporate, government and organizational participation demonstrated in the global ODF User Workshop recently held in Berlin, we seem to be making a disproportionate amount of noise over the hysterics of the disintegrating 3-person OpenDocument Foundation.
A number of analysts/journalists/bloggers didn't check their facts and seem to have fallen into the trap, and ascribed a far greater importance to the actions of the Foundation. Curiously, these articles all quoted the same Microsoft Director of Corporate Standards. I hope this correlation does not prove to be a persistent contrary indicator for accuracy in future file format stories.
Luckily for us, David Berlind over at ZDNet has penetrated the confusion and gets it right:
...the future of the OpenDocument Foundation has nothing to do with the future of the OpenDocument Format. In other words, any indication by anybody that the OpenDocument Format has been vacated by its supporters is pure FUD.
11/27/2009 Update: Berlind did further research and interviews on this topic and followed up with a podcast and a new blog post, OpenDocument Format Community steadfast despite theatrics of now impotent ‘Foundation’.
Q: The Open Document Foundation has a document, a "Universal Interoperability Framework" that on its title page says "Submitted to the OASIS Office Technical Committee by The OpenDocument Foundation October 16, 2007". What is the status of this proposal in the ODF TC? (18 Nov 2007)
A: No such document has been submitted to the OASIS TC, on this date or any other date. OASIS policy states that "Contributions, as defined in the OASIS IPR Policy, shall be made by sending to the TC's general email list either the contribution, or a notice that the contribution has been delivered to the TC’s document repository". A look at the ODF TC's list archive for October shows that there was no such contribution.
Q: The Foundation claims that the W3C's CDF format has better interoperability with MS Office than ODF has. Is this true? (18 Nov 2007)
A: The Foundation's claims have not been demonstrated, or even competently argued at a technical level that would allow expert evaluation. I cannot fully critique what is essentially vaporware. However, those who know CDF better than I do have commented on the mismatch between CDF and office documents, for example the recent interview with the W3C's Chris Lilley in Andy Updegrove's blog.
Q: So, does IBM then oppose CDF in favor of ODF? (18 Nov 2007)
A: No. IBM supports both the development of ODF and CDF and has a leadership role in both working groups. These are two good standards for two different things.
The W3C, over the years, has produced a number of reusable, modular core standards for things like vector graphics (SVG), mathematical notation (MathML), forms (XForms), etc. To use a cooking analogy, these are like ingredients that can be combined to make a dish. ODF has taken a number of W3C standards and combined them to make a format for expressing conventional office documents, the familiar word processor, spreadsheet and presentation documents. ODF is an OASIS and ISO standard.
But just as eggs, butter and flour form the base of many recipes, the core W3C standards can be assembled in different ways for different purposes. This is a good thing.
CDF is not so much a final dish as an intermediate step, like a roux (flour + butter) when making a sauce. You don't use a roux directly, but build upon it, e.g., add milk to make a béchamel, add cheese for a cheese sauce, etc. Likewise, CDF itself is not directly consumable. You need to add a WICD profile, something like WICD Mobile 1.0, before you have something a user agent can process.
Friday, October 12, 2007
ODF enters the Semantic Web
Metadata is not new. It has been around for centuries. In some cases metadata applies to the overall document, while in other cases it applies to only a portion of the content. Examples of the first case include titles of books, footnotes, ISBN numbers, LOC or Dewey Decimal categorizations, keywords, etc. The various forms of scribal marginalia, whether scholia or glosses in the margins of a manuscript, or personal annotations of the owner of a document, are historic examples of the second kind of metadata.
Marginal notes are frequently used today in business forms. A printed form represents, often imperfectly, a snapshot in time of an organization's view of its own process. But maybe the process was approximated, or the form was imperfectly designed, or it quickly became outdated; somehow reality seems to outgrow the strictures of a form's blanks and checkboxes. So what do you, as a user, do? You write notes in the margins or other places between form fields and hope that there is a human in the loop someplace to read your words.
In any case, of all documents, forms (originally called "formulary documents") have the most structured representation of data. Enter your social security number into the nine little boxes provided. Enter your date of birth here, Month first, then day, then two-digit year. Last name first, first name last. Everything is nice and simple, and provided your reality matches that which the form designer envisioned, your data will be easy to consume, whether by another person or, after data entry, by various online processes. Or maybe the form was entered online originally? Even better.
But what about all the other documents in the world, the ones that are not formally structured as forms? What sense can we make of them? Can you tell a social security number in a free-form document, or a date, or a zip code? Perhaps with pattern matching, you can find out some simple things. That is the essence of Microsoft's Smart Tags. (And we had much of this in Lotus Agenda a decade earlier.) But this only works for the most trivial cases. It only takes you so far.
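To see how far simple pattern matching gets you, here is a small Python sketch using regular expressions. The patterns are illustrative assumptions (US-style formats only), not a description of how Smart Tags or Agenda actually worked. The second call shows the limit: a pattern recognizes the shape of a string, not its meaning.

```python
import re

# Illustrative US-style patterns; real-world data is far messier than this.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def tag(text):
    """Return (kind, matched text) pairs for every pattern that fires."""
    return [(kind, m.group()) for kind, rx in PATTERNS.items() for m in rx.finditer(text)]


print(tag("Claim 123-45-6789 was filed on 6/26/2009 from 02134."))
# The zip pattern also fires on a part number: shape is not meaning.
print(tag("Please reorder part 90210 today."))
```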
What if I wanted to mark up an academic paper, a work-in-progress, to indicate which quotations have been verified and which ones remain to be verified? Or what if I want to annotate statements in recorded testimony according to which statements contradict and which corroborate another witness's statements? This goes far beyond pattern matching. I need a way to encode my knowledge, my view of the subject, in the document.
We have data in a document -- "Words, words, words", as Hamlet tells Polonius. But for those who work with thoughts, the present constraints of encoding our knowledge as rudimentary linear strings of characters are severe. In general, text is multi-layered and hyper-linked in strange and marvelous ways. Your father's word processor and word processor format are inadequate to the task. The concept of a document as being a single store of data that lives in a single place, entire, self-contained and complete, is nearing an end. A document is a stream, a thread in space and time, connected to other documents, containing other documents, contained in other documents, in multiple layers of meaning and in multiple dimensions. What we call a traditional document is really just a snapshot in time and space, a projection into print-ready output form of what documents will soon become.
The applications of metadata to business documents are legion. Wherever you have data, you also have the questions of:
- Who entered the data?
- Where did the data come from?
- Who verified the data?
- Who approved the data? Legal? HR? Business?
- Where is this data destined?
- How old is the data? When does it expire?
- How trustworthy is this data?
- Who must we cite as an authority for this data?
- Who owns this data?
- Who has permissions to see this data?
- Who can set policy for this data?
- Who else can edit this data?
- How does this data connect with my business? Is it a part number? The name of a customer or the name of an employee?
OpenDocument Format (ODF) 1.2 will be taking a step into the world of structured metadata with an RDF/XML metadata framework. If that sounds Greek to you, then let's say that a metadata framework enables application developers to create applications that do the above things. A framework doesn't tell you how you must say "This image is provided under a Creative Commons Share-Alike license" but provides a framework for application developers to express concepts like "licensed-under" and "Creative Commons Share-Alike", as well as a formal structure for expressing subject-predicate-object relationships, where the subject can be any of around 50 ODF document elements, such as paragraphs, footnotes, images, tables, etc.
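The subject-predicate-object idea can be sketched in a few lines of plain Python. This is neither the ODF 1.2 metadata API nor RDF/XML: the "content.xml#..." subject references, the example.org predicates, and the specific license URI are hypothetical stand-ins, chosen to show how statements attached to individual paragraphs or images can answer questions from the list above, such as "Who verified the data?".

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # which document element the statement is about
    predicate: str  # the relationship being asserted
    obj: str        # the value or related resource


# Hypothetical vocabulary; a real application would define or reuse its own.
LICENSED_UNDER = "http://example.org/vocab#licensed-under"
VERIFIED_BY = "http://example.org/vocab#verified-by"
CC_BY_SA = "http://creativecommons.org/licenses/by-sa/3.0/"

# Hypothetical subject references to elements identified inside content.xml.
graph = [
    Triple("content.xml#img1", LICENSED_UNDER, CC_BY_SA),
    Triple("content.xml#para7", VERIFIED_BY, "Legal"),
    Triple("content.xml#para9", VERIFIED_BY, "HR"),
]


def query(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who verified the data?" becomes a pattern with the predicate fixed.
for t in query(graph, predicate=VERIFIED_BY):
    print(t.subject, "verified by", t.obj)
```

In ODF 1.2 itself such statements are serialized as RDF/XML, and a real application would use an RDF library rather than a list of tuples; the sketch only shows the shape of the data.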
To read more, here are some places to start:
For general background on the "semantic web", a good intro is the 2001 Scientific American article "The Semantic Web" by Tim Berners-Lee et al.
For a bit more on RDF, the Wikipedia page is pretty good.
Svante Schubert at Sun, also on the ODF Metadata Subcommittee, has a recent blog post worth reading: "New Extensible Metadata Support With ODF 1.2".
Bruce D'Arcus, of the Metadata Subcommittee and co-lead of the OpenOffice.org Bibliographic Project also contributes his thoughts on the new ODF 1.2 metadata.
If you want to delve into the particulars of ODF 1.2's new metadata support, you can read the latest draft of the proposed changes to the specification [ODF] and the examples [ODF] document. Of course, any feedback on ODF drafts and published standards is welcome on the ODF TC's comment mailing list.
For a gentle introduction to metadata, ODF, where we are coming from and where we are going, I offer this interview [MP3] with Patrick Durusau, Chair of the ODF Metadata Subcommittee, which I recorded back in July.
Sunday, October 07, 2007
Cracks in the Foundation
However, in recent months the OpenDocument Foundation has found itself more and more isolated, outside of the mainstream debate. How far they have fallen can be seen in the fact that Microsoft has gone from ridiculing their conspiracy theories to using them to support their arguments. At the same time the Foundation's membership has dwindled to the point where only a small number remain. Former members have disassociated themselves from the Foundation as it turned increasingly to strident rhetoric. Whereas in the early days, the Foundation had a large membership that participated fully in the OASIS TC's, now their "contributions" are mainly that of heckling and haranguing the other members. Finally, the Foundation has recently announced its intent to abandon constructive work within OASIS, to actively lobby against adoption of ODF 1.2 in ISO and to push for an alternative format, CDF, based on XHTML, CSS 3.0 and RDF. This is an odd stance for a non-profit whose charter was:
The OpenDocument Foundation, Inc. is a 501c(3) non profit chartered to work in the public interest to support, promote and develop the OASIS OpenDocument File Format affectionately known as "ODf".
So it is against this backdrop that I read with interest in Linux Today the latest correspondence from the Foundation. You can read it yourself, or take the following 8 points from me as a condensed summary of their main points:
"The commercialization of interoperability remains a key driver in both big vendor deals and big vendor consortia FOSS is left on the outside looking in."
"The conversion to XML [document formats] must be nondisruptive," meaning it fits into existing business processes, which are increasingly dominated by Microsoft middleware. This implies a requirement for high-fidelity, loss-less round-trip conversions.
The alternative is "rip and replace" and that is too costly and disruptive.
Microsoft is moving toward a "grand convergence" of their services, desktop, device and servers, with OOXML at the core. "MS-OOXML is the primary transport, the document/data container of interop-integration preference."
ODF was not designed as a response to these problems.
Microsoft/Sun/Novell are working "to limit ODF interoperability and usefulness" because of some patent deals. (Sorry I can't summarize this one better -- I just don't understand it.)
IBM/Oracle/Google are working to "limit ODF interop" because "they want a total ripout and replace of MS Office."
The Open Document Foundation is in "the middle area of trying to perfect the conversion to XML".
Let me take these points one-by-one:
The OpenDocument Foundation seems to try to clothe themselves in the mantle of the open source community and pontificate on how the big bad vendors treat interoperability. But are they speaking as a non-profit or as a vendor? Take their DaVinci plugin, for example. Where is the source code? Why isn't this open source? Are we to follow the Foundation's claim of 100% interoperability, based on blind faith, without seeing some proof in the form of working code? I've been working on document conversions and document file formats of one kind or another for almost 20 years. I've never seen 100% fidelity conversions of anything but trivial formats. Extraordinary claims require extraordinary evidence. But we have nothing here, just white papers.
I would not claim a priori that all customers require lossless, 100% fidelity conversions. Remember, we do not see 100% fidelity even when upgrading from Office 2003 to Office 2007, but this appears to be adequate. What is required is that the total return from changing document formats exceeds any other profitable use of capital available to the enterprise. In other words, to a business this is an investment, and will be judged as an investment. Very few businesses will take a dogmatic, ideologically pure view of this. Ask yourself: would you accept a 1% loss in fidelity if I gave you a billion dollars? Yes, of course you would. There are no purists in business who will remain in business. We're just haggling over what price/fidelity combination is needed to make a good investment.
However, there is a notable exception to this rule, and that is where access to open document formats is mandated as a public right, not as a business investment. Think of the last 20 years or so of equipping public buildings with ramps for the disabled, bathrooms to accommodate wheelchairs, and braille lettering in elevators. This was done by legislation and regulation, as a matter of public policy, to ensure that all of the public has access to public facilities. There was no requirement that an access ramp post a net profit. Similarly, today we see that some movements to ODF are based on open access principles.
This is what we call the "fallacy of the excluded middle." You are either with us, or against us, etc. It is false to suggest that the only two approaches to interoperability are to either blindly follow the OpenDocument Foundation's mysterious DaVinci plugin, or to ignore interoperability altogether and advocate rip and replace. There are today two other ODF plugins available, one from Microsoft and one from Sun. This is real, running code, and in the case of the first plugin, even open source. So why should we be taking exclusive direction from the Foundation on how we achieve interoperability? Oh right, they are claiming that their program achieves 100% round-trip fidelity. Extraordinary claims...
Gary is in the ballpark when he suspects that Microsoft is seeking some sort of "grand convergence" around protocols and formats. However, I disagree with his impression that OOXML sits at the center of this. In my opinion, OOXML is a rushed, transitional format, intended purely to disrupt ODF adoption. Just as the Office 2000, Office XP, and Office 2003 markup formats were abandoned by Microsoft, I predict that OOXML will soon be cast aside. The problem is that OOXML is such a poorly-engineered format that not even Microsoft wants to build upon it. If I had to divine the future of Microsoft's file formats, I'd look more in the XAML/XPS/Silverlight space. I believe that future MS Office document formats will look more like that than like OOXML.
I find this observation amusing. ODF, which started its standards track late in 2002, was not designed to be 100% compatible with Office 2007. Mercy me, how did we manage to drop the ball on this one?! Remember, in 2002 there was no publicly available specification for Microsoft document formats. There was no Open Specification Promise or Covenant Not to Sue. So not only was 100% compatibility technically impossible, attempting it via reverse engineering was precarious from a legal standpoint. In my opinion, it still is, even in 2007.
In any case I'm staunchly opposed to evolving any open standard purely for the benefit of a single vendor. Microsoft Internet Explorer is the dominant web browser. Should we then require that HTML only evolve in ways that improve interoperability with Internet Explorer? I don't think so. Why should document formats be different?
This comment manages to avoid confronting a heap of contrary facts. Microsoft supports the open source ODF Translator project on SourceForge. Sun has made their own ODF Plugin 1.1 for MS Office available for download. And Novell, along with helping the Microsoft effort, has integrated that translator into their version of OpenOffice and has also started work on more powerful, next-generation support for OOXML. So these three companies are seeking to "limit ODF interoperability and usefulness"? If so, they sure have a clever way of disguising their intent. To the ordinary bystander, writing conversion and translation code to allow documents to be shared between OpenOffice and MS Office would be seen as a pro-interoperability statement. But thanks to the OpenDocument Foundation's in-depth sleuthing, we now know that the opposite is true. Not!
Although I have serious doubts as to the long-term technical feasibility of some of these translation endeavors, they do have the advantage of showing real, running code working with real, running applications. They may not claim 100% fidelity, but this is first-generation work and will undoubtedly improve. But they have an important advantage over the Foundation's DaVinci Plugin in that these other efforts demonstrably exist. Given a choice, I'll take an open source version of a partial-fidelity converter, with a reasonable architecture, over one that claims 100% fidelity, but that I can't see or touch.
The claim is that IBM/Google/Oracle also want to "limit ODF interop" because (according to Gary) we want rip & replace. Strange, but just a few weeks ago I led an ODF Interoperability Camp in Barcelona, on behalf of the OASIS ODF Adoption TC, where we had a good selection of ODF vendors, open source projects and customers working to improve interoperability, including Sun, Novell, Google and IBM. The OpenDocument Foundation is a member of the OASIS ODF Adoption TC. So did they help in the organizing of the event? Did they participate? No, nothing, nada. Evidently it is easier to complain about interoperability than to do something about it.
And again there is this fallacy of the excluded middle. You must either accept the magical DaVinci Plugin, or you are for rip & replace. There are no other alternatives considered. I'd remind the OpenDocument Foundation that interoperability was not invented yesterday, and that there are many technical approaches that can be applied to foster it. Open standards are one way, but there are others that can be applied as well, including conformance testing, test suites, plug-fests, profiles, shared code, reference implementations, etc. We should apply experience and engineering judgment to select the appropriate solution for the problem, and not fall into the trap of believing that there is only a single path to interoperability, and that this path just happens to be based on the Foundation's product.
Although it sure would be nice to portray yourself as the little guy, watching out for the customer, while the big bad vendors tromp all over the flowers, the fact is that the big vendors are actively working on interoperability, with at least three major solutions available today, as well as a major initiative around interoperability in the ODF Adoption TC. In particular, IBM (with SmartSuite) and Sun (with StarOffice) have 15 or so years of experience each in working on document interoperability with MS Office. This isn't rocket science, but neither is it easy. You can either stand on the sidelines and make pronouncements about how the world is out to prevent interoperability, or you can roll up your sleeves and help get the work done. I know which one I'll be doing. What about you?
Labels: Interoperability, ODF, OpenDocument Foundation
Monday, September 24, 2007
OpenOffice.org Conference 2007
It is interesting to look at FlightStats.com to see how they rate this particular flight. It says that DL 480 has an on-time percentage of 30%, and is excessively late 52% of the time. The average delay for this flight is 79 minutes.
I just don't get it. It is one thing to be slow. But why can't you be slow and still be accurate in your estimates? If you are going to be 79 minutes late on average, then why don't you adjust your schedules accordingly?
In any case, the conference in Barcelona was great! This was my 2nd year attending OOoCon. Last year, in Lyon, I attended OOoCon as an outsider. I remember then being asked by several attendees why IBM was not contributing code to the community and thinking to myself how much it sucked that we were not doing so. What a difference a year makes! Now the discussion is not if IBM will contribute, but the logistics of exactly when and how we will make our contributions. I was proud to attend the Barcelona conference as a real OpenOffice.org member, and I can tell you that the beer tastes better when you are a member of the community.
I gave a presentation called "ODF Interoperability: The Price of Success" on Wednesday. The slides should be posted up here within a few days. A video of the presentation is here. Your best bet is to wait for the slides and follow along with my audio.
On Thursday I led a full-day workshop on ODF interoperability on behalf of the OASIS ODF Adoption TC. We had participants from a number of ODF vendors/projects: IBM, Sun, Google, Novell, SEPT-Solutions, Haansoft, OpenOffice.org and KOffice. We worked through a few exercises where we tested the exchange of documents that reflected a number of typical real-world business cases. Although Clever Age did not attend, we also did some tests with their Word Add-in. This event was the first of hopefully several workshops where we will attempt to bring the vendors together in a focused effort to improve ODF interoperability.
There were many good conference sessions that I wanted to attend but missed. That is the downside of having a full day workshop. Of the sessions I did see, the highlights were:
- Louis Suarez-Potts's opening keynote "OpenOffice.org 3.0 and Beyond"
- Hu Cai Yong's impassioned "Beyond Technology, the Chinese Roadmap" on the subtext of Western cultural imperialism embedded in some "one size fits all" commercial software application suites.
- Barbara Held's talk "Toward openness and accessibility" (video available here)
For the ones I missed, I need to go back and watch the taped sessions and read the presentations.
Overall, it was great to see old friends, and meet so many more for the first time, including some with whom I have corresponded at length but had never before met in person.
I didn't have much time to play tourist, so I'll give you only two pictures. The first I've taken from the Ars Aperta website, a picture of Charles Schulz and me exchanging funny stories at the Mac Porting party:

And in the "Maybe My Youth Was Not Misspent" Department comes this picture of a decorative "column" outside the building where I gave my presentation on Wednesday. The building hosts the University of Barcelona's philology department. I immediately recognized the text as Homer and snapped this photo. The next day I was passing by when two students were trying to read it. I stopped, and stood, with arms dramatically outstretched, and in my best Greek dactylic hexameter, recited from memory the Invocation to the Muse that begins the Iliad. So, thank you Professor Higbie, wherever you are, for making us memorize Homer. It actually came in handy!

Labels: ODF, OpenOffice
Thursday, August 02, 2007
An Invitation: ODF Interoperability Workshop
The hope is that this will be the first of several such events to bring ODF vendors together to explore ways of greater technical coordination, especially in the area of interoperability. I've written about and presented on this topic before. Now is the time for action, and I'm extremely pleased that so many vendors will be attending.
On other occasions I've called interoperability "the price of success" because a standard implemented by only a single vendor and a single application need not worry about it. Only successful standards with many implementations need to rent a hall to bring the implementors together to review and perfect interoperability.
(It is like capital gains taxes. I grumble when I pay them, but take some solace in the fact that my investments were profitable. Those who make a losing investment don't pay capital gains taxes on it.)
The focus of this first interoperability event will be on the ODF word processor format. Follow-up events will look at spreadsheets and presentations.
Please have a look at the detailed agenda for the camp and consider joining us in Barcelona.
Labels: Interoperability, ODF
Sunday, July 29, 2007
My comments on the ETRM 4.0 draft
I’d like to write to you as a long-time Massachusetts resident and taxpayer. My employer (IBM) will likely submit their own comments, but I’d like to offer you my own personal views on the ETRM 4.0 draft.
I am proud of the Commonwealth’s tradition of openness in government, enshrined in our Public Records Law and Open Meeting Law. As James Madison wrote, “A popular government, without popular information, or the means of acquiring it, is but a prologue to a farce or a tragedy. A people who mean to be their own governors must arm themselves with the power which knowledge gives them.” So access to government documents, now and for posterity, is critical for public oversight and participation in government, as well as for preserving our heritage. Now that we’ve moved into the digital age, access to government documents requires that these documents be made available in a format that all Commonwealth residents can read. So the move toward open document formats, as called for in the ETRM, is laudable. A citizen must never be dependent on any single vendor for the software needed to read their government’s documents.
However, I am concerned at the proposed addition of Ecma Office Open XML (OOXML) to the list of acceptable document formats. As you may have heard, OOXML is currently undergoing review by ISO/IEC JTC1 for possible approval as an ISO standard. As part of this review, technical committees in standards bodies around the world are reviewing OOXML and appraising its suitability as an International Standard. As a participant in the US committee reviewing OOXML, INCITS V1, I had the opportunity to review the text of the OOXML specification and to discuss it with others. I am sorry to report that I found the OOXML specification to be full of errors and omissions. Of course, no technical document is perfect. But this one, in particular, is of far greater length (more than 6,000 pages) and of far lower quality than any I have seen before. If it has advanced this far in the ISO process it is because of vendor pressure, not because of technical merit.
What is the problem with a buggy standard? Interoperability suffers. That is the problem. There is no doubt that if everyone in the Commonwealth used Microsoft Office 2007 on Windows Vista, their interoperability would be good. But as soon as we admit choice in applications and operating systems, then interoperability will only occur when all sides follow a common standard. So the technical quality of a standard (accuracy, comprehensiveness, level of detail, consistency, etc.) is directly proportional to the level of interoperability achievable and the cost to achieve it.
The ISO ballot on OOXML will not end until September 2nd, after which a resolution process to fix defects in the text of the standard will take at least an additional 6-18 months. That is, of course, if OOXML gains ISO approval, something which is not certain at this point. So I would recommend a cautious approach, and wait for the ISO process to conclude, or conduct your own independent technical evaluation of the OOXML specification to confirm its technical quality before adding OOXML to your list. Ask other vendors: Is this something you can implement? Ask yourself: Will this truly give the Commonwealth the interoperability and choice that you desire? These are important questions to ask.
Finally, I’d note that the ETRM also calls out OpenDocument Format (ODF) as an acceptable format. ODF was approved by ISO last year. So why do we need OOXML? I personally think that the complexity of document exchange and translation in a multi-format world would take us back to the confusion and frustration of the early 1990’s when we all juggled WordStar, WordPerfect, Word and WordPro files, and could collaborate only poorly. Better to push for a single unified/harmonized standard document format for personal productivity applications, much as we have a single standard (HTML) for web pages.
I’ll leave you with a quote from Tim Berners-Lee, the inventor of the web, from an interview he gave with David Berlind from ZDNet when Berners-Lee was recently in Boston receiving a Lifetime Achievement Award from the Massachusetts Innovation & Technology Exchange.
Berners-Lee said:
It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it’s open and the fact that it is royalty-free.
So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.
If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn’t have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.
So I think it very important that as we move on to new spaces ... we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you built on top of the web platform.
I believe we want to ensure the same qualities in document formats. We want competition and choice among vendors, applications and services, but not among standards. If we compete on standards, then no one wins.
Monday, July 09, 2007
The Formula for Failure
Miguel de Icaza gleefully noted back in January:
OOXML devotes 324 pages of the standard to document the formulas and functions.
The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks that represented Novell at the TC45 requested the information and it eventually made it into the standards. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).
And Microsoft's Jean Paoli quoted in May in InfoWorld:
As far as those 6,000 pages of specs is concerned, there are 350 pages in the OpenXML spec alone -- half of the entire ODF spec -- just to describe spreadsheet capabilities, which ODF doesn't have, Paoli says. For example, ODF can't describe or calculate a formula in a spreadsheet.
"It may sound amazing. They are working on it now. But the current standard doesn't have it," Paoli tells me.
There are many other examples, if you care to seek them out. But what you will not find is an examination of what OOXML actually specifies for spreadsheet formulas, or confirmation that it was done sufficiently. Maybe the assumption is that this would be a trivial task, documenting Excel's behavior? What could possibly go wrong?
Let's find out.
First, let's take the trigonometric functions, SIN (Part 4, Section 3.17.7.287), COS (Part 4, Section 3.17.7.50) and TAN (Part 4, Section 3.17.7.313). Hard to mess these up, right? Well, what if you fail to state whether their arguments are angles expressed in radians or degrees? Whoops. Same problem for the return value of the inverse functions, ASIN (Part 4, Section 3.17.7.12), ACOS (Part 4, Section 3.17.7.4), ATAN (Part 4, Section 3.17.7.14), and ATAN2 (Part 4, Section 3.17.7.15). It is hard to have interoperable versions of these functions if the units are not specified. What kind of review in Ecma would miss something so simple?
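To make the ambiguity concrete, here is a small Python sketch that evaluates the same stored formula, SIN(30), under each of the two possible readings. The function names are hypothetical, not taken from OOXML or any product. Excel itself works in radians, but an implementer who reads only the cited section and guesses degrees gets a completely different number.

```python
import math

def sin_radians(x: float) -> float:
    """SIN, assuming the argument is an angle in radians."""
    return math.sin(x)

def sin_degrees(x: float) -> float:
    """SIN, assuming the argument is an angle in degrees."""
    return math.sin(math.radians(x))

# The same spreadsheet cell, =SIN(30), under each assumption:
print(sin_radians(30))  # about -0.988
print(sin_degrees(30))  # about 0.5
```

Two applications that each "follow the standard" can therefore disagree on every trigonometric result in a shared spreadsheet.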
The AVEDEV function (Part 4, Section 3.17.7.17) should return the average deviation of a list of values. However, the formula given for this function is actually for calculating the number of combinations of n things taken k at a time. Nice formula, though. Jakob Bernoulli would be proud. But anyone using an OOXML spreadsheet application that follows this standard will be perplexed at the values returned by their AVEDEV function. Did these formulas get any expert review in Ecma?
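For comparison, here is what an average deviation function is normally expected to compute: the mean of the absolute deviations of the values from their arithmetic mean. This is a sketch in Python; the `avedev` name and the sample data are illustrative only, and `math.comb` appears just to show that "n things taken k at a time" is an unrelated quantity.

```python
import math
from statistics import fmean

def avedev(values):
    """Average absolute deviation: (1/n) * sum(|x - mean|)."""
    mean = fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [4, 5, 6, 7, 5, 4, 3]
print(avedev(data))       # 50/49, about 1.0204
print(math.comb(7, 3))    # 7 things taken 3 at a time: a different quantity entirely
```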
It is hard to have confidence in the CONFIDENCE function (Part 4, Section 3.17.7.47). It is said to return the confidence interval around a sample mean given an alpha value, a standard deviation and a sample size. The problem is that this function is under-defined. One must make an assumption, not stated here, as to the shape of the data distribution. Is it normally distributed data? Exponentially distributed? Weibull distribution? The standard does not define the meaning of this function sufficiently for one to implement it.
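Here is one way to fill the gap, assuming normally distributed data with a known standard deviation. That is the assumption most readers would guess, but the text never states it. This is a hedged Python sketch; `confidence_normal` is a hypothetical name, and a different distributional assumption would need a different formula.

```python
from math import sqrt
from statistics import NormalDist

def confidence_normal(alpha: float, std_dev: float, size: int) -> float:
    """Half-width of a (1 - alpha) confidence interval for a mean,
    ASSUMING normally distributed data with known standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * std_dev / sqrt(size)

print(confidence_normal(0.05, 2.5, 50))  # about 0.693
```

The normal assumption is baked into the `NormalDist` call; nothing in the function's arguments tells a second implementation to make the same choice.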
The CONVERT function (Part 4, Section 3.17.7.48) converts from one unit to another. Some conversions explicitly allowed include liquid measure conversions such as from liters to cups or tablespoons. But whose cup and whose tablespoon? Traditional liquid measures vary from country to country. In the US, a cup is 8 fluid ounces, except for FDA labeling purposes when a cup is 240ml. But in Australia a cup is 250ml and in the UK it is 285ml. Similarly a tablespoon has various definitions. OOXML is silent on what assumptions an application should make. I guess I won't be using OOXML to store my recipes, and certainly not to calculate medical doses!
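A quick Python sketch shows how far apart the answers drift for something as simple as converting one liter to cups. The US customary figure (8 US fluid ounces at 29.5735295625 ml each) is exact; the FDA, Australian and UK figures are the ones quoted above. The table and function names are hypothetical.

```python
# Millilitres per "cup" under each definition mentioned in the text.
CUP_ML = {
    "US customary (8 US fl oz)": 8 * 29.5735295625,
    "US FDA labeling": 240.0,
    "Australia (metric)": 250.0,
    "UK (as quoted above)": 285.0,
}

def liters_to_cups(liters: float, cup_ml: float) -> float:
    """Convert liters to cups for one particular cup definition."""
    return liters * 1000.0 / cup_ml

for name, ml in CUP_ML.items():
    print(f"1 L = {liters_to_cups(1.0, ml):.3f} cups ({name})")
```

One liter comes out anywhere from about 3.5 to about 4.2 cups, a spread of roughly 20%, depending on which definition an implementation silently picks.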
Almost every one of the financial functions in OOXML depends on a "day count basis" flag, such as US (NASD) 30/360, Actual/Actual, Actual/360, Actual/365, European 30/360. These represent various conventions for how days and months are counted. The problem is that the OOXML standard does not define these conventions, nor does it point to an authority for their definition. There are subtle behaviors here, especially when dealing with leap years and Excel's deviant treatment of dates in the year 1900. So the lack of detailed definitions in this area makes it impossible for anyone to rely on identical financial calculations from different OOXML implementations. This, in a field where being off by a penny can cause problems. Almost 30 spreadsheet functions are broken in this way.
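To see why the missing definitions matter, here is a Python sketch of three common conventions as they are generally defined in the financial industry: Actual/360, Actual/365 Fixed, and European 30E/360. These are my assumptions about the conventions, not definitions taken from OOXML or Excel, and the US (NASD) 30/360 variant, with its extra end-of-month and February rules, is deliberately left out. The function names are hypothetical.

```python
from datetime import date

def actual_360(start: date, end: date) -> float:
    """Actual days elapsed over a 360-day year."""
    return (end - start).days / 360

def actual_365_fixed(start: date, end: date) -> float:
    """Actual days elapsed over a fixed 365-day year."""
    return (end - start).days / 365

def thirty_e_360(start: date, end: date) -> float:
    """European 30E/360: a day-of-month of 31 is treated as 30."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
    return days / 360

start, end = date(2007, 1, 31), date(2007, 7, 31)
principal, rate = 1_000_000.00, 0.05
for name, basis in [("Actual/360", actual_360),
                    ("Actual/365", actual_365_fixed),
                    ("30E/360", thirty_e_360)]:
    interest = principal * rate * basis(start, end)
    print(f"{name:10s} {interest:12.2f}")
```

The same period is 181 actual days but 180 days under 30E/360, and the denominators differ too, so three plausible readings of a "day count basis" flag produce three different interest amounts on the same loan.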
(What do you call a scientist whose calculations are off by 50%? A cosmologist. What do you call an accountant whose calculations are off by 1%? A crook.)
The NETWORKDAYS function (Part 4, Section 3.17.7.344) seems simple enough. It returns the number of workdays (non-weekend days) between two dates. Simple enough. Unless you live in the Middle East. The problem is that this function doesn't provide a facility for distinguishing the different weekend conventions. I may have a weekend on Saturday & Sunday, but a colleague in Tel-Aviv might have Friday and Saturday off, while in Cairo it might be Thursday and Friday. This function lacks the adaptability to deal with this important cultural difference. Saying that the definition of the weekend is implementation- or locale-dependent won't work either. I may be a French company in Paris dealing with contractors in Algeria. I need to have a French spreadsheet calculate schedules for workers at various locations and be able to exchange it with other offices using other OOXML applications and expect that they will get the same answer. Lacking cultural adaptability, OOXML fails approximately a billion people here.
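Here is a minimal Python sketch of a NETWORKDAYS-style function that takes the weekend as an explicit parameter, so the convention travels with the formula instead of depending on locale. The signature, the `weekend` parameter and the constant names are hypothetical, for illustration only; days are counted inclusively at both ends, as NETWORKDAYS does in Excel.

```python
from datetime import date, timedelta

# date.weekday(): Monday == 0 ... Sunday == 6
SAT_SUN = frozenset({5, 6})
FRI_SAT = frozenset({4, 5})
THU_FRI = frozenset({3, 4})

def networkdays(start, end, weekend=SAT_SUN, holidays=()):
    """Count working days from start to end, both inclusive.

    Hypothetical signature: `weekend` is the set of weekday numbers
    treated as non-working days; `holidays` is an iterable of dates.
    A reversed range returns a negative count."""
    if end < start:
        return -networkdays(end, start, weekend, holidays)
    off = set(holidays)
    count = 0
    day = start
    while day <= end:
        if day.weekday() not in weekend and day not in off:
            count += 1
        day += timedelta(days=1)
    return count

# Monday 2 July 2007 through Friday 6 July 2007
start, end = date(2007, 7, 2), date(2007, 7, 6)
for label, wk in [("Sat/Sun", SAT_SUN), ("Fri/Sat", FRI_SAT), ("Thu/Fri", THU_FRI)]:
    print(label, networkdays(start, end, wk))
```

The same Monday-to-Friday span counts as 5, 4 or 3 working days depending on the weekend convention, which is exactly the information a spreadsheet exchanged between Paris and Algeria needs to carry with it.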
Another example. Several of the statistical functions in OOXML are defined incorrectly. Take for example, the ZTEST function (Part 4, Section 3.17.7.352). The key error is in the text following the formula, where it says, "where x is the sample mean." The problem is that x-bar is the sample mean, not x. Someone who implements according to the text will give their users the wrong answer. A similar error is repeated in 8 other statistical functions. Certainly this is a typographical error, but this error changes the answer. Remember, this is an approved Ecma Standard and a proposed ISO Standard, not a 4th grade school essay. Denmark and Massachusetts have already said they will adopt OOXML for official business. Spelling counts. Providing the right formula and the right description counts. Copy and paste errors should have been taken care of back during the Ecma review.
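A small Python sketch makes the distinction explicit. The user supplies x, the hypothesized population mean; x-bar is computed from the data. The function follows the usual one-tailed definition, 1 - Phi((x_bar - x) / (sigma / sqrt(n))), and falls back to the sample standard deviation when sigma is omitted. That fallback and the `ztest` name are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def ztest(sample, x, sigma=None):
    """One-tailed P-value for the null hypothesis that the population mean is x.

    x     -- the hypothesized population mean, supplied by the user
    x_bar -- the sample mean, computed from the data (NOT the argument x)
    sigma -- population standard deviation; if omitted, the sample
             standard deviation is used (an assumption for this sketch)
    """
    n = len(sample)
    x_bar = fmean(sample)
    s = sigma if sigma is not None else stdev(sample)
    z = (x_bar - x) / (s / sqrt(n))
    return 1 - NormalDist().cdf(z)

data = [3, 6, 7, 8, 6, 5, 4, 2, 1, 9]
print(round(ztest(data, 4), 4))  # 0.0906
```

An implementation that treats the argument x as the sample mean, as the spec text says, has nothing left to compare it against and will return a different, meaningless P-value.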
I've submitted these spreadsheet formula issues, and many others, to INCITS V1, for consideration in determining the US position on the OOXML ISO ballot, but we never got to them during our two-day meeting in DC a couple of weeks ago, and may not get to them at all. There are simply too many other issues to read through and discuss. But I thought it was important to bring up these formula issues in particular, since Microsoft seems especially proud of their work in this area, delusions of adequacy which on reflection must now seem unwarranted. I'm especially concerned with the financial functions, since they are outside my area of expertise and may have additional errors that I missed.
So what is ODF doing about formulas? We're continuing to work on them. Rather than rush, we're doing careful, methodical work. We're documenting the functions in great detail. Where we have the choice between the common naive formula for a function and one that is numerically stable, we're documenting the stable function. For the NETWORKDAYS function, we created an optional extra parameter, so a user can pass in a flag that tells what their weekend conventions are. We have a professor of statistics reviewing our statistics functions for completeness and accuracy. We're verifying our assumptions about financial functions by referring to core specifications from groups like the ISDA and the NASD. We're creating a huge number of test cases and checking them with Excel and other applications.
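As an illustration of the naive-versus-stable distinction (the post doesn't say which functions this was applied to, so this is only an example), here is the textbook case of sample variance in Python. The naive one-pass formula subtracts two huge, nearly equal sums and can lose most of its significant digits to cancellation; Welford's update avoids that. Both function names are hypothetical.

```python
def variance_naive(xs):
    """Textbook one-pass formula: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

def variance_welford(xs):
    """Welford's one-pass update: keeps a running mean and a running
    sum of squared deviations, avoiding catastrophic cancellation."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Small deviations riding on a large offset: the exact variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(variance_naive(data))    # typically far from 30 in double precision
print(variance_welford(data))  # 30.0
```

On the same data without the offset ([4, 7, 13, 16]) both functions agree; the difference only shows up when the numbers get large, which is exactly when a user is least likely to notice.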
Under Sarbanes-Oxley, a CEO or CFO puts himself at personal risk if he signs off on financial numbers derived from processes and tools that he knows to give erroneous results. So we utterly reject a rushed process that has led to an Ecma Standard which incompletely and incorrectly defines spreadsheet functions. Some things are worth taking the time to do right.
As I've shown, in the rush to write a 6,000 page standard in less than a year, Ecma dropped the ball. OOXML's spreadsheet formula specification is worse than missing. It has incorrect formulas that, if implemented according to this standard, would raise important health, safety and environmental concerns, aside from the obvious financial risks of a spreadsheet that calculates incorrect results. This standard is seriously messed up. Shame on all those who praised and continue to praise the OOXML formula specification without actually reading it.
Sunday, June 24, 2007
A File Format Timeline

26 June Update
I suppose the downside of a blog post containing only a picture is that there is nothing for anyone to quote. So here are a few themes that struck me while putting this chart together:
- Microsoft once made file format information on the binary formats readily available, in fact encouraged programmers to use the binary formats. But then around 1999 they reversed course, and eliminated such documentation. At the time, working at Lotus, I had no idea what motivated this change. It was only years later, when Microsoft internal memos were released in cases like Comes v. Microsoft, that the full picture emerged. The file format was viewed by Microsoft as a strategic tool, used to support the overall Microsoft platform, not the user. The format was designed to preserve their vendor lock-in. The availability of the file format documentation to competitors was limited, as a matter of corporate policy.
So this reminds us that just because something is documented and available today does not prevent Microsoft from changing their mind at a later point and removing the documentation, failing to update it with new releases, or making it available only under a more restrictive license. Since Ecma owns the OOXML specification, as well as the future maintenance of it, any belief in the long-term openness of this format depends on your trust of Microsoft's future behavior in this area.
- Like any durable goods monopoly (and few things are as durable as software) Microsoft's largest competitor is their own install base. Microsoft has made many attempts at moving beyond the binary formats in the past, with Office 2000, Office XP and Office 2003. But in each case it failed. These were all false starts and abandoned attempts. So we should look for signs that OOXML is actually Microsoft's real direction and not another false start or dead end.
My guess is that OOXML is merely a transitional format, much like Windows ME was in the OS space, a temporary hybrid used to ease the transition from 16-bit to the 32-bit platform that would eventually come (Windows 2000). Microsoft doesn't want to support all of the quirks of their legacy formats forever. That just leads to bloated, fragile code, more expensive development and support costs. They would rather have clean, structured markup, like ODF. But the question is, how do you get there? The answer is straightforward: First, eliminate the competition. Second, move users in small steps, promising the comfort of continuity and safety. Third, once you have eliminated competition and have the users on the OOXML format that no one but Microsoft fully understands, then you may have your will of them. For example, introduce a new format that drops support for legacy formats and force everyone to upgrade. They are pretty much doing this already on the Mac by dropping support for VBA in the next version of Mac Office.
Even a cursory look at OOXML shows that it was not designed for long term use, even by Microsoft. So the question I have is, what is the real format that they are going to?
- Microsoft, after pretty much ignoring document standards for over a decade, suddenly got religion in late 2005 and rushed whatever they had on hand into Ecma. Remember, just months earlier they had recommended the Office 2003 Reference Schemas to Massachusetts for official use. I'm certainly glad Massachusetts did not fall for that by putting their resources on another dead format in the Microsoft format graveyard. OOXML was not designed to be a standard. It is just a proprietary specification that Microsoft has dumped, at the last minute, into ISO's lap, in an attempt to translate their market domination into a standards imprimatur in order to further cement their market domination. It is a win-win situation for them. Either they have an effective monopoly in office applications and an ISO standard, or they have an effective monopoly in office applications. Nice situation for them either way. Reminds me a lot of Henry VIII and Clement VII. Henry set himself up to win regardless of what the Pope's response was.
Monday, June 11, 2007
Hemidemisemiquavers
From a GrokLaw news pick we hear that ZDNet's David Berlind recently interviewed Tim Berners-Lee in Boston, where Sir Tim received the Massachusetts Innovation and Technology Exchange's Lifetime Achievement Award. Watch the whole interview if you have 12 minutes, though I will transcribe one passage which highlights the importance of agreeing on a single open standard for a problem domain and fostering competition among the applications built upon that standard:
It was the standardization around HTML that allowed the web to take off. It was not only the fact that it is standard, but the fact that it's open and the fact that it is royalty-free.
So what we saw on top of the web was a huge diversity and different business which are built on top of the web given that it is an open platform.
If HTML had not been free, if it had been proprietary technology, then there would have been the business of actually selling HTML and the competing JTML, LTML, MTML products. Because we wouldn't have had the open platform, we would have had competition for these various different browser platforms, but we wouldn't have had the web. We wouldn't have had everything growing on top of it.
So I think it very important that as we move on to new spaces ... we must keep the same openness that we had before. We must keep an open internet platform, keep the standards for the presentation languages common and royalty free. So that means, yes, we need standards, because the money, the excitement is not competing over the technology at that level. The excitement is in the businesses and the applications that you built on top of the web platform.
Well said. I tried to make a similar point, but with pictures, back in February.
I recently ordered some podcasting equipment. It should arrive tomorrow. I will be looking for people to interview soon. So hide while you can, don't answer the phone, and if it looks like I'm carrying a microphone, then run for the exit.
An interesting article in the American Surveyor, by Joel Leininger, on the importance of file format standards. Although it is a different application domain, the concerns are very similar (via OpenMalaysia).
Anyone know Romanian? Something gives me the impression that this guy from Microsoft Romania is not complimenting me. I wonder what subtle hint gives me that impression...
The OOXML ballot marches on in national standards committees around the world. September 2nd is the deadline, though many committees have earlier deadlines for developing their recommendations. In the US the committee looking at OOXML is called INCITS V1, and we have until July 13th. V1 has had a few meetings so far and we're just starting to get into the technical comments. Since we have a consensus process, all it takes is a small minority of members to bring everything to a halt, which is pretty much what is happening. For example, we spent 2 1/2 hours today and discussed only two comments. So we risk having a perfunctory technical review of OOXML. When I compare this to the BSI's excellent work developing detailed comments on a publicly-readable wiki, I think we in the US should be ashamed at the stonewalling going on in V1.
I'll be hosting a V1 face-to-face meeting in a couple weeks in Washington, DC. Hopefully we'll make some more substantial progress there. If you really want to follow our work closely, you can read through our mailing list archives which Sun's Jon Bosak was kind enough to set up for us.
Although no formal call for public comments has gone out, we've received a number of unsolicited pro-OOXML letters which you can read here. As you can see, they are pretty much identical form letters, all ending with the artless phrase, "Furthermore, Open XML in no way contradicts any other international document standard." Does that remind anyone of The Manchurian Candidate's "Raymond Shaw is the kindest, bravest, warmest, most wonderful human being I've ever known in my life"?
In any case, if you want to provide input into this process, feel free to send in your thoughts as well. Having read many of these letters myself, I'd offer the following advice:
- Don't send in a form letter. It hurts your cause more than helps it, since it makes it look like you couldn't get real support if you tried.
- Use your real name and email address and postal address, so we know you are a real person and not a robot.
- Be polite. Remember you are trying to persuade.
- Give a succinct, reasoned opinion. Keep it to a page if you can.
- Ask for a specific action. Don't expect the reader to draw a conclusion. Draw it yourself.
Canada continues to solicit comments on OOXML. And the UK is soliciting comments as well, through June 30th. Again, be succinct, and give your name and address. Otherwise you risk having a committee member reject your comment outright since it cannot be ascertained whether you are actually a resident of that country.
A blog I'd like to recommend to my readers is Lodahl's blog. Leif Lodahl has been giving some great coverage of ODF happenings in Denmark, including analysis of the parliamentary debate on the question of whether Denmark should have one or two standards. Also a good catch of Microsoft dancing all over the place, trying to avoid giving a straight answer on why Word does not provide integrated ODF capabilities. If you can spare 45 minutes this is a great clip to listen to.
Tuesday, June 05, 2007
Documents for the Long Term
The permanence of the written word has fascinated mankind for millennia. The powerful knew the truth of this. To be sure that his deeds would outlive his contemporaries, the Emperor Augustus had his CV engraved in bronze in his "Res Gestae Divi Augusti" (Deeds accomplished of the Divine Augustus). The bronze did not survive, but the words have. Horace wrote in his Ode, "Exegi monumentum aere perennius" (I have erected a monument more lasting than brass). And his words have survived. Shakespeare in Sonnet #55 echoed this sentiment, "Not marble, nor the gilded monuments/ Of princes shall outlive this powerful rhyme". Shelley in his Ozymandias shows the irony of the surviving boastful inscription, "Look on my Works ye Mighty, and despair!" beside the "colossal wreck" of an ancient monument.
The saying is "ars longa, vita brevis": art is long, but life is short. But this is not entirely accurate. The performing arts such as dance or music have a very sketchy and imperfect history until the rather recent invention of written notations. So dance before around 1450 is a matter of speculation. No doubt the ancient Bacchae accompanied their ecstatic revels with an equally furious dance. But we know none of it. Thucydides has the Lacedaemonians march into battle to the accompaniment of flutes. What martial notes they played we do not know. We can only speculate, with Thomas Browne, "What song the Syrens sang". Some like Benjamin Bagby may give a glimpse at earlier performance practice. And scholars like Milman Parry find echoes of ancient practices in traditional story telling. But we cannot know for certain.
The structural arts of architecture, city design, aqueducts, and monuments, engravings, these have all fared better over time. Even scattered texts from antiquity have survived. Text can have longevity, but not unassisted. Left to the ravages of water, fire, insects and fungi, papyrus, vellum and paper will only survive a few hundred years. For a text to survive longer, someone must copy it. So, the works of Cicero, these we have in rather good shape today, in part because Augustine of Hippo praised his works. (Then as now, getting a good review from a recognized figure is the best marketing.)
Which ancient texts were copied, and thus became part of the canon of western literature, was somewhat a matter of chance. Nine of the surviving plays of Euripides, preserved in a single partial manuscript, are curiously in alphabetical order, and contain only plays beginning with the Greek letters eta through kappa, leading scholars to believe that this is merely volume 2 of a larger collection of plays that is otherwise lost. Euripides is believed to have written almost 100 plays. We have almost 20 of them today.
That said, the survival of a document does not depend entirely on the whims of monks or archivists. There are certain engineering principles which are key to creating a document that lends itself to long term retention. Some of these are tasks for the individual authors:
- Keep a document intact. Better to preserve a document inclusive of annexes and appendices.
- Separation of content, structure, layout and presentation
- Findability: a good title, an abstract, keywords and other metadata will help ensure that your document can be found and retrieved via current and future search technologies.
- Use of a fully-specified, open document format.
From another angle we can look at archiving from a systems view and follow a basic architectural principle. The key to durability, whether in documents, monuments, institutions, or whatever, all boils down to this: Do not depend on something less stable than yourself.
(I didn't invent that principle, but don't recall where I first heard it. Any idea who it was?)
If you depend on something less stable, which is to say more susceptible to change, than yourself, then when it changes, it forces you to change. Stability is when you change only when you want to change.
For example, a house is built on a foundation. A frame, plumbing and electrical, walls, wallpaper and furniture are layered on top. If replacing the wallpaper triggered a need for a new foundation, then we would say that the house was inherently unstable. But it is reasonable to expect that installing new plumbing will require opening a hole in a wall and later applying wallpaper. The expected rates of change of these various layers have led to a method of construction that enforces this dependency chain. If for some reason we needed to make very frequent changes to the plumbing, then we would place them outside the interior walls, or behind removable wall panels for easy access.
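The principle can be stated as a mechanical check over a dependency graph: give each component a rough expected rate of change, then flag every edge where a component depends on something that changes more often than itself. Here is a minimal Python sketch, assuming hypothetical change rates (changes per decade) for the house layers in the example; the numbers are illustrative, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Component:
    name: str
    changes_per_decade: float  # hypothetical expected rate of change


def unstable_dependencies(deps):
    """Return (dependent, dependency) name pairs that break the rule
    'do not depend on something less stable than yourself'.
    deps maps each component to the list of components it depends on."""
    violations = []
    for component, targets in deps.items():
        for target in targets:
            # Unstable: the thing we rest on changes more often than we do.
            if target.changes_per_decade > component.changes_per_decade:
                violations.append((component.name, target.name))
    return violations


foundation = Component("foundation", 0.01)
frame = Component("frame", 0.05)
walls = Component("walls", 0.1)
plumbing = Component("plumbing", 0.5)
wallpaper = Component("wallpaper", 2.0)

# A sound house: each layer rests on something that changes less often.
sound_house = {frame: [foundation], walls: [frame], plumbing: [frame], wallpaper: [walls]}
print(unstable_dependencies(sound_house))  # []

# An unsound house: the foundation now depends on the wallpaper.
odd_house = {**sound_house, foundation: [wallpaper]}
print(unstable_dependencies(odd_house))  # [('foundation', 'wallpaper')]
```

The check only compares rates along each edge; it does not try to model the plumbing-behind-the-wall trade-off, which is a judgment call about how rarely plumbing changes.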
We carefully manage dependency chains when programming as well. For example, imagine a module A (a database client) that depends on a module B (a database server) where you believe that module B is less stable (has a greater rate of change) than A. This is a problem, since changes to B trigger changes to A. So we define a new interface layer C (maybe SQL) that is more stable than A or B. By having A depend on C rather than B directly, we transform the unstable dependency A->B, into the stable relationship (A,B)->C, where C is a standard.
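A minimal Python sketch of the same refactoring, with hypothetical names throughout: the client (A) is written only against an abstract interface (C), and any backend (B) that honors the interface can be swapped in or rebuilt internally without the client changing. The abstract class here merely stands in for a standard like SQL; it is not a real database API.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """C: the stable interface both sides agree on (a stand-in for a standard)."""

    @abstractmethod
    def fetch(self, doc_id: str) -> str:
        """Return the text of the document with the given id."""


class InMemoryStore(DocumentStore):
    """One B: keeps documents in a plain dict."""

    def __init__(self, docs: dict):
        self._docs = dict(docs)

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id]


class ReversedStore(DocumentStore):
    """Another B with different internals: stores text reversed, undoes it on fetch."""

    def __init__(self, docs: dict):
        self._docs = {key: text[::-1] for key, text in docs.items()}

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id][::-1]


def word_count(store: DocumentStore, doc_id: str) -> int:
    """A: the client depends only on C, never on a concrete backend."""
    return len(store.fetch(doc_id).split())


docs = {"memo": "ars longa vita brevis"}
for backend in (InMemoryStore(docs), ReversedStore(docs)):
    print(type(backend).__name__, word_count(backend, "memo"))  # 4 for both backends
```

Changing how a backend stores its data touches only that backend; `word_count` is untouched because it never saw the backend's internals in the first place.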
This same principle applies to document formats as well. Never depend on something less stable than yourself. For the first few decades of document formats, the era of binary formats in the 1980's and early 1990's, we did this all wrong, as the following diagram shows:

In those days the file format stood atop a large set of dependencies and changes at all layers would lead to changes in the file formats. This created a very inflexible stack of dependencies, where changes in the less stable lower layers can trigger incompatible changes to the document format. When we see that an Excel file on the Mac has a different internal date format than an Excel file created on Windows, we are seeing remnants of this kind of dependency chain.
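To make the Excel example concrete: Excel stores a date as a serial day count, and Windows Excel traditionally uses the "1900 date system" while older Mac Excel defaulted to the "1904 date system", so the same stored number names two dates 1,462 days apart. The Python sketch below is a simplified illustration, not Excel's code; it ignores Excel's fictitious 29 February 1900 and so only handles 1900-system serials from 61 (1 March 1900) onward.

```python
from datetime import date, timedelta

# Epochs chosen so that serial N maps to epoch + N days.
EPOCH_1900 = date(1899, 12, 30)  # valid for serials >= 61 (past the fictitious 1900-02-29)
EPOCH_1904 = date(1904, 1, 1)    # serial 0 is 1 January 1904


def serial_to_date(serial: int, date1904: bool) -> date:
    """Interpret a stored day serial under the given date system."""
    if not date1904 and serial < 61:
        raise ValueError("1900-system serials below 61 need special handling")
    epoch = EPOCH_1904 if date1904 else EPOCH_1900
    return epoch + timedelta(days=serial)


stored = 36526
print(serial_to_date(stored, date1904=False))  # 2000-01-01
print(serial_to_date(stored, date1904=True))   # 2004-01-02
```

The number in the file is the same; what it means depends on which platform convention the reader assumes, which is exactly the kind of lower-layer detail leaking into a format.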
Note also that these interfaces between the layers were not standards, but proprietary interfaces. For example, a Word 95 document might be seen as this:

The move to XML-based file formats changes this diagram but little. The format at the top is now XML but the dependency chains are the same. The relationship of the format to the technology stack has not changed:

If using a new document format requires you to buy a new application suite, update your hardware and buy a new operating system, then that should be a clear sign that something is wrong. "The tail wags the dog," as they say.
And note that a dependency is not the same as a layer. You can pretty things up all you want with the use of standards like XML, but still have adverse dependency chains. Taking a Microsoft Word binary format and translating it into XML, and putting it in a Technical Committee whose charter requires that it remain 100% compatible with Microsoft Word leaves you with a file format that depends on Microsoft Word, no matter how much XML Schema and Dublin Core you throw at it. The XML is just syntactic sugar. But the essence of the dependency chain remains: OOXML depends on Word and Windows, a single vendor's application stack. Instead of an application supporting a format, a format is supporting an application.
I should further note that a vendor, at great expense and effort, can forestall the bad effects of an unstable dependency chain, sometimes for many years. Instability, with effort, can be managed, as jugglers, unicyclists and stilt walkers remind us. Even though the Word binary format has many dependencies on the Windows platform, and on specific internals of Word and features and behaviors from earlier versions of Word, Microsoft has managed to preserve some level of compatibility with these older formats, even in current versions of Word. The support is far from perfect, and it certainly makes their file format and their applications more complicated and more expensive to work with. But that is the burden they face from bad engineering decisions back in the early 1990's. They and their customers live with that, and though they may not realize it, they all pay a price for it.
The alternate approach, the one that leads to better prospects for long term document access, is to have a stack, not of proprietary applications and interfaces, but of standards. ODF's long-term stability and readability comes from the fact that it is built upon, and depends upon other standards that are widely-used, widely-adopted and widely-deployed. ODF is designed so the format depends on things more stable than itself, with a solid foundation as seen here:

The suitability of a format for long term archiving depends as much on the formal structure of the technological dependencies as it does on specific details of the technologies involved. The greatest technologies in the world, if assembled in an unstable dependency arrangement, will lead to an unstable system. Look at the details, certainly, but also step back and look at the big picture. What technology changes can render your documents obsolete? And who controls those technologies? And what economic incentives do they have to trigger a cascade of changes every 5 years, to force upgrades? As consumers and procurers we all need to make a decision as to whether we want to ride on that roller-coaster again.
The question we face today is whether we want to carry forward the mistakes of the past and the extensive and expensive logic required to maintain this inherently unstable duct tape and baling wire Office format, or whether we move forward to an engineered format that takes into account the best practices in XML design, reuses existing international standards, and is built upon a framework of dependencies that ensures that the format is not hostage to a chain of technologies that can be manipulated by a single vendor for their sole commercial advantage.
Thursday, May 31, 2007
The Legend of the Rat Farmer
The Tale
A long time ago in a land far away there once was a prosperous town called Hamelin. Everything was perfect in Hamelin until the year the rats came. The rats ate up the grain, bit the townsfolk in the toes and scared the young children. Something had to be done! So the Bürgermeister and the Council met together and decided to bring in an outside consultant, Pied Piper Enterprises, LLC. That did not go well. The rats were back the very next year.
So in the Spring the Bürgermeister again assembled the Council and they talked and talked and talked. Should they bring in another consultant? Should they abandon the town and move someplace else? They finally decided on a market-based approach to solving the problem. They would offer a reward, a bounty, to citizens who captured, killed and turned in rats. Turn every person in Hamelin into an exterminator. The signs soon went up all over town: "A Silver Thaler for every 10 Rats."
The Bürgermeister tracked the results on a big chart on the wall of his office and the numbers looked very good. Each day more and more rats were being caught and killed. The citizens were busy at work. The rats would soon all be gone.
But then one day the Bürgermeister went home, and in the doorway of his house was his wife and she was visibly disturbed, "You shall get nothing for dinner tonight! The rats have eaten all of the grain!"
"How can this be?" exclaimed the Bürgermeister. "The metrics show that we're eliminating a record number of rats every day. Come with, and I will show you the chart."
"Chart, schmart. I'll show you some metrics," said the Bürgermeister's wife, who then took him by the ear and led him around the town center, and at each house they stopped and heard the same tale. The rats are still eating up the grain. They are still biting townsfolk in the toes. They are still scaring the young children.
Nothing at all had improved in the quality of life in Hamelin. All that had changed was that they now had a larger pile of dead rats, and a smaller pile of silver Thalers.
An inquest was held to account for the misuse of town funds. During this investigation it was found that a large percentage of the reward money had gone to one old man who lived by himself on the outskirts of town. The Bürgermeister and the Council went to visit the old man. "How did you manage to catch so many rats?" they asked, "You are old and slow".
"Simple," he said, "Let me show you". He led them back around his house to a field where stood an old barn. As he opened the barn doors, he revealed to the astonished Council hundreds of small wooden cages, each one holding 10 large rats.
"I don't care for rats much myself", said the old man. "But since you wanted them so much, I thought I could help out a little. After all, I could use the money, and rats are so easy to breed".
"Bu...bu...bu...but we didn't want more rats," stammered the Bürgermeister. "We wanted fewer".
"Nonsense," said the old man. "If you offer a reward for something, of course you want more of it, not less. This is just the free market in action."
The Commentary
We see here the results from failing to specify an appropriate metric. As is often the case, we tend to latch on to metrics that are easy to measure, such as counting dead rats, rather than harder to measure, but more appropriate metrics that truly indicate the achievement of our goals. For example, a reasonable metric might have been a "resident satisfaction index" based on a weekly survey of Hamelin's citizens to see if their rat problems were decreasing. Or the Bürgermeister could have sent out a commission to count how many rats it found in the grain and tracked that number from week to week. The point is to have a metric that clearly and directly reflects the attainment of your goals.
So the lesson is that you should always watch out and ensure that the metrics being suggested truly reflect your ultimate concerns.
With that in mind, let's move forward to the present and what seems to me a similar confusion of metrics.
Jason Matusow, Microsoft's Director of Corporate Standards has written a new blog post, which concludes:
The fact of the matter is that translation between formats has always been the path to interop (for document formats), and now with XML-based formats that path is even more appropriate than ever through translation.
China wants to create its own standardized XML format...translation will enable interop. Google Docs has its own format....translation will enable interop. OpenOffice has ODF..translation will enable interop (to MS Office, to Google Docs, to IBM Workspace). Adobe PDF is its own format...translation will enable interop.
Jason seems to be suggesting that increasing the number of different formats and translators leads to an increase in interoperability. This is akin to saying that increasing the number of umbrellas improves the weather. It just doesn't work that way.
We need to step back and find the proper metric. If, for sake of argument, we define interoperability as the ability for different formats to work together, then obviously as we increase the number of formats and the number of translators then the sum total of interoperability (by that definition) in the world increases. In that case, let's make the old 1-2-3 format an ISO standard, the WordPerfect format an ISO standard, WordStar an ISO standard, XYWrite an ISO standard, Quattro Pro an ISO standard, Manuscript an ISO standard, Harvard Graphics an ISO standard, Freelance Graphics an ISO standard, etc. Just imagine how much interoperability we could have in the world if we simply could standardize more formats. Every application could have its own standard format, or maybe two or three.
But you may smell a rat in the above argument. Interoperability of formats is not the appropriate metric. A simple look at the lack of OOXML support in Microsoft's Mac Office shows that the introduction of OOXML has reduced interoperability, not increased it. Similarly, scientific journals like Science and Nature have already come out saying that they cannot accept the OOXML format. Translation among multiple formats only partially and imperfectly attempts to work around a break-down in interoperability caused by having multiple formats. It is a band-aid approach and does not address the core issue.
A more appropriate metric than counting piles of semi-functional translators is to look at things from the perspective of the user exchanging documents. The end user doesn't see or care about formats. They care about their documents and the people and processes that work with these documents. The question for them is: what is the cost to exchange their document with other users and business processes? In other words, what is the cost to interoperate? That is the metric that counts.
Several cost drivers come into play here:
- What are the choices and costs in application software necessary to author a document?
- What are the choices and costs in application software needed by the recipient of this document, in order for them to read it, or collaborate with me in editing this document?
- Will others see the document as I intended? Or will there be fidelity loss from conversions?
- Similarly, what are the performance, security, stability, legal and licensing implications of introducing any translation steps?
- How easy is it to program this document format? In other words, what is the cost of business process integration?
When looked at from this business perspective, we can get away from counting piles of dead rats and thus come to a quite different conclusion:
None of the cost-driver factors lead to reduced costs with multiple formats. They all have minimal costs when there is a single format in use. So if the metric for interoperability is the "cost to interoperate", then interoperability (and choice as well) is maximized when a single application-neutral and platform-neutral document format is natively supported by multiple applications at a range of price/function points. Introducing even a single additional format into your business will escalate costs, degrade fidelity of document exchange, and reduce interoperability.
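To put a number on the piles of translators: in a simple model where every format needs a direct one-way translator to every other format, n formats require n(n-1) translators, each a separate piece of software to write, test and keep in step as both formats evolve. A single format in native use needs none. A minimal Python sketch of that count; the model counts translators only and says nothing about their fidelity.

```python
def pairwise_translators(n_formats: int) -> int:
    """One-way translators needed to convert directly between every ordered pair of formats."""
    return n_formats * (n_formats - 1)


for n in (1, 2, 5, 10, 20):
    print(f"{n:>2} formats -> {pairwise_translators(n):>3} translators")
# 1 -> 0, 2 -> 2, 5 -> 20, 10 -> 90, 20 -> 380
```

The count grows roughly with the square of the number of formats, so each additional standardized format adds more translation work than the one before it.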
Tuesday, May 22, 2007
Interoperability by Design
First, we start by looking at the many ways in which documents are integrated into the Windows/Office platform. Any fluent user of this platform will use many of these capabilities on a daily basis. These are basic features which have been around, in some cases, since Windows 3.0, maybe earlier.
Windows shell integration
- Double-click on a document on the Desktop or in a folder and it loads into the appropriate application. Double-click on a Word document and it loads in Word.
- Right-click in a folder and choose “New XXX” to create a new XXX document in the specified folder. So, "New...Microsoft Office Excel Worksheet" creates a new, blank Excel document.
- Right-click on a document, choose Properties and on the Summary tab you can view metadata for that document.
- Recently-edited documents appear in the “My Recent Documents” under the Start menu.
- Documents referred to in web pages, via URL links will render in an inline Office session in the browser.
- Documents are indexed by the Windows search engine.
Office integration
- Ability to File/Open, File/Save and File/New a document via the familiar menu options.
- Ability to set a file format as the default file format for the application.
- Ability to use the familiar keyboard shortcuts, Control-O and Control-S to open and save documents.
- Ability to forward a document to someone in an email and for them to be able to launch the document by clicking on it when received via email.
- Ability to password protect a document.
- Ability to post a document to a web folder or to a SharePoint server
Instead what we get is a new menu option added to the Word 2007 Office menu:

Note that this is parallel to, but not included in the Open menu where the formats that Word natively understands are accessed. Although the option presented here says, “Open ODF”, it should more properly be called “Import ODF”, for reasons which will be clear shortly.
After selecting an ODF document to open, the following progress bar is given while the conversion takes place:

This is followed by a warning dialog listing elements which may have been lost in conversion:

No option is given for disabling the above message from displaying. It should be noted that when converting from a legacy binary document to OOXML, Word gives a similar conversion warning dialog, but their version can be disabled by checking a "Do not ask me again" checkbox.
Once loaded, the user will find that their document is no longer an ODF document. It has been automatically converted to a read-only OOXML DOCX file as the title bar reveals:

So any future operations the user performs on the document, such as mailing, saving, posting to a web server, etc., will be in OOXML format. The only way to get back to an ODF format file is to manually and explicitly go back to the Office menu, go to the ODF submenu and choose to save it to ODF format. At that point you will be presented with a default name based on the DOCX temp file name, not the original name. In this case, it suggested “sampler_tmp1.odt”.
The “Save as ODF...” dialog will default to the directory last used to save a file, not necessarily the same as where your document was loaded from. So to save you must first navigate to your original document, select it and choose “yes” when warned about overwriting an existing document, and then the document is converted back into ODF format.
If you do further work on the document in Word, in that same session, and then want to save again, you must avoid the natural tendency to do a Control-S or to save the document when prompted when exiting Word. These methods all will lead to a Save As dialog, suggesting an OOXML format, which will prompt you to rename the document since it is read-only. But it will not offer you the choice of saving to ODF format. The only way to ensure that you are saving to ODF format is to use the above steps, going back to the ODF menu, etc.
You cannot create a new ODF document from scratch in Word. If you try to create a new document and save it to ODF format, you will get an error message, telling you that you must first save the document. You must save the document before you can save it? Yes, you must first save it to a temp file in a natively-supported format like DOC before you can save it as ODF.
The difficulties are compounded when you have documents accessed by other means than the Word menus. Imagine that you receive an ODF document in an email which you want to edit and return to the sender. The following steps would be required:
- Manually detach the ODF document from the email and save it to your hard drive, since you will not be able to launch it directly into Word from your email client. Remember where you saved the document.
- Manually launch Word, since you will not be able to get Word to launch by clicking on the ODF document you just detached.
- From the ODF menu, choose to open the ODF document. Navigate to where you detached the emailed document and select it. Around 30 seconds later the document will be automatically converted to a read-only temporary OOXML document.
- Make your editing changes.
- Export the document back to ODF format using the ODF menu, either writing over the original file you extracted from the email, or to a new temporary file. Remember where you exported the ODF document to.
- Go back to your email application and attach the ODF document.
If this had been an OOXML document (or any other format that Microsoft really supports, like RTF) it would have been much simpler:
- Double click on the attachment in your email to automatically launch in Word
- Make your editing changes
- Use the Send/Email menu option in Word to send the email
Compare this to the OOXML support Microsoft added for older versions of Word via their Compatibility Pack. The OOXML support is tightly integrated with the UI, in a way users would find familiar and easy to use. But the ODF support is very shallowly integrated, amounting to little more than a menu item patched in.
One wonders if Microsoft's intent was really to annoy users? That would best explain the available evidence. It is simply not credible that anyone at Microsoft believes that they are listening to customers or providing interoperability with a feature that defies real-world use. What customers did they talk to that said that this Add-in was even remotely adequate?
Since Microsoft is the one providing the, "Funding, Architectural & Technical Guidance and Project co-coordination" one would think that they would contribute more in the area where they are uniquely qualified to assist, the full and native integration of the ODF support into Office.
Friday, May 18, 2007
The Funnel and the Wedge
Both formats must evolve their new versions simultaneously in locksteps...
This one is the killer. Trying to have two formats permanently synchronized this way is a maintenance nightmare, especially when we discuss standards with multiple implementations maintained by different organizations.
This is an important point, and bears some reflection. In my mind I have images of the funnel and the wedge, physical means of convergence and divergence. Similar forces are at play with standards.
We see the Funnel in the evolution of HTML. Although the standard has existed for over a decade, implementation support for HTML and related standards was uneven until quite recently. Interoperability was poor. From the start vendors added incompatible extensions while not implementing key features. Developers had to write extensive workarounds and alternative representations to work on all browsers. And when they did not do that, their web sites might not work with all browsers. But with customer demand and prodding from groups like the Web Standards Group, the interoperable support of the HTML standard across implementations happened. There was convergence. What we have today, although not perfect, is clearly the result of a Funnel, concentrating industry effort around a single standard (or more like a family of standards).
This does not mean that vendors needed to sacrifice innovation, or deny their customers. It just meant that they accomplished their business objectives while also complying with standards. Along with adhering to various financial and securities regulations, labor law, health and safety, and other requirements, both internal and external, voluntary and mandatory, the browser vendors now complied with web standards. It is just another part of doing business.
Similar Funnels have occurred historically with network protocols, wireless telephony (in Europe at least), electrical grids, broadcast formats, etc. I've written elsewhere about what types of technologies tend to converge like this and why.
We see the Wedge when two standards compete in the same space and diverge into incompatible technologies. Microsoft is the master of the Wedge, with numerous examples over the years, usually proprietary, but more recently attempting to gain de jure recognition of them. But the mechanism is the same in either case: VML, JScript, MS Kerberos, J++, C++/CLI, XPS, and of course OOXML. Standardization just means that Microsoft has another tool for telling you that the Wedge is good for you. But it is a Wedge nevertheless.
The Wedge brings fragmentation, confusion and lack of interoperability, attacking the core reasons for having a standard in the first place. Once the primary value of an open standard is eliminated, we can all return to the security and comfort of our monopolist overlords. That is their main goal. Make no mistake about it, true interoperability and true choice are very scary propositions for Microsoft. It cuts at their very business model.
So consider the Funnel and the Wedge as applied to document formats. If we all use ODF today, is interoperability perfect? No. Do we know how to move forward to improve interoperability, and work together in multi-vendor consortia to perfect this? Yes, certainly. That is why and how such standards as TCP/IP, HTTP or HTML work today. Interoperability came via the Funnel, a convergence of effort and attention leading to increased interoperability and the user and industry benefits that flow from that interoperability.
But from the Wedge, what can we expect? If Microsoft is successful, here's what I see, my dismal predictions:
- Within 30 days after OOXML is approved by ISO we see the demise of Microsoft's half-hearted attempt to create ODF Add-ins for Office. We'll never see a functional Add-in from them for Excel or PowerPoint, and the Word one will remain unacceptably slow.
- Microsoft will continue to evolve OOXML behind closed doors. 99% of the work will be based on product decisions made in conference rooms in Redmond, which will later be rubber-stamped by Ecma and ISO.
- OOXML and ODF will continue to evolve and diverge, in incompatible ways.
- Seeing their success ramming 6,000-page Fast Track submissions through ISO, Microsoft will follow up with similar Fast Track submissions for XPS, XAML, Silverlight, Windows Media Photo, whatever they have. Since they have taken the trouble to set up the machinery to dominate JTC1, they will continue to force-feed it with additional material.
- Every jurisdiction where ODF is currently allowed or mandated will also allow or mandate the use of OOXML. In practice this will be turned around to mandate the use of Windows and Office.
- Finally, once all opposition is rendered harmless, they can shut down OpenOffice.org and KOffice by patent lawsuits, but keep Novell's version around in order to keep anti-trust regulators away. After all, 97% market share is not the same as 100%.
So do we have an alternative to the Wedge? What would encourage the Funnel? The following would need to happen:
- ISO must reject OOXML.
- Customers, from private and public sectors, must make their voices heard, that they want true interoperability and choice and that this means a single document format.
- Microsoft must support the existing ODF completely and fully in Office. It won't happen overnight. But it won't happen at all unless they start.
- OASIS must work with Microsoft (and Microsoft with OASIS, of course) so that it is clearly explained how MS Office can fully represent their documents in ODF. This need not be a monolithic monster like OOXML, but should be a layered standard, with a basic core feature set and defined extensions and profiles that encompass wider and wider ranges of functionality. If Microsoft absolutely needs the "heebieJeebies" Art Border in Word in order to maintain 100% fidelity with legacy documents, then the ODF TC can show Microsoft how to encode this in ODF. The Funnel starts when Microsoft abandons their divergent effort in Ecma and joins the common effort around ODF, a single document format for personal productivity applications.
- The application vendors, Microsoft included, must work together on defining the organizational, standards and technical means necessary to measure, test and certify ODF compliance, so customers and procurement agencies are able to have assurances that they are getting the level of interoperability that they desire.
I think this is a natural progression. Accomplishing the first step stops the Wedge from progressing further, halting but not reversing the divergence. The other steps reverse the damage and turn us down a path of true interoperability, leading to true choice and innovation.
Finally, note that the Wedge is typically driven by a single company. It is not a pull by public demand or from customers, though it may wear many disguises. It is a deliberate attempt by one party to cause division and divergence. But the Funnel won't happen at all unless there is strong demand: from customers, from government agencies, from national standards committees, etc. If this is to happen, your voice must be heard. All of us must work to bring all of us together in this effort. But it takes just one company with a sufficiently large Wedge to pull us apart.
Thursday, May 10, 2007
So where are all the OOXML documents?
At last count the totals were:
| Format | Count |
|---|---|
| ODT | 85,200 |
| ODS | 20,700 |
| ODP | 43,400 |
| Total ODF | 149,300 |
| DOCX | 471 |
| XLSX | 63 |
| PPTX | 69 |
| Total OOXML | 603 |
As you can see, there is some rounding happening in the upper range. Perhaps the counts at the high end are estimates based on sampling?
In any case, I am rather surprised by the low counts given for OOXML documents, especially considering that this format has been supported since the Office 2007 beta last summer. According to Brian Jones, there have been over 4 million downloads of the OOXML Compatibility Pack for older versions of Office, and there is a new community of "over 300 other companies and partners who care deeply about OpenXML". We're also told that Office 2007 sales are above expectations, "two times greater than the purchases of Office 2003" according to one research firm. Recently announced third-quarter results for Microsoft showed "better than expected" results for Office 2007 sales, $200 million better, according to Microsoft CFO Chris Liddell.
So with all this evident love for Microsoft Office 2007, why is it that six months later there are only 63 OOXML spreadsheet documents on the web, something like 0.3% of the number of ODF spreadsheet documents? How can there be 300 companies supporting OOXML and only 69 OOXML presentations on the web? (This is starting to sound like when I say I support 30 minutes of aerobic exercise a day. I don't do it, but I sure support it!)
OK, I know the argument about "dark matter", that Google indexes only the tip of the iceberg, that there is a lot of data squirreled away on PC hard drives, behind corporate firewalls, etc., stuff that Google will never see. But the same is equally true for ODF documents, right? I have tons of ODF documents on my laptop, but none of them are indexed by Google.
Of course ODF has been around for a year longer than OOXML. That's an important fact to acknowledge. We can put that in perspective by plotting ODF and OOXML document counts against the number of days since adoption of these two standards. So ODF counts are based on a start of 1 May 2005 and OOXML counts on a start of 7 December 2006, when OASIS and Ecma, respectively, approved them. You get this:

As you can see, ODF has a nice upward trend. OOXML is also trending upwards, though it is somewhat lost at this scale. If you do the analysis it comes out to around 300 new ODF documents per day versus 6 for OOXML. So, two years later, ODF adoption, in terms of documents per day, is 50-times greater than OOXML is, at a time which should be OOXML's high-growth period, considering all the great news that is coming out of Redmond.
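As a rough cross-check of that arithmetic, here is a small Python sketch that computes simple lifetime averages from the table above. Two assumptions: the counts were taken on the day of this post (10 May 2007), which the post doesn't actually state, and a lifetime average is used instead of the slope of the trend line, so the per-day figures come out lower than 300 and 6 (roughly 200 and 4). The ratio, though, lands in the same place, a little over 50.

```python
from datetime import date

# Assumption: counts were taken on the post's date; the exact search date isn't given.
AS_OF = date(2007, 5, 10)

# (document count, approval date used as day zero in the post)
formats = {
    "ODF": (149_300, date(2005, 5, 1)),    # OASIS approval
    "OOXML": (603, date(2006, 12, 7)),     # Ecma approval
}

rates = {}
for name, (count, start) in formats.items():
    days = (AS_OF - start).days
    rates[name] = count / days  # simple lifetime average, not a trend-line slope
    print(f"{name}: {count} documents over {days} days = {rates[name]:.1f} per day")

print(f"ODF/OOXML ratio: {rates['ODF'] / rates['OOXML']:.0f}x")
```

A trend-line fit would weight the recent, faster growth more heavily, which is why it gives higher per-day numbers than these averages.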
So I'm somewhat at a loss to appreciate the significance of Novell or Corel adding OOXML support to their editors. With only 63 OOXML spreadsheets out there, wouldn't it be cheaper just to hire someone to retype the documents in the destination application? The average user is more likely to find a Buffalo Nickel in their lunch change than to find an OOXML document outside of captivity.
Wednesday, April 25, 2007
Math markup marked down
Because of changes Microsoft has made in its recent Word release that are incompatible with our internal workflow, which was built around previous versions of the software, Science cannot at present accept any files in the new .docx format produced through Microsoft Word 2007, either for initial submission or for revision. Users of this release of Word should convert these files to a format compatible with Word 2003 or Word for Macintosh 2004 (or, for initial submission, to a PDF file) before submitting to Science.
Well, so much for 100% compatibility, eh? That is what I've been talking about. Whether you move to OOXML or ODF you will be making a change that will break compatibility with your past document processing systems. You will need to change over the next couple of years and you will need to examine your choices carefully. But don't get suckered into thinking that the choice of OOXML is magically painless. The 100% compatibility claims don't hold water.
More bad news:
Users of Word 2007 should also be aware that equations created with the default equation editor included in Microsoft Word 2007 will be unacceptable in revision, even if the file is converted to a format compatible with earlier versions of Word; this is because conversion will render equations as graphics and prevent electronic printing of equations, and because the default equation editor packaged with Word 2007 -- for reasons that, quite frankly, utterly baffle us -- was not designed to be compatible with MathML. Regrettably, we will be forced to return any revised manuscript created with the Word 2007 default equation editor to authors for re-editing. To get around this, please use the MathType equation editor or the equation editor included in previous versions of Microsoft Word.
Uh oh. Not only can you not submit files in OOXML format, but you can't even use Office 2007 and save in the old binary formats. Down-conversion or using the Compatibility Pack won't help. Microsoft's decision to push a new "Open Math Markup Language" rather than use the well-established MathML standard appears to be a serious flaw.
Nature appears to have the same problem:
We currently cannot accept files saved in Microsoft Office 2007 formats. Equations and special characters (for example, Greek letters) cannot be edited and are incompatible with Nature's own editing and typesetting programs.
Of course, when targeting final publication of a paper, a PDF file is fine. But when collaborating with another researcher, or an editor, you need to agree on a standard format in which you both can work.
Reuse of existing standards is important. When you reuse a standard, you are reusing more than a piece of paper. You are reusing the experience and effort that went into creating and reviewing that standard. You are reusing the experience gathered by those who have already implemented the standard. You are reusing the books and training materials already written for that standard. You are reusing the interfaces for other technologies that have already integrated with that standard or can produce or consume output that conforms to that standard.
Isaac Newton wrote, "If I have seen further it is by standing on the shoulders of giants". When you reuse standards you reuse the accumulated wisdom of an industry and assume the vision and powers of giants. But when you ignore all precedents and go forth on your own, well, let's just say the outcome is more variable in that case. You may be the next Einstein, or you may be the next fool.
If Science and Nature need to update their templates, then I'd suggest they take a look at ODF. Not only does it use MathML for equations, but it is an open standard, an ISO standard, a platform- and application-neutral standard that has many implementations, including several good open source ones. If they need to update their processing, then they might want to make the smart choice now, the choice that increases their choices and flexibility going forward.
18 June 2007 Update
A response from Nature and one of their vendors, explaining the complexity of migrating their publishing ecosystem to a new file format. Quoting a letter to Microsoft from Bruce Rosenblum of Inera:
Had the conversion from DOCX to DOC provided a conversion from OMML to Equation Editor format, it would have provided the necessary backwards compatibility for publishers to upgrade one system at a time. But because this compatibility is not available, it's created the need for a "big bang" upgrade, or a delay until the ecosystem of inter-dependent systems is deliberately updated over time. In the environment of scholarly publishing, such substantive upgrades often take years, not months.
Monday, April 23, 2007
Sometimes I need to remind myself
There is, of course, the familiar canard that IBM is the source of all of their problems:
It is clear though that Paoli is upset by what he sees as an international campaign against OOXML orchestrated by IBM, the sole naysayer in the ECMA voting. “There are IBM employees going to ISO, and saying a lot of technically incorrect things. When ODF went to ISO Microsoft did not interfere. IBM is betting on ODF, to have governments preferentially buying IBM software. It is OK to compete, but using this kind of argument around is it an open format or not … it’s widely known now, Office Open XML is an open format, even the EU says it is.”
A Google search on the words ecma ibm sole vote returns an embarrassingly large number of hits. Microsoft has certainly been having fun with this line. Let's take a little look at this question and see if we can better define this conspiracy that Paoli is alluding to.
I'm now going to rant a little. You may want to stand back.
Yes, IBM was the only voting member in Ecma who cast a vote against OOXML. But guess what, we're probably the only company who actually had someone perform the due diligence of reading the specification. The others voted on OOXML without reading the spec. So please give their “Yes” votes all the weight they deserve, but not more.
It seems to me that Ecma has become a standards factory, a place where you go for clean, efficient, no-guilt, fast-track service. Don't want to publish your public comments? Fuggetaboutit. Don't want to publish your meeting minutes? Fuggetaboutit. Worried about rushing through a 6,000-page specification in less than a year, with 20x less scrutiny than average? Fuggetaboutit. Want to have a unanimous vote, along with a souvenir photograph of your face when the vote occurs? Yes sir, we guarantee it.
However, for the privilege of this elite service, you must cough up the dough. You will not find Ecma's rate card on their website, but I'm told that voting membership will set you back $57,000. This is not exactly the club to join if you are a small (or medium) business, non-profit, public sector agency, or anything but one of the big boys. A list of the privileged twenty voting members of Ecma can be found here.
As you can imagine, one does not become a voting member of Ecma without a good reason. This is a business expense, not a charitable contribution. For $57K, one expects $57K of service. To justify that membership fee, you expect your technology to be blessed with an Ecma standards imprimatur without hassles. So the “unwritten rule” is that everyone votes in favor of everyone else's proposal. It is considered rude to vote against something that another elite member has paid so much for. So IBM gets a lot of grief for casting a single "No" vote at a single Ecma General Assembly. We broke the club rules. I'm proud to work for such a company.
My question is this: How many “No” votes have been cast in Ecma in the past 5 years? When before did another Ecma member ever vote “No” on a standard? If no one can remember even a single previous “No” vote, or (sacre bleu!) a defeated standard, then that speaks volumes. In a healthy standards body, a single “No” vote should not be a newsworthy event, and should certainly not be something that Microsoft is still complaining about 6 months later.
To put this in perspective, the base category of OASIS voting memberships (Contributor) starts at $1,100. OASIS has something like 330 organizational members eligible to vote, including all categories of companies, government agencies, non-profits, etc.
I should also note, just coming from the annual OASIS Symposium held last week, that the OASIS Board of Directors is looking at changing the OASIS voting rules to make it more difficult for OASIS standards to be approved. Yup, we're raising the bar.
When I see this I need to try extra hard to remind myself that IBM is just interfering with Microsoft's good-faith attempt to humbly submit for our consideration their well-written, detailed, high-quality, interoperable open standard.
ISO/IEC JTC1/SC34 recently had its annual plenary. This is the same group of ISO National Body (NB) members who voted in favor of ODF last year, and over the next few months many of them will be recommending positions on Microsoft's OOXML to their national standards bodies. I was on the delegates list for attending this meeting, as a representative of the US NB, but had to cancel at the last minute because of a family emergency. When I saw the attendance list, I was surprised to see that Microsoft had sent five people, this to a meeting of only 37 people. They practically darkened the skies with their employees. And what about the conspiratorial army that is hounding them at every corner? Zero people from IBM. Zero as well for Google, Sun, RedHat, Adobe, Oracle and Novell.
When I read this I need to remind myself that I'm part of a vast global conspiracy to deny Microsoft a fair hearing within ISO. The fact that no one in this vast global conspiracy managed to show up at the meeting was simply a ploy to make Microsoft feel overconfident.
In the US NB, we have a committee called INCITS V1. It is the mirror committee to JTC1/SC34. I serve on it, the only member from IBM. Imagine my surprise, when at our last call, Microsoft shows up with 3 employees and a business partner as new members. Four people against little ol' me? Come on guys, that is just sad.
At times like this I need to remind myself that Microsoft is the underdog and IBM and its allies are ganging up on them. But our guys are invisible at meetings and although they cannot vote, they do have ninja powers and, in matters of external affairs, the delegated plenipotentiary prerogatives of Klingon Ambassadors. “choSuvchugh 'oy'lIj Daghur neH”.
Microsoft bloggers, fed and spreading like mushrooms, recently popped up and simultaneously announced a new pro-OOXML petition, self-published, self-hosted and self-reported by Microsoft. You couldn't find anyone to even pretend to support you? You had to host your own petition? This is like throwing a birthday party and having only your mother show up. Very sad. Where are your friends, Microsoft? How come we hear no one else speaking approvingly about OOXML? Where are the other companies lining up? Where are the endorsements? The testimonials? All we hear is that Microsoft thinks OOXML is great. But that is just Mom cheering on your performance. Don't you have any real support?
Btw, this is what a real petition looks like. It is hosted by a reputable party (the Prime Minister) and gives an open, public listing and tally of those who signed the petition.
At times like this I need to remind myself that the ODF supporters are the outsiders in this debate, using unconventional and covert tactics to fight a well-respected and well-loved mainstream technology generously provided by Microsoft.
I see that Microsoft likes to throw around names like the British Library and Library of Congress, as if the mere mention of their holy names brings sacramental blessings. But please show me a public statement where either of these bodies has endorsed, adopted, recommended adoption or recommended approval of OOXML. The mere mention in passing of well-known and popular institutions lends no credibility to your argument, and credible arguments are important, as is well-known to anyone familiar with Walt Disney World, the Louvre, NASA, the Boston Red Sox, or the Department of Really Important Stuff.
A Malaysian standards committee was moving forward to approve ODF as a national standard in Malaysia. This is called “transposing” an International Standard, and is commonly done when a relevant International Standard is approved. Microsoft has made every attempt possible to prevent this committee from making progress with their review of ODF, for almost a year now. This progress recently came to a halt, the committee's decisions nullified and the committee suspended.
When standards committees are disbanded just as they get too close to approving ODF, then I must pinch myself and remind myself once again that IBM is the one orchestrating international campaigns against Microsoft, and not the other way around.
I've heard similar complaints from other NB's. Why bother reviewing OOXML? Why waste the effort reading it and suggesting improvements? Microsoft has ignored every suggestion given it so far by NB's. And if you vote no, Microsoft will just escalate and try to get some mid-level government bureaucrat to set aside the recommendation of your country's technical experts. Why waste the next 4 months reviewing a 6,000-page specification? It happened in Malaysia. It happened in the US. The INCITS Executive Board was about to send a contradiction submission against OOXML, saying that it possibly contradicted ODF. But before the committee could reconvene the next morning, enough members had received urgent phone calls to cause them to change their vote and abstain. We saw this in the Netherlands as well, where it was even reported in the papers that they would vote against OOXML. But that vote was changed at the last minute with the cryptic message to the JTC1 Secretariat: “The Netherlands Standardization institute votes ‘abstain’. Please change our vote accordingly and please confirm receipt of this vote to me...” What happened there is still unclear. In India it was even worse: the committee that was supposed to get the ballot did not receive it. Evidently it was misplaced. The intervention of the leader of a major national political party was required to straighten it out. I also received a note saying that the committee was being told that the deadline for responding to the ballot was two weeks later than it really was, a delay that would have invalidated their vote if they had fallen into that trap.
When I see stuff like this happening, I need to remind myself, really, really hard, that IBM is the bad guy in this debate and that we're the ones interfering with an orderly ISO process.
When an amendment to a Florida State Senate bill was offered that called for a “business case analysis” for the use of open standard document formats (no particular format was called out), Microsoft's lobbyists, the three Men in Black, Will McKinley of Dutko Poole McKinley, Jim Daughton, Jr. and Geoffrey Becker, both of Metz, Hauser, Husband & Daughton, swarmed down and zapped it. As one legislative aide put it, “By the time those lobbyists were done talking, it sounded like ODF (Open Document Format, the free and open format used by OpenOffice.org and other free software) was proprietary and the Microsoft format was the open and free one”. Perhaps a document, left by the lobbyists, filled with lies about ODF, had something to do with it? We should consider ourselves fortunate that Microsoft sent only three lobbyists to handle this, rather than all nine lobbyists who are registered in Florida alone to support Microsoft's legislative activities.
When expressing our technical opinion is defined as interference, and the outrages that Microsoft is getting away with become the norms of behavior, then we're all doomed to a future of technical subservience. We all need to remind ourselves of that.
Microsoft likes to complain, and they are evidently becoming quite adept at it. If decibels and dollars could win arguments then they would surely be the winners. But I think their protestations are mis-directed. Microsoft is like an out-of-condition middle-aged man (somewhat like myself) out for a rare jog. They can curse to the high heavens the pain they feel, but don't blame it on others. It is called competition. Deal with it. If it hurts so much it is because you are so out of practice. You should try having competition more often. It is good for you.
Wednesday, March 28, 2007
The ODF Validation Service
Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship's ODF Tools project by Alex Hudson.
It is in the spirit of the W3C's Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.
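To give a flavor of the lowest level of such a report, here is a minimal Python sketch of package-level checks only. It is not how the Fellowship's validator works (its internals aren't described here), and the rules it checks are my reading of the ODF 1.2 packaging conventions: a `mimetype` stream that should come first in the Zip and must be stored uncompressed, plus a `META-INF/manifest.xml` manifest. The function name `check_odf_package` is hypothetical, and a real validator would go on to validate every XML stream against the ODF schema.

```python
import io
import zipfile

# Every ODF media type starts with this prefix (e.g. ...opendocument.text).
ODF_MEDIA_PREFIX = b"application/vnd.oasis.opendocument."


def check_odf_package(data):
    """Return a list of Zip-level problems found in an ODF package (bytes)."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a Zip archive"]
    problems = []
    with zf:
        names = zf.namelist()  # entries in archive order
        if "mimetype" in names:
            if names[0] != "mimetype":
                problems.append("'mimetype' is not the first entry")
            if zf.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed")
            if not zf.read("mimetype").startswith(ODF_MEDIA_PREFIX):
                problems.append("'mimetype' does not name an ODF media type")
        else:
            problems.append("no 'mimetype' stream")
        if "META-INF/manifest.xml" not in names:
            problems.append("no META-INF/manifest.xml")
    return problems
```

Each message corresponds to one packaging rule, so these could be listed in the same report as the schema errors found deeper inside the document.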
It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a word processor document, onTable(), onImage(), etc.
A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.
- A validation service would operate at several levels, validating the structure of the Zip, the manifest, as well as each of the XML streams.
- You could also do a cross-platform checker, looking at embedded images and other media, OLE links, etc., and reporting on whether any of these have platform dependencies.
- An accessibility scanner would be able to fit into this framework as well.
- A full text indexer could work here.
- Any number of content scraping applications could work well here.
- If there is some query language interface, this could be useful from a test-generation perspective. If you have a large collection of ODF documents, a developer working on a feature can instantly bring up a set of test documents that can be used to test the code he just changed. Give me a list of word processor documents that have Arabic Bidi text which also have tables. Give me a list of spreadsheets that use pie charts with more than 10 slices.
- With the metadata framework coming in ODF 1.2, there will be even more interesting uses of such a framework.
The benefit of the framework is the reduction in code required to get directly to the info in the ODF document you want, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.
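Here is one way such a framework might be sketched in Python, using only the standard library's `zipfile` and SAX parser. It is a minimal illustration under my own assumptions, not code from the ODF Tools project: the `OdfListener` class, its `on_*` methods (a Python spelling of the `onFile()`, `onDirectory()`, `onWorksheet()` and `onCell()` callbacks imagined above) and `scan_odf()` are all hypothetical. It walks three levels: the Zip entries, the `META-INF/manifest.xml` entries, and the spreadsheet tables in `content.xml`, honoring the `table:number-rows-repeated` and `table:number-columns-repeated` attributes but ignoring nested tables and everything else in the format.

```python
import io
import zipfile
import xml.sax
import xml.sax.handler

# Namespace URIs from the ODF specification.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


class OdfListener:
    """Hypothetical listener base class: override only the events you need."""
    def on_file(self, name, size): pass
    def on_directory(self, name): pass
    def on_manifest_entry(self, full_path, media_type): pass
    def on_worksheet(self, name): pass
    def on_cell(self, sheet, row, col): pass


class _ManifestHandler(xml.sax.ContentHandler):
    """Turns manifest:file-entry elements into on_manifest_entry() events."""
    def __init__(self, listeners):
        super().__init__()
        self.listeners = listeners

    def startElementNS(self, name, qname, attrs):
        if name == (MANIFEST_NS, "file-entry"):
            path = attrs.get((MANIFEST_NS, "full-path"))
            media = attrs.get((MANIFEST_NS, "media-type"))
            for listener in self.listeners:
                listener.on_manifest_entry(path, media)


class _ContentHandler(xml.sax.ContentHandler):
    """Turns spreadsheet tables, rows and cells into worksheet and cell events."""
    def __init__(self, listeners):
        super().__init__()
        self.listeners = listeners
        self.sheet, self.row, self.col, self.row_repeat = None, 0, 0, 1

    def startElementNS(self, name, qname, attrs):
        if name == (TABLE_NS, "table"):
            self.sheet, self.row = attrs.get((TABLE_NS, "name")), 0
            for listener in self.listeners:
                listener.on_worksheet(self.sheet)
        elif name == (TABLE_NS, "table-row"):
            self.col = 0
            self.row_repeat = int(attrs.get((TABLE_NS, "number-rows-repeated"), 1))
        elif name in ((TABLE_NS, "table-cell"), (TABLE_NS, "covered-table-cell")):
            if name[1] == "table-cell":  # covered (merged-away) cells get no event
                for listener in self.listeners:
                    listener.on_cell(self.sheet, self.row, self.col)
            # One element may stand for several repeated columns.
            self.col += int(attrs.get((TABLE_NS, "number-columns-repeated"), 1))

    def endElementNS(self, name, qname):
        if name == (TABLE_NS, "table-row"):
            self.row += self.row_repeat


def scan_odf(data, listeners):
    """Walk an ODF package (bytes) level by level, notifying every listener."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Level 1: the Zip container itself.
        for info in zf.infolist():
            for listener in listeners:
                if info.is_dir():
                    listener.on_directory(info.filename)
                else:
                    listener.on_file(info.filename, info.file_size)
        # Levels 2 and 3: the manifest, then the document content.
        names = set(zf.namelist())
        for member, handler in (("META-INF/manifest.xml", _ManifestHandler),
                                ("content.xml", _ContentHandler)):
            if member in names:
                parser = xml.sax.make_parser()
                parser.setFeature(xml.sax.handler.feature_namespaces, True)
                parser.setContentHandler(handler(listeners))
                with zf.open(member) as stream:
                    parser.parse(stream)
```

Because each listener overrides only the events it cares about, a full-text indexer, an accessibility scanner and a validator could all be registered on the same single pass over the document.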
Change Log
4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.
Labels: ODF
Tuesday, March 20, 2007
Cannibalism
The downside is clear. The minute you move to OOXML you have less choice of whom you can successfully exchange documents with. Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite, ThinkFree Office, users of these products, and the numerous 3rd party applications that can read and write the binary formats, these are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.
So why move to OOXML? From the CTO's perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn't the tendency be to remain where you are?
So the breakdown may happen like this:
- N% of companies put compatibility with legacy documents foremost. A% of these stay on Office/Windows and upgrade to Office 2007/OOXML. B% stay where they are and use the binary formats, and C% move to some combination of ODF and PDF.
- 100-N% make a decision primarily on factors other than 100% fidelity with legacy documents, such as ease of programmability, greater choice and diversity in applications and vendors, etc. X% stay on Office/Windows and upgrade to Office 2007/OOXML. Y% stay where they are and use the binary formats, and Z% move to some combination of ODF and PDF.
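Putting the two groups together, the overall split follows directly from these symbols (all values in percent, with A + B + C = X + Y + Z = 100). Each format's overall share is just a weighted average of its share in the two groups, weighted by N and 100 − N:

$$
\begin{aligned}
\text{OOXML share} &= \frac{N A + (100 - N)\,X}{100} \\
\text{binary-format share} &= \frac{N B + (100 - N)\,Y}{100} \\
\text{ODF/PDF share} &= \frac{N C + (100 - N)\,Z}{100}
\end{aligned}
$$

Summing the three lines gives (100N + 100(100 − N))/100 = 100, so the shares account for everyone.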
It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.
Monday, March 19, 2007
ODF Freely Available

Labels: ODF