An Antic Disposition


ODF

ODF: Translations and Errata

2008/09/21 By Rob 8 Comments

Although the ODF 1.0 standard was approved several years ago (by OASIS in 2005 and by ISO/IEC in 2006), work on the standard does not cease. Of course, we have work on technical revisions of ODF, in the form of ODF 1.1 and the current work on ODF 1.2. New releases make the news and are talked about at conferences, etc. But also important, though not talked about as much, is the ongoing work on the text of ODF 1.0, in the form of translation and error correction. Even after ODF 1.1 and ODF 1.2 are created, ODF 1.0 continues to be maintained.

Why is translation important? Aside from increasing the number of developers who can read the standard in their native language, translation is a prerequisite in several countries in order to make ODF into a national standard. So translation increases the number of places where ODF support can be an official requirement. So far the ODF 1.0 standard has been translated into Russian, Chinese, Spanish and Portuguese. (There may be others — Let me know if I’ve missed any.)

(Interesting to note the size advantage of ODF compared to OOXML. I’ve heard from one reliable source that to translate OOXML would cost $500,000. This will certainly hamper its ability to be adopted in some parts of the world. ODF, by reusing existing standards, is only 1/10 the size.)

Also in progress is a translation of ODF 1.0 into Japanese. From what I understand, a JISC committee has completed an initial pass of the translation and then passed the translation off to a second committee. This second committee is reviewing the translation and raising any issues where the text is unclear. In some cases this may be caused by a faulty translation. But in other cases errors may be found which were present in the original English text.

That’s the second ongoing activity related to ODF 1.0: error correction. Although we received most of our comments during the mandated 60-day public review prior to approval as an OASIS Standard, we do continue to get a trickle of comments months and years after publication. Each OASIS TC has its own mailing list for receiving comments. For the ODF TC, the mailing list archives are here. Anyone can subscribe to the comment list and post using the instructions here. The additional complexity in the sign-up procedure compared to your average mailing list is to ensure that all feedback submitted by the public to the list is in accordance with OASIS IPR rules. This helps ensure that ODF remains an open standard, unencumbered by patents.

Although we are only obligated to address comments received during the pre-approval public review period, around a year ago the ODF TC decided to formally record and process all comments received, regardless of when they arrived. So far, from May 2005 to the present, we’ve received around 250 comments. We note each comment in a spreadsheet, along with what ODF versions it pertains to (ODF 1.0, ODF 1.1 or ODF 1.2 draft), what section number the comment concerns, and whether the comment is reporting an editorial error, a technical error, or proposing a new feature. My estimate is that 50% of the comments are feature proposals, 40% are reporting editorial errors, and 10% reporting technical errors.

The preeminent source of comments on ODF 1.0 has been Murata Makoto, of the Japanese SC34 mirror committee. Murata-san relays to us the defects found during the Japanese translation of ODF. The vast majority of these are editorial errors, mainly typographical or grammatical. But there are a handful of more significant issues found, and we are especially pleased to receive reports of these.

You may recall the old saying, “Every new class of users finds a new set of defects”. Translation of a standard is a laborious process, especially when combined with the additional review step that JISC is engaging in. This has subjected the text of ODF 1.0 to more scrutiny, at a more detailed level, than any typical technical review could provide. So I am appreciative of the detailed comments from JISC, and of the effort made in this translation by them.

My personal aim is to ensure that all of the reported editorial errors are fixed in the ODF 1.2 text, and that any technical flaws are addressed via errata. An errata document (That’s what we call it in OASIS. Others, e.g., ISO, call it “corrigenda”) allows us to make small changes to the ODF 1.0 text to address defects.

But this goal is certainly debatable. Why not aim to fix every reported error in ODF 1.0 via published errata? Why knowingly leave even the smallest typographical error in the text? What relative priority should be placed on fixing typographical errors (and others) in ODF 1.0 versus the work of completing ODF 1.2?

This is entirely at the will of the ODF TC. The combined priorities of the vendors and other interests represented on the committee determine the direction we take. My perception of the expressed interests is that we should address the JISC comments via an errata document, but that the overall priority is on completing the work on ODF 1.2, and not attempting to fix every last instance of subject/verb disagreement or misuse of “A” for “An” in ODF 1.0.

And so our work on the ODF TC follows that priority. I’d estimate that we spend 80% of our time on ODF 1.2 topics and 20% on processing public comments on ODF 1.0/1.1, including those from JISC. We are nearing completion of an official Errata document for ODF 1.0, consisting of fixes to defects reported by JISC. Expect to see a call for public review soon. After that, the TC will continue to review and process public comments from the comment mailing list. If warranted, we are able to issue an updated errata document in the future, to address additional issues as they are reported.

Filed Under: ODF

What is Rick smoking?

2008/07/17 By Rob 11 Comments

Former Microsoft consultant Rick Jelliffe has posted his own particular brand of science fiction/fantasy, this time in his favorite subgenre, a parody of a drug-induced psychosis, where after uneasy slumber Rick awakes in some alternate parallel universe and finds that JTC1/SC34 is open and transparent and OASIS is closed, and decides to write a rambling blog post about it.

If you like unintentional humor, you will enjoy reading Rick’s over-the-top post.

Rick suggests that organizationally JTC1/SC34 is a more participatory environment for developing standards than OASIS.

JTC1’s process, based on National Body voting is both effective … and more genuinely open, because it is impossible to stack either directly or indirectly.

Let’s test that proposition. Let’s compare OASIS and JTC1/SC34.

Who can participate? In OASIS, anyone can participate, from any company, organization, government agency, or non-profit corporation in the world. Or you can join as an unaffiliated individual, as many have. You don’t need your government’s permission to join. You just do it. Most join with a nominal membership fee ($300 for individuals) but membership grants are available in some cases, when the fee would be a burden for active individual contributors.

What about participation in JTC1/SC34? First, you must be a member of your NB. How do you become a member of your NB? In the US the price is $1,200 and you must be representing a company or organization. Individuals? Sorry, you are not allowed to participate. In other countries the rules vary. In some cases membership is not available at all at any price. You are essentially wait-listed until an opening becomes available. (Sorry, we don’t have enough seats, we heard in Portugal). In some countries, like China, membership is forbidden to native citizens who are employees of foreign subsidiaries in China. In other countries you can’t join at all. It is entirely a government decision. So, good luck joining the NB of Syria, where the constitution has been suspended under emergency rule since 1963. (But somehow they managed to make time to vote on the OOXML ballot. Zimbabwe as well, that paragon of open participation.)

Now, it is entirely possible for a standards organization to appear open, but in practice to be inaccessible. So we must look at the complete cost of participation, not just the initial membership fees.

The OASIS ODF TC does its work entirely on an email list, a wiki, and via weekly phone calls, which are toll-free calls for most participants. I don’t recall there ever being a face-to-face meeting, certainly not so long as I’ve been a member. This use of technology lowers the barrier to participation, so anyone can be effective on the TC if they wish. In particular it makes it easier for those who have day jobs and can only contribute to the mailing list during non-work hours.

What about JTC1/SC34? To participate effectively requires attendance at several international meetings each year (plenaries, WGs, ad hocs, BRMs, etc.), as well as participation at NB meetings. Since many of the participants are representatives of large corporations or government agencies, a junket mentality prevails and the meetings are often held in some of the most expensive places in the world: Geneva, Granada, London, Kyoto, Jeju Island, etc.

JTC1 does not allow meeting participation by telephone. Since important votes are held at these meetings, and no provision is made for remote participation, one cannot effectively participate in JTC1/SC34 without a substantial budget for international travel. Attendance at a single meeting, the DIS 29500 BRM, cost me $3687.52, and I flew coach and ate cheap. How many standards meetings like that can you as an individual or your small company afford per year?

Further, note the nature of your membership — what can you actually do? Can you vote? In OASIS, it is one person/one vote. In the TC, your vote as an individual with a $300 membership fee is counted exactly the same as my vote representing an OASIS Foundational Sponsor. At the organizational level, it is one company/one vote, and the smallest OASIS member organization has exactly the same vote as the largest.

In JTC1/SC34, however, you typically can’t vote at all. NBs vote, not individuals, not companies. So your opinion and your wishes are subject to the will of your NB. If your opinion varies from your NB’s, you may not be accredited to attend an international meeting, and even if you are able to attend you may not be allowed to speak your opinions. This extra level of indirection and censorship means that you, as an individual, can do little. And to the extent your NB’s committee is stacked by a single vendor and their partner community, or your NB decides to overrule or ignore its technical committee, or Microsoft calls your head of state to change the NB’s vote, or any of the dozens of other documented shenanigans that recently occurred, your entire membership fee and participation will be an entire waste of time, money and effort.

Membership in OASIS is far more open and inclusive. You join. You discuss. You vote. Period. In JTC1/SC34, you are mired in layers of bureaucracy at the national and international level, in a system crafted by and for the big boys to cut back room deals and manipulate the process to the benefit of large corporations.

(Now that isn’t to say that there are not some individual consultants out there who thrive in the JTC1 environment by mastering its dark, dusty, demon-haunted hallways. Even the largest corporations occasionally have need of this expertise, as Rick and others are quite aware. If JTC1/SC34 were truly open and transparent, such skills would not be needed. You certainly don’t see anyone selling their services to help companies navigate OASIS, do you?)

What about transparency? As Rick demonstrates, OASIS meeting minutes and agendas are all posted and public. So is our mailing list. So are all of our drafts. So are our member and public comments.

But in JTC1/SC34, most of the documents are private, only accessible to SC34 members by password. And then occasionally JTC1 will step in to prevent SC34 from releasing their own work, suppressing documents even from their own SC members. There are no public comments to speak of, and member comments on draft standards are secret.

So when you are back from your “trip”, Rick, please let us know again, who wins on openness, participation and transparency?


And for the record, a couple of outright deceptions in Rick’s post:

  • Rick says that there are 80 NBs and thousands of people participating in JTC1, but only 13 people participating on the ODF TC. This is a particularly inept comparison. Why is he comparing all of JTC1 to a single OASIS TC? If you look at OASIS overall, you will see that OASIS has more than 5,000 participants, representing over 600 organizations and individual members in 100 countries. The ODF TC itself has 53 members, including 7 members of JTC1/SC34.
  • Rick picks a “random” ODF TC minutes post from a year ago to attempt to suggest domination by a single company. Not so random a choice, methinks. It was a rare joint meeting of the ODF TC and the Metadata subcommittee, which brought in a far greater number of Sun employees than typically participate in a call.

Filed Under: FUD, OASIS, ODF, OOXML

Spreadsheet file format performance

2008/05/13 By Rob 19 Comments

I’ve been doing some performance timings of file format support, comparing MS Office and OpenOffice. Most of the results are as expected, but some are surprising, and one in particular is quite disappointing.

But first, a few details of my setup. All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches. The machine is a Lenovo T60p, with a dual-core Intel 2.16 GHz processor and 2 GB of RAM. I took all the standard precautions: the disk was defragmented, and test files were confirmed as defragmented using contig. No other applications were running and background tasks were all shut down.

For test files, I went back to an old favorite, George Ou’s (at the time with ZDNet) monster 50MB XLS file from his series of tests back in 2005. This file, although very large, is very simple. There are no formulas, indeed no formatting or styles. It is just text and numbers, treating a spreadsheet like a giant data table. So tests of this file will emphasize the raw throughput of the applications. Real world spreadsheets will typically be worse than this due to additional overhead from processing styles, formulas, etc.

A test of a single file is not really that interesting. We want to see trends, see patterns. So I made a set of variations on George’s original file, converting it into ODF, XLS and OOXML formats, as well as making scaled down versions of it. In total I made 12 different sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats. I then tested how long it took to load each file in each of the applications. In the case of MS Office, I installed the current versions of the translators for those formats, the Compatibility Pack for OOXML, and the ODF Add-in for the ODF support.

I find it convenient to report numbers per 100,000 spreadsheet cells. You could equally well use the original XLS spreadsheet size, or the number of rows of data, or any other correlated variable as the normalizing variable, but values per 100K cells are simple for anyone to understand.

I’ll spare you all the pretty pictures. If you want to make some, here is the raw data (CSV format). But I will give some summary observations.

For document sizes, the results are as follows:

  • Binary XLS format = 1,503 KB per 100K cells
  • OOXML format = 491 KB per 100K cells
  • ODF format = 117 KB per 100K cells

So the XML formats are far smaller than the legacy binary format. This is due to the added Zip compression that both XML formats use. Also, note that the ODF files are significantly smaller than the OOXML files, less than 1/4 the size on average. Upon further examination, the XML document representing the ODF content is larger than the corresponding XML in OOXML, as expected, due to its use of longer, more descriptive markup tags. However the ODF XML compresses far better than the OOXML version, enough to overcome its greater verbosity and result in files smaller than OOXML. The compression ratio (original/zipped) for ODF’s content.xml is 87, whereas the compression ratio for OOXML’s sheet1.xml is only 12. We could just mumble something about entropy and walk away, but I think this area could bear further investigation.
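One way to start poking at the compression question is to measure it on synthetic markup. The sketch below is a toy illustration only, not an analysis of the actual test files: the two row layouts are made-up stand-ins (one with long, descriptive tag names, one with terse ones), and zlib's DEFLATE is assumed as a proxy for the Zip compression used by both formats.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical verbose markup: long tag names, repeated identically on every row.
verbose_rows = b"".join(
    b'<table:table-row><table:table-cell office:value="%d"/></table:table-row>' % i
    for i in range(10000)
)

# Hypothetical terse markup carrying the same numeric payload.
terse_rows = b"".join(b'<r><c v="%d"/></r>' % i for i in range(10000))

for label, rows in (("verbose", verbose_rows), ("terse", terse_rows)):
    print(f"{label}: {len(rows)} bytes raw, ratio {compression_ratio(rows):.1f}")
```

Long tags that repeat verbatim cost very little once compressed, so raw verbosity alone is a poor predictor of the zipped size. Whether that accounts for the 87 versus 12 ratios reported above is a separate question that would need the real content.xml and sheet1.xml to answer.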

Any ideas?

For load time, the times for processing the binary XLS files were:

  • Microsoft Office 2003 = 0.03 seconds per 100K cells
  • OpenOffice 2.4.0 = 0.4 seconds per 100K cells

Not too surprising. These binary formats are optimized for the guts of MS Office. We would expect them to load faster in their native application.

So what about the new XML formats? There has been recent talk about the “Angle Bracket Tax” for XML formats. How bad is it?

  • Microsoft Office 2003 with OOXML = 1.5 seconds per 100K cells
  • OpenOffice 2.4.0 with ODF = 2.7 seconds per 100K cells

For typically sized documents, you probably will not notice the difference. However with the largest documents, like the 16-page, 3-million-cell monster sheet, the OOXML document took 40 seconds to load in Office, the ODF sheet took 90 seconds to load in OpenOffice, whereas the XLS binary took less than 2 seconds to load in MS Office.

OK. So what are we missing? Ah, yes, ODF format in MS Office, using their ODF Add-in.

  • Microsoft Office 2003 with ODF, using the ODF Add-in = 74.6 seconds per 100K cells

Yup. You read that right. To put this in perspective, let’s look at a single test file, a 600K-cell file, as we load it in the various formats and editors:

  • Microsoft Office 2003 in XLS format = 0.75 seconds
  • OpenOffice 2.4.0 in XLS format = 3.03 seconds
  • Microsoft Office 2003 in OOXML format = 8.28 seconds
  • OpenOffice 2.4.0 in ODF format = 14.09 seconds
  • Microsoft Office 2003 in ODF format = 515.60 seconds
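The relative slowdowns are easy to work out from the five timings in the list above. This is a small sketch: the dictionary keys and the `slowdown` helper are illustrative names, and the only data used are the stopwatch numbers already quoted.

```python
# Load times in seconds for the 600K-cell test file, copied from the list above.
load_times = {
    "MS Office 2003, XLS": 0.75,
    "OpenOffice 2.4.0, XLS": 3.03,
    "MS Office 2003, OOXML": 8.28,
    "OpenOffice 2.4.0, ODF": 14.09,
    "MS Office 2003, ODF Add-in": 515.60,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times longer `slow` took than `fast`."""
    return load_times[slow] / load_times[fast]


# 600K cells is six units of 100K cells, so divide by 6 to normalize.
for name, seconds in load_times.items():
    print(f"{name}: {seconds / 6:.2f} s per 100K cells")

addin = "MS Office 2003, ODF Add-in"
print(f"Add-in vs OpenOffice ODF: {slowdown(addin, 'OpenOffice 2.4.0, ODF'):.1f}x")
print(f"Add-in vs Office OOXML:   {slowdown(addin, 'MS Office 2003, OOXML'):.1f}x")
print(f"Add-in load time: {load_times[addin] / 60:.1f} minutes")
```

For this one file the Add-in comes out between 36 and 37 times slower than OpenOffice loading the same ODF content.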

Can someone explain to me why Microsoft Office needs almost 10 minutes to load an ODF file that OpenOffice can load in 14 seconds?

(I was not able to test files larger than this using the ODF Add-in since they all crashed.)

(Update: Since it is the question everyone wants to know, the beta version of OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds and Sun’s ODF Plugin for Microsoft Office loads this file in 30.03 seconds.)

This is one reason why I think file format translation is a poor engineering approach to interoperability. When OpenOffice wants to read a legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file. Instead it simply loads the XLS file, via a file filter, into the internal memory model of OpenOffice.

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it, since they discovered that the Microsoft approach is doomed to performance hell.
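The difference between the two approaches can be sketched in a few lines. This is a toy model under stated assumptions, not how any of the products named above are implemented: CSV stands in for the foreign format, JSON for the application's native on-disk format, and a list of rows for the in-memory model. All function names are hypothetical.

```python
import csv
import io
import json
from typing import List

Model = List[List[str]]  # stand-in for the application's in-memory model


def load_with_filter(foreign: str) -> Model:
    """File filter: parse the foreign format straight into the memory model."""
    return [row for row in csv.reader(io.StringIO(foreign))]


def translate_to_native(foreign: str) -> str:
    """Translator, step 1: rewrite the foreign document as a native document."""
    rows = [row for row in csv.reader(io.StringIO(foreign))]
    return json.dumps(rows)  # serialized again, only to be parsed again


def load_native(native: str) -> Model:
    """Translator, step 2: the application's ordinary native loader."""
    return json.loads(native)


def load_with_translator(foreign: str) -> Model:
    """Two full passes: foreign -> native document -> memory model."""
    return load_native(translate_to_native(foreign))


sample = "a,1\nb,2\n"
print(load_with_filter(sample) == load_with_translator(sample))
```

Both paths end with the same model, but the translator path parses the data, serializes it to a second format, and then parses that, where the filter parses once.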

So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure. It may look great on paper, but it is an engineering disaster. What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes 10 minutes to load a file.

The astute engineer will agree with the above, but will also feel some discomfort at the numbers. There is more here than can be explained simply by the use of translators versus import filters. That choice might explain a 2x difference in performance. A particularly poor implementation might explain a 5x difference. But none of this explains why MS Office is almost 40x slower in processing ODF files. Being that much slower is hard to do accidentally. Other forces must be at play.

Any ideas?

Filed Under: ODF, OOXML, Performance

Achieving the impossible

2008/05/07 By Rob Leave a Comment

Unadulterated copy of James Clark’s Relax NG validator jing. Unadulterated copy of Kohsuke Kawaguchi’s Sun Multi-Schema Validator msv. Unadulterated copy of the ODF 1.0 Relax NG schema. Unadulterated copy of the ODF 1.0 Standard, in ODF format.

No errors from either validator.

msv is so good as to tell us “the document is valid”. But jing indicates success with only silence. So will I.
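For anyone who wants to repeat the exercise: an ODF document is a Zip package, so the XML parts have to come out of it before a validator can read them. The helper below is a minimal sketch of that unpacking step only. The function name and the file names in the usage comment are hypothetical, and it says nothing about the exact steps used for the run described above.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_xml_parts(package_path: str, out_dir: str) -> List[Path]:
    """Unpack every .xml member of an ODF package and return the extracted paths."""
    extracted = []
    with zipfile.ZipFile(package_path) as package:
        for name in package.namelist():
            if name.endswith(".xml"):
                extracted.append(Path(package.extract(name, out_dir)))
    return extracted


# Hypothetical usage (file names are placeholders):
#   parts = extract_xml_parts("odf-1.0-spec.odt", "unpacked")
# Each extracted part can then be handed to a validator, for example
#   java -jar jing.jar <schema.rng> unpacked/content.xml
```

Note that META-INF/manifest.xml, if present, is described by a separate manifest schema, so it would not be checked against the main ODF schema.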

Filed Under: ODF

The Challenge

2008/05/05 By Rob 17 Comments

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p>Dear Alex Brown. Please prove that I am invalid ODF 1.0 (ISO 26300:2006).
I do not think that I am. In fact I think that your statement that there are
no valid ISO ODF documents in the world, and that there cannot be, is a brash,
irresponsible and indefensible piece of bombast that you should retract.</text:p>
<text:p>(Please note that this document contains no ID, IDREF or IDREFS attributes.
Nor does it contain custom content.)</text:p>
</office:text>
</office:body>
</office:document-content>
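A quick way to inspect a document like the one above is to parse it and list what it actually contains. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the document (the paragraph text is shortened here). It checks well-formedness and enumerates attributes only; it is not a Relax NG validation and so cannot settle the validity question by itself.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Abbreviated copy of the document above; paragraph text shortened for the sketch.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
office:version="1.0">
<office:body>
<office:text>
<text:p>Dear Alex Brown. Please prove that I am invalid ODF 1.0.</text:p>
<text:p>(This document contains no ID, IDREF or IDREFS attributes.)</text:p>
</office:text>
</office:body>
</office:document-content>"""

root = ET.fromstring(DOC)  # raises ParseError if the XML is not well-formed

paragraphs = list(root.iter(f"{{{TEXT}}}p"))
# Namespace declarations are not reported as attributes by ElementTree.
attributes = {name for element in root.iter() for name in element.attrib}

print(root.tag)
print(len(paragraphs), "paragraphs")
print(sorted(attributes))
```

The only attribute in the whole tree is office:version, which is consistent with the document's own claim that it carries no ID, IDREF or IDREFS attributes.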

Filed Under: ODF Tagged With: Alex Brown, ISO/IEC 26300, ODF, OpenDocument Format, XML


Copyright © 2006-2026 Rob Weir · Site Policies