Rob

Introducing ISO ODF 1.1

2010/04/01 By Rob

It was cold and dreary in Stockholm for last week’s meeting of ISO/IEC JTC1/SC34. There is nothing surprising or particularly interesting to report. That is a sign of a successful meeting. No drama. These face-to-face meetings tend to be formulaic rituals. The real work is done in WG teleconferences and email discussions that occur in advance of the face-to-face meetings. If the WGs have done their work well, the physical meetings are boring. In Stockholm we were not disappointed.

However, I would like to report on the advancement of an ODF initiative that we’ve been working on in the OASIS ODF TC and in SC34/WG6 for a couple of months now. The idea is to “upgrade” the ISO version of ODF (ISO/IEC 26300) so it aligns with OASIS ODF 1.1, rather than its current alignment with ODF 1.0. ODF 1.1 was standardized by OASIS back in 2007 and is widely implemented, with support in OpenOffice, Microsoft Office, Symphony, KOffice and so on.

Formally this alignment to ODF 1.1 would be done via an amendment to ISO/IEC 26300:2006, to add the enhancements from OASIS ODF 1.1 — primarily accessibility improvements. The process will look something like this:

OASIS submits the full text of ODF 1.1 to JTC1 (done)
The ODF Project Editor will work with SC34/WG6 to prepare the text of an amendment to ISO/IEC 26300. Think of it as a diff between ODF 1.0 and ODF 1.1 (in progress)
A ballot of SC34 NBs (National Bodies) in what is called an FPDAM (Final Proposed Draft Amendment)
A ballot of JTC1 NBs in what is called an FDAM (you guessed it — a Final Draft Amendment)

The full process will take 9-12 months, so we can expect the amendment to be published sometime in 2011.

So you may be wondering: what does this mean for ODF 1.2? The answer is: nothing. I was not willing to support this amendment unless it could be done in a way that would not divert the ODF TC from its current work completing ODF 1.2. After quite a bit of discussion in OASIS and with WG6 we found a way to process the amendment that did not affect the ODF 1.2 schedule. As a result, however, OASIS ODF 1.2 will likely be approved before the ODF 1.1 amendment is approved in JTC1. This may look silly, but it is not a serious problem. The important thing to remember is that the latest and greatest ODF work will be found at OASIS, and that this work will slowly but steadily progress through JTC1, eventually being published with an ISO/IEC coversheet 12-16 months later. The ISO version of ODF 1.1 should not matter to implementors, since most already support ODF 1.1 and are now moving on to ODF 1.2, but it may be of interest to some adopters.

Document Freedom: How to know when you have it

2010/03/31 By Rob

Today is Document Freedom Day. In the five years since Open Document Format (ODF) was first approved in OASIS we have certainly made progress, but there is still work to be done. How will we know when we have arrived? At what point can we declare victory and say “Free at last”? I think that once we can agree that all of the following statements are true, we will have achieved the substantial benefits of document freedom.

I can create documents on the platform of my choice, using the software of my choice.
I can migrate to another editing environment (application or operating system) without losing high-fidelity access to my existing documents.
I can send my documents to anyone and know that they can read them without requiring the purchase of new software.
I can receive documents from anyone and know that I can read them without requiring the purchase of new software.
I have confidence that the documents I create today can be read and understood 10, 25 or 50 years from now.
Programmers can write and distribute software that reads and writes documents without paying royalties to anyone.
I have confidence that the document format standard is being evolved in a way that guarantees the above rights equally for all users and vendors.

We’ve made substantial progress on these fronts, but I don’t think we’re there yet. We should celebrate our progress while committing ourselves to the remaining work ahead. For example, we still need to improve interoperability. In a few weeks we will hold our next ODF Plugfest, in Granada, where ODF implementors will gather for the third time to work together on improving interoperability among their implementations.

Weekly Links #4

2010/03/27 By Rob

ODFDOM for Java: Simplifying programmatic control of documents and their data, Part 1

“This article is the first in a three-part series and introduces the new Open Document Format (ODF) Document Object Model (DOM) for Java™ along with the ODF Toolkit Union open source community, whose mission is to simplify the programmatic manipulation of documents and their data.”

tags: ODF
ODFDOM 0.8 – The new Release of the OpenDocument Java Library – GullFOSS
“The new version of ODFDOM – the OpenDocument Java library – has been released! Most people might know about ODFDOM, for the others: ODFDOM is an Apache 2 licensed Java library to easily create, access and manipulate the ODF documents.

The biggest feature, aside from more than a dozen patches for ODFDOM 0.8, is the completely revised ODF table API.
The table is the first feature introducing our new layered design to ease ODF usage.”

tags: ODF, ODFDOM
CeBIT 2010: Recipe for Office Migration
“The city council’s conclusion: ‘We would do it again!’ Schiessl: ‘The office product is a key to independence. Once you’ve solved the office issue, you’re independent of any operating system.’ “

tags: ODF

Posted from Diigo. The rest of my favorite links are here.

Public review of “The State of ODF Interoperability”

2010/03/14 By Rob

Among the primary goals of the OASIS ODF Interoperability and Conformance TC is the following:

Initially and periodically thereafter, to review the current state of conformance and interoperability among a number of ODF implementations; To produce reports on overall trends in conformance and interoperability that note areas of accomplishment as well as areas needing improvement, and to recommend prioritized activities for advancing the state of conformance and interoperability among ODF implementations in general without identifying or commenting on particular implementations;

The initial “State of ODF Interoperability” report has now gone out for public review. It is a baseline report, surveying the context of document interoperability, the sources of interoperability problems, and the ways in which these problems are being addressed. Although it deals explicitly with ODF interoperability, much of the report is equally relevant to any other office document format, XML-based or binary.

If you want to participate in the public review, you can find links to the draft, as well as instructions for submitting comments, in the OASIS announcement of the review.

The New & Improved Microsoft Shuffle

2010/03/06 By Rob

A quick update on my post from last week on the “Microsoft Shuffle”, where I looked at how Microsoft’s “random” browser ballot was far from random.

First, I’d like to thank those who commented on that post, or sent me notes, offering additional analysis. I think we nailed this one. Within a few days of my report Microsoft updated the JavaScript on the browserchoice.eu website, fixing the error. But more on that in a minute.

Some random observations

Several commenters mentioned that if you search Google for “javascript random array sort” the first link returned is a JavaScript tutorial with the same offending code as Microsoft’s algorithm. This is not surprising. As I said in my original post, this is a well-known mistake. But it is no less a mistake. If you use Google Code Search for the query “0.5 - Math.random()” lang:javascript you will find 50 or so other instances of the faulty algorithm. So anyone else using this same algorithm should evaluate whether it is really sufficiently random for their needs. In some cases, such as a children’s game, it might be fine. But know that there are better and faster algorithms available that are not much more complicated to code.
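For readers who have not seen the pattern that query matches, here is a minimal sketch of it next to a Fisher-Yates shuffle. Both function names are hypothetical, and comparatorShuffle is a reconstruction of the general pattern, not Microsoft’s exact code. The underlying problem is that Array.prototype.sort expects a consistent comparator; one that answers at random leaves the resulting order up to the engine’s sort implementation, with no guarantee that every permutation is equally likely.

```javascript
// The flawed pattern: "shuffle" by sorting with a comparator that answers at random.
// sort() assumes a consistent comparator; this one is not, so the distribution
// of results depends on the engine's sort algorithm and is not uniform.
function comparatorShuffle(arr) {
  return arr.slice().sort(function () {
    return 0.5 - Math.random();
  });
}

// Fisher-Yates (Knuth) shuffle: walk backwards through a copy of the array,
// swapping each slot with a uniformly chosen slot at or before it.
// O(n), and unbiased provided Math.random() is uniform.
function fisherYatesShuffle(arr) {
  const a = arr.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

console.log(comparatorShuffle([1, 2, 3, 4, 5]));
console.log(fisherYatesShuffle([1, 2, 3, 4, 5]));
```

Note that the comparator version also pays for a full sort, typically O(n log n) comparisons, while Fisher-Yates does one swap per element.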

Another thing to note is that the Microsoft Shuffle algorithm is bad enough with a 5-element array, but the non-randomness becomes more pronounced as the array grows. Regardless of the size of the array, it appears that on Internet Explorer the 1st element ends up in last place 50% of the time. There are other pronounced patterns as well. You can see this for yourself with this test file, which lets you specify the size of the array as well as the number of iterations. Try a 50-element array for 10,000 iterations to get a good sense of how non-random the results can be.
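The test file itself is not reproduced here, but the core of such a harness is simple: shuffle the array [0, 1, ..., n-1] many times and count how often each starting index lands in each final position. What follows is a sketch under that assumption; positionCounts is a hypothetical name, and the shuffle under test is the random-comparator pattern. Results will differ between JavaScript engines, since the bias depends on each engine’s sort algorithm.

```javascript
// Shuffle under test: the random-comparator pattern (sorts in place).
function comparatorShuffle(arr) {
  return arr.sort(function () {
    return 0.5 - Math.random();
  });
}

// counts[i][j] = number of runs in which the element that started at index i
// finished at index j. For an unbiased shuffle every cell should be close to
// iterations / n.
function positionCounts(shuffle, n, iterations) {
  const counts = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let k = 0; k < iterations; k++) {
    const start = Array.from({ length: n }, (_, i) => i);
    const out = shuffle(start);
    out.forEach((value, pos) => {
      counts[value][pos]++;
    });
  }
  return counts;
}

const counts = positionCounts(comparatorShuffle, 50, 10000);
// Fraction of runs in which the first element ended up in last place;
// an unbiased shuffle would give roughly 1/50 = 0.02.
console.log(counts[0][49] / 10000);
```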

I used that script to run a large test of 1,000,000 iterations of a 1024-element array. The raw results are here. I took that table and used R’s image() function to render the matrix. You can see the clear over-representation at some positions, including (in the lower left) the flip of the first position to last place. (I’m not quite satisfied with this rendering. Maybe someone can produce a better-looking visualization of the same data.)
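One option for a clearer picture is to plot observed/expected ratios instead of raw counts, so that 1.0 means “as often as chance predicts” regardless of the iteration count. That is my suggestion, not what the rendering above shows. Here is a small sketch of that normalization plus a helper that reports the most skewed cell; the names ratioMatrix and worstCell are hypothetical, and the 3x3 example matrix is made up for illustration, not taken from the raw results.

```javascript
// Turn a square matrix of position counts into observed/expected ratios.
// Each starting element is shuffled `total` times, so each of the n cells in
// its row should get about total / n hits; 1.0 means "exactly as predicted".
function ratioMatrix(counts) {
  const n = counts.length;
  return counts.map((row) => {
    const expected = row.reduce((sum, x) => sum + x, 0) / n;
    return row.map((x) => x / expected);
  });
}

// Find the cell that strays furthest from 1.0 (over- or under-represented).
// On ties, the first such cell in row-major order wins.
function worstCell(ratios) {
  let worst = { from: 0, to: 0, ratio: ratios[0][0] };
  ratios.forEach((row, i) =>
    row.forEach((r, j) => {
      if (Math.abs(r - 1) > Math.abs(worst.ratio - 1)) {
        worst = { from: i, to: j, ratio: r };
      }
    })
  );
  return worst;
}

// Made-up example: element 0 never lands in position 1.
const example = [
  [10, 0, 20],
  [10, 10, 10],
  [10, 20, 0],
];
console.log(worstCell(ratioMatrix(example))); // { from: 0, to: 1, ratio: 0 }
```

The resulting ratio matrix can be handed to any heat-map tool, including R’s image().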

Evaluating Microsoft’s new shuffle

Sometime last week (I don’t know the exact date) Microsoft updated the code for the browser choice website with a new random shuffle algorithm. You can see the code, in situ, here. The core of it is in this function:

function ArrayShuffle(a)
{
    var d, c, b=a.length;
    while(b)
    {
        c=Math.floor(Math.random()*b);
        d=a[--b];
        a[b]=a[c];
        a[c]=d
    }
}

This looks fine to me. I created a new test driver for this routine, which you can try out here. Aside from being much faster, it gives much better results. Here is a run with a million iterations:

Raw counts

Position	I.E.	Firefox	Opera	Chrome	Safari
1	199988	200754	199944	199431	199883
2	200320	200016	199838	199752	200074
3	199702	199680	199911	200865	199842
4	200408	200286	199740	199861	199705
5	199582	199264	200567	200091	200496

Fraction of total

Position	I.E.	Firefox	Opera	Chrome	Safari
1	0.2000	0.2008	0.1999	0.1994	0.1999
2	0.2003	0.2000	0.1998	0.1998	0.2001
3	0.1997	0.1997	0.1999	0.2009	0.1998
4	0.2004	0.2003	0.1997	0.1999	0.1997
5	0.1996	0.1993	0.2006	0.2001	0.2005

And the results of the Chi-square test:

X-squared = 18.9593, df = 16, p-value = 0.2708
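To make the test less of a black box, here is a sketch of Pearson’s chi-square statistic for a contingency table, fed the raw counts above (rows are positions, columns are browsers). Every row and column total is 1,000,000, so each expected cell count is 200,000, and summing (observed − expected)² / expected over the 25 cells reproduces the reported X-squared of about 18.9593 on (5 − 1) × (5 − 1) = 16 degrees of freedom. Turning the statistic into a p-value needs the chi-square distribution function, which this sketch leaves to a statistics package; the function name chiSquare is my own.

```javascript
// Pearson's chi-square statistic for an r x c contingency table.
// expected[i][j] = rowTotal[i] * colTotal[j] / grandTotal
function chiSquare(table) {
  const rowTotals = table.map((row) => row.reduce((s, x) => s + x, 0));
  const colTotals = table[0].map((_, j) => table.reduce((s, row) => s + row[j], 0));
  const total = rowTotals.reduce((s, x) => s + x, 0);
  let stat = 0;
  table.forEach((row, i) =>
    row.forEach((observed, j) => {
      const expected = (rowTotals[i] * colTotals[j]) / total;
      stat += (observed - expected) ** 2 / expected;
    })
  );
  return { stat, df: (table.length - 1) * (table[0].length - 1) };
}

// Raw counts from the table above: rows = positions 1-5,
// columns = I.E., Firefox, Opera, Chrome, Safari.
const rawCounts = [
  [199988, 200754, 199944, 199431, 199883],
  [200320, 200016, 199838, 199752, 200074],
  [199702, 199680, 199911, 200865, 199842],
  [200408, 200286, 199740, 199861, 199705],
  [199582, 199264, 200567, 200091, 200496],
];

const { stat, df } = chiSquare(rawCounts);
console.log(stat.toFixed(4), df); // 18.9593 16
```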

Final thoughts

In the end I don’t think it is reasonable to expect every programmer to memorize the Fisher-Yates algorithm. These things belong in our standard libraries. But what I would expect every programmer to know is:

That the problem here is one that requires a “random shuffle”. If you don’t know what it is called, it will be difficult to look up the known approaches. So this is partially a vocabulary problem. We, as programmers, have a shared vocabulary that we use to describe data structures and algorithms: binary searches, priority heaps, tries, and dozens of other concepts. I don’t blame anyone for not memorizing algorithms, but I would expect a programmer to know what types of algorithms apply to their work.
How to research which algorithm to use in a specific context, including where to find reliable information, how to evaluate the classic trade-offs of time and space, etc. There is almost always more than one way to solve a problem.
That where randomized outputs are needed, the outputs should be statistically tested. I would not expect the average programmer to know how to do a chi-square test, or even to know what one is. But I would expect a mature programmer to know how to find this out, or when to seek help.
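Statistical testing is the general tool, but a shuffle written like the new ArrayShuffle also admits a stronger, exhaustive check if the random draw can be injected: enumerate every possible sequence of draws and confirm that each of the n! permutations comes out exactly once. The sketch below assumes that refactoring; shuffleWith and permutationTally are hypothetical names, and the loop body mirrors ArrayShuffle. It verifies only the shuffling logic, not the quality of Math.random() itself, so the statistical test is still worth running.

```javascript
// Same loop as ArrayShuffle, but the random draw is injected so a test can
// control it: pick(b) must return an integer in [0, b).
function shuffleWith(a, pick) {
  let b = a.length;
  while (b) {
    const c = pick(b);
    const d = a[--b];
    a[b] = a[c];
    a[c] = d;
  }
  return a;
}

// Enumerate every possible sequence of draws for an n-element array and count
// how often each resulting permutation occurs. There are n * (n-1) * ... * 1
// = n! draw sequences; an unbiased shuffle maps them onto the n! permutations
// one-to-one, so every count should be exactly 1.
function permutationTally(n) {
  const tally = new Map();
  function recurse(choices) {
    if (choices.length === n) {
      let k = 0;
      const a = Array.from({ length: n }, (_, i) => i);
      const key = shuffleWith(a, () => choices[k++]).join(",");
      tally.set(key, (tally.get(key) || 0) + 1);
      return;
    }
    const b = n - choices.length; // the next draw is from [0, b)
    for (let c = 0; c < b; c++) recurse(choices.concat(c));
  }
  recurse([]);
  return tally;
}

const tally = permutationTally(4);
console.log(tally.size, [...tally.values()].every((v) => v === 1)); // 24 true
```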