
An Antic Disposition


Archives for 2006

Four Shorts

2006/08/21 By Rob 2 Comments

I. The OpenOffice.org Conference (OOoCon 2006) is coming up, September 11-13 in Lyon, France. The last day starts with a panel discussion of ODF topics, and continues with a track dedicated to ODF. I’m on at 14:00 with a presentation with the exciting title, “A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML (Ecma International TC45 OOXML WD 1.3)”.

The abstract is:

Two XML office file formats have been pressing upon our attention, the OASIS OpenDocument Format, recently standardized by ISO, and the Draft Ecma Office Open XML. This presentation will review the history of each, the process that created them, and examine each format to compare and contrast how they deal with issues such as extensibility, modularization, expressivity, performance, reuse of standards, programmability, ease of use, and application/OS neutrality.

II. KDE enthusiasts get together two weeks later, in Dublin, for their aKademy 2006. Tuesday the 26th will be OpenDocument Day. I’ll be there, and will give a lightning talk on something, probably related to some ODF programmability API ideas I’ve been having.

III. If you haven’t seen it yet, Rick “Schematron” Jelliffe has an interesting post over at O’Reilly looking at ODF and OOXML documents from the perspective of XML complexity metrics. This is a topic Rick has done a good deal of work on in the past, so it is interesting to hear what he has to say. Did I see something there about OpenOffice loading documents faster than Office?

IV. The ODF Formula Subcommittee has set up a wiki page on our work defining OpenFormula. A lot of good information is there. This page will be updated with the latest status, so you’ll want to make it the first place to go for the latest info on our progress.

Filed Under: ODF, OOXML Tagged With: KDE, OOoCon, OpenFormula

A Demo: Mathematica, MathML and ODF

2006/08/20 By Rob 6 Comments

Here’s a short tutorial on exchanging MathML between Mathematica and OpenOffice, showing what is possible today, and offering some suggestions for closer integration.

First, start with a new ODF document in OpenOffice. It is often easier to modify an existing document, inheriting its structure and default styles, than to create a new document from scratch. So I believe that a lot of interesting projects with ODF will start with an existing document as a template, and then add or replace content in it.

So, here’s what I made, a simple file with a formula describing the Euclidean metric, our old friend the Pythagorean Theorem. Click the image to load the ODF file.

If you rename the ODF file to a .zip extension and unzip it, you can see the XML files it contains. Always start with the manifest.xml, which is here for your convenience, and where I draw your attention to the entry with the type “application/vnd.oasis.opendocument.formula”. This, according to Appendix C of the ODF 1.0 specification, is the registered MIME type of an ODF formula document. So that sounds like what we want. Let’s replace that equation with something else.
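
If you would rather script this step than rename and unzip by hand, here is a minimal Python sketch using only the standard library. It reads META-INF/manifest.xml straight out of the archive and reports which entries carry the formula MIME type. The function names manifest_entries and formula_paths are made up for this illustration, and the sketch assumes the manifest uses the usual ODF manifest namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
FORMULA_TYPE = "application/vnd.oasis.opendocument.formula"


def manifest_entries(odf_file):
    """Return (full-path, media-type) pairs listed in the package manifest.

    odf_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odf_file) as archive:
        root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    entries = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type")
        entries.append((path, media_type))
    return entries


def formula_paths(odf_file):
    """Return the manifest paths whose media type is the ODF formula type."""
    return [path for path, media_type in manifest_entries(odf_file)
            if media_type == FORMULA_TYPE]
```

Calling formula_paths on a document returns the paths of any formula entries; an empty list means the manifest declares none.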

So into Mathematica we go. Suppose I want to calculate the indefinite double integral of the Euclidean metric. Why not? This is something I’d rather not do by hand, but I know Mathematica can quickly give me the answer:

Now I really don’t want to retype that result into OpenOffice. So, what can I do? I can use Mathematica’s ExpressionToMathML function to turn the above into MathML. When I do that I get MathML like this.

Let’s see now what happens if I simply drop that content in as a replacement for the original content.xml in the ODF file. Here’s what I get (click the image to open the ODF file):

So we got something, but it is not quite right. I’m seeing some little hollow boxes, usually an indication of an unprintable character. What’s up with this?

A closer look at the XML generated from Mathematica shows that these boxes are being displayed whenever the MathML uses the XML character entities corresponding to section 6.2.4 “Non-Marking Characters” of the MathML specification. This includes things like “InvisibleTimes” which handles cases where adjacency represents multiplication (xy == x*y). Using these characters provides hints to the application that can help it optimize its rendering and editing, but they should not be displayed.

In any case there appears to be a bug in OpenOffice 2.0.3 where it tries to display these characters and finds they don’t map to any printable Unicode character. No big deal, I will enter a bug report on that later. But for now I can easily clean this up by defining a new function in Mathematica, ExpressionToOO, defined as follows:

(Note I didn’t name this “ExpressionToODF”, since strictly speaking the ODF specification allows MathML 2.0, including the non-marking characters. This function is specifically to work around an OpenOffice bug. It outputs valid MathML, simply removing the non-marking characters which OO doesn’t understand.)
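
The same cleanup can be sketched outside Mathematica. The Python below is not ExpressionToOO itself but a stand-in under stated assumptions: it handles only the three invisible operators (ApplyFunction, InvisibleTimes, InvisibleComma), whether they appear as named entities, numeric character references or literal characters, and it drops an mo element entirely when that is all it contains. The name strip_non_marking is hypothetical, and section 6.2.4 of the MathML specification lists further non-marking characters that this sketch leaves alone.

```python
import re

# Assumption: only these three invisible operators are handled.
INVISIBLE = {
    "ApplyFunction": 0x2061,
    "InvisibleTimes": 0x2062,
    "InvisibleComma": 0x2063,
}


def _token_pattern():
    """Build a regex alternation matching every spelling of each character."""
    forms = []
    for name, codepoint in INVISIBLE.items():
        forms.append(f"&{name};")          # named entity
        forms.append(f"&#{codepoint};")    # decimal character reference
        forms.append(f"&#x{codepoint:x};")  # hexadecimal character reference
        forms.append(chr(codepoint))       # literal character
    return "|".join(re.escape(form) for form in forms)


_TOKEN = _token_pattern()
# An <mo> element whose only content is one invisible operator.
_WHOLE_MO = re.compile(rf"<mo>\s*(?:{_TOKEN})\s*</mo>\s*")
_ANY_TOKEN = re.compile(_TOKEN)


def strip_non_marking(mathml: str) -> str:
    """Remove invisible operators from a MathML string."""
    without_elements = _WHOLE_MO.sub("", mathml)
    return _ANY_TOKEN.sub("", without_elements)
```

Working on the raw string rather than a parsed tree is a deliberate shortcut: named entities such as &InvisibleTimes; are not defined for a plain XML parser, so parsing first would fail on exactly the input that needs cleaning.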

So, back to Mathematica, I run ExpressionToOO, grab that XML and inject that XML into the ODF document, and we get the following (click to open the ODF file):

That’s what we want! For those who are interested, the complete Mathematica notebook is here: Session.nb.

As you can see, this isn’t rocket science, though no doubt it may be useful to rocket scientists. Consider this a little “proof of concept”. Real end users will not be going around unzipping ODF documents and copying XML around. There needs to be some additional integration work to make this process simple and joyful. For example:

  1. A Mathematica function that automatically inserts a formula into an ODF document
  2. An OpenOffice add-in that lets the user browse formulas from Mathematica and insert them into the current working document.
  3. Clipboard level exchange of MathML between OpenOffice and Mathematica
  4. An export filter from OpenOffice to export to the XHTML+MathML+SVG profile defined by the W3C. This, combined with Firefox, would provide kickass scientific publishing using open standards and tools.
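
The manual unzip, replace and re-zip step behind the first item can be sketched in a few lines. This is a hedged Python illustration, not an existing tool: the name replace_member is hypothetical, and the member to swap defaults to content.xml as in the walkthrough above. The standard library cannot edit a ZIP archive in place, so the sketch copies every member into a new archive in its original order and with its original compression setting, which keeps a leading uncompressed mimetype entry as it was.

```python
import zipfile


def replace_member(src, dst, new_bytes, member="content.xml"):
    """Copy the archive src to dst, replacing one member's contents.

    src and dst may be paths or binary file-like objects.
    Raises KeyError if the member is not present in src.
    """
    with zipfile.ZipFile(src) as source:
        if member not in source.namelist():
            raise KeyError(f"{member!r} not found in archive")
        with zipfile.ZipFile(dst, "w") as target:
            for info in source.infolist():
                if info.filename == member:
                    data = new_bytes
                else:
                    data = source.read(info.filename)
                # Fresh ZipInfo so no stale sizes or offsets are carried over.
                fresh = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                fresh.compress_type = info.compress_type
                target.writestr(fresh, data)
```

A call such as replace_member("formula.odf", "formula-new.odf", cleaned_mathml_bytes) would then produce the modified document; both file names here are placeholders.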

Note that I’m using Mathematica here just as an example. There are over 100 MathML supporting applications out there, both commercial and open source. I’d be interested in hearing what other ideas people have for workflows involving ODF editors and other tools that work with the standards ODF includes, not just MathML, but SVG, XForms, etc. Let’s demonstrate the value of open standards working together.

Filed Under: ODF

Math You Can’t Use

2006/08/06 By Rob 13 Comments

Summary: In this post I will look at MathML, a web standard for displaying mathematical equations. I will show how well established it is on the web, how it is integrated into ODF, and how Microsoft has decided to go off in another direction with OMML, another “stealth” standard hidden in their 4,000 page Office Open XML specification, but little mentioned. As I did with my prior analysis of their reliance on the rejected VML specification, I will show why this is a bad thing.


I’ve been reading Math You Can’t Use: Patents, Copyright, and Software, a book by Ben Klemens, Guest Scholar at the Brookings Institution. It examines the current state of software patents in the U.S. and the abuses thereof. He blends his legal and economic policy background with his insights as a programmer to give a perspective worth hearing. Mind you, I don’t agree with him on many points, and in fact I found the book infuriating at times, but he does make a serious argument and I respect that. In any case I like to have my opinions challenged every now and then. It keeps the mind limber.

Although I am not going to talk about patents and copyrights today, I will steal the title of this book and talk a bit about math, the kind you can use as well as the type you can’t. The topic for today is MathML.

MathML is a web standard from the W3C, an XML vocabulary for representing the structure and content of mathematical expressions. In other words, it represents equations for display, especially complicated expressions with integrals, summations, products, limits and all the Greek you can throw at it.

If you are running Firefox and have installed the math fonts then you can get an idea of its capabilities by loading MathML-enabled pages right now, like this one. If you are running Internet Explorer, then sadly you lack native support for MathML, but a browser plugin is available.

MathML 1.0 dates back to 1999, and has been revised through MathML 2.0 (second edition) in 2003.

There are about 100 implementations of MathML if you count producers, consumers and editors, including the powerful software used by working mathematicians and scientists like Maple and Mathematica.

The W3C has made a special effort to get the various MathML vendors together to evaluate how well they handle MathML, and this is reported in their Implementation and Interoperability Report.

Where MathML is supported natively, such as in Firefox, it will render along with the text, and not merely as an embedded GIF image. So, it will scale to different screen resolutions and print well. In theory, since it is just text markup in the page, it can be indexed by an intelligent search engine, though I am aware of none that do this currently. (Is there any use for a Google search of all web pages that include a 3rd degree polynomial inequality? I wouldn’t want to be the first to say “No”.)

MathML is also the key to enabling better support for mathematics via screen readers and other assistive agents. When a visually impaired user is presented an equation in the form of a GIF or other image format, they are left out. But put the formula in MathML and the possibilities look better. The work is not complete yet, but progress is being made. For example, see this report from CSUN 2004 and NIDE’s MathML Accessibility Project.

Further innovations are seen at sites like Wolfram’s MathMLCentral where we see web services for creating, displaying, or even integrating MathML expressions, using their Mathematica program as the backend.

For the above, and many other reasons, MathML was the only logical choice for us to use to support equations in OpenDocument Format (ODF). With such a thriving ecosystem of producers and consumers, with support in the tools used by academia and industry like Mathematica and Maple, strong support in web browsers like Firefox, and with the accessibility initiatives around it, I don’t see how you could argue otherwise. MathML is the way the web does math.

But the choice of MathML is more than just a fashion statement. It has practical significance and enables opportunities for innovative workflows around mathematical document production. If you create an equation block in OpenOffice, it saves the equation as a standalone MathML XML document in the ODT document archive. This makes it very easy to access, read, replace, etc.

We should be thinking about workflows like the following:

  1. Do your complicated calculations in a tool like Mathematica
  2. When you get the final results you want, export it to MathML, for example, using Mathematica’s MathMLForm[ ] function.
  3. Copy the MathML into an ODF document archive
  4. Take the ODF document and complete the prose write-up of the document in OpenOffice
  5. Share the draft with colleagues, review, etc., in the editable ODF format
  6. When ready to publish, export to XHTML with embedded MathML preserved for the equations, and embedded SVG for the charts.
  7. Users can then view in Firefox or Internet Explorer (with extra plugin)

We’re not quite there yet, end to end. Step #6 in particular is not working as I’d expect in OpenOffice 2.0.3. But you get the idea. There is opportunity for fame, glory and perhaps some profit for the person or company who provides an end-to-end mathematical editing and publishing solution based on open standards.

So, in this happy world I’ve described, what is missing? If you guessed “Microsoft Office” then you guessed correctly! Even though MathML is a 7 year-old standard, widely implemented, supported by the leading mathematical tools, the preferred format for publishing math on the web, etc, etc., (the mantra should be familiar), Microsoft has ignored it and instead is pushing forward a new competing format in their Office Open XML (OOXML) specification rushing through Ecma.

The new math markup format is called OMML and you’ve probably never heard of it. You can check Google, you can check Wikipedia, you can check MSDN. You won’t find it. In fact, I’m not even sure what OMML stands for since the acronym is not defined in the spec. But it is there, nestled away in the 4,081 page draft OOXML specification as the markup that “specifies the structures and appearance of equations in the document”, Section 25.1, all 93 pages of it.

OMML is not MathML, though it solves the same problem. But if you use OMML, it will not work with Firefox, with Mathematica, with OpenOffice or with any of the other 100 applications that support MathML. OMML works with Office, and that’s it. One door in, no doors out.

Consider that Ecma TC45’s Programme of Work included the goal of:

….enabling the implementation of the Office Open XML Formats by a wide set of tools and platforms in order to foster interoperability across office productivity applications and with line-of-business systems.

How exactly does the OOXML specification foster this interoperability when it ignores relevant web standards like MathML (and SVG and XForms)?

Microsoft’s typical argument is to say that the existing standards are inadequate, that Microsoft users expect more, that they need more features, that this is because they need to deal with billions of documents and trillions of dollars, etc. But this rings hollow when talking about math. An examination of the history of mathematical notation demonstrates, as you may already know, that mathematical notation is not exactly experiencing a high rate-of-change. Equations, as used in math and sciences, for the most part use the same notation they did 100 years ago, and many parts of notation are 200-300 years old. Certainly there is no essential change in notation since 1999, when MathML was created.

Now if Microsoft had merely wanted to create a proprietary format for equations and use that in Word in order to trap their customers onto that platform, then I’d simply say that’s not my concern and I’d blog about my heirloom tomatoes or something else. But when this shows up in a nominally open standard destined for approval by ISO, then this raises my eyebrows a little. The obvious choice would have been to simply reuse MathML. So, why are they creating, and standardizing a whole new math markup language? Are there no standards worth reusing? Will XPS replace PDF, VML replace SVG, Windows Media Photo format replace PNG, OMML replace MathML, and OOXML replace ODF? Let’s say “No” to OMML and “Yes” to MathML, the math you can use.

Filed Under: OOXML

Follow the Leader

2006/08/03 By Rob 13 Comments

David Wheeler, the chair of the OASIS ODF Formula Subcommittee, has a good status update on our work defining the details of the expression language and supporting functions used in spreadsheet formulas. I’d also like to point out some cool work by Daniel Carrera, who put together some code that post-processes the OpenFormula specification (in ODF format, of course), extracts the details of the embedded test cases, then automatically generates an ODF spreadsheet file which executes the spreadsheet functions and verifies correct results. The resulting spreadsheet allows implementors to automatically test their compliance with the spec. This gives us a self-testing specification, a great labor savings, as well as a demonstration of the innovative things you can do with ODF. Details are here.

(I note in passing that although the OASIS ODF TC does all of its working documents in ODF format, the Ecma TC45 does none of its working documents in OOXML. They continue to use the old proprietary Microsoft binary formats as their working format on the TC. A suggestion — If they are unable for some reason to use OOXML, then I encourage TC45 to use ISO ODF. They can then download Daniel’s code to help generate test cases from their spreadsheet formula documentation and this, I promise you, will save implementors a lot of time.)

Of course, malcontents will never be pleased by our progress, and will portray this progress as proof that we are yet imperfect, and therefore not useful. The first point is obvious, but the second is dubious.

Stephen McGibbon’s blog entry of a couple weeks ago seems to be the Urquelle of this particular line of reasoning. Here’s one small quote:

I mentioned that in my opinion, Sun were completely aware that ODF wasn’t sufficiently defined to support spreadsheet interoperability as long ago as February 2005 and that the realpolitik inside OASIS was to take advantage of the EU IDA’s request to standardise by rushing to be first despite knowing the ODF specification was deficient in at least this area.

Read the rest of his article and you’ll walk away with two misconceptions:

  • The lack of a spreadsheet formula definition in a file format documentation is unusual, defective and prevents interoperability
  • Spreadsheet formulas were left out because ODF standardization was rushed, for political reasons

Let’s take a look at each of these in turn.

First, let’s look at the state of the art in spreadsheet file format documentation over the years, with particular attention to how spreadsheet formulas have been documented. As the following table shows, Excel formulas have never been publicly specified, even though Microsoft has been producing file format documentation for various binary, HTML, XHTML and XML Excel formats for over 9 years. It was only after the ODF TC decided to document our spreadsheet formulas and formed a Subcommittee to do so that Ecma TC45 decided to follow. The FUD followed soon after.

Date | Format version | Formula status
1997 | Excel 97 Developers Kit (Microsoft Press, 1997) | not defined
ca 1998 | MSDN CD’s in this era had Office file format documentation | not defined
Jan 1999 | Office 2000’s XHTML formats for Excel | not defined
May 2001 | Office XP’s XMLSS format for spreadsheets | not defined
Nov 2003 | Office 2003’s XML Schemas | not defined
Dec 2005 | Microsoft submits initial “base document” to Ecma | not defined
January 2006 | Ecma TC45’s Working Draft 1.1 | not defined
February 2006 | The OASIS ODF Formula Subcommittee is formed to add formula definition to the ODF specification |
April 2006 | Ecma TC45’s Working Draft 1.2 | not defined
May 2006 | Ecma TC45’s Working Draft 1.3 | Mirabile dictu! After 9 years of ignoring it, Microsoft finally decides to start defining their spreadsheet formula language.

So the statement that the lack of a formula language specification is unusual or makes interoperability impossible falls down in the face of 9 years of contrary evidence. Over the years, the industry has managed to have interoperable spreadsheet formulas between different versions of Office as well as between Excel and competing spreadsheets, including 1-2-3, Quattro Pro, OpenOffice, StarOffice, etc., all without ever having a formula specification.

Even though every other spreadsheet file format specification in the past decade failed to document a spreadsheet formula language, the ODF TC knew that we could and should do better. That is why we took the lead and formed a Subcommittee to define, in great detail, with test cases, how spreadsheet formulas, expressions and functions should be interpreted. This is not fixing a problem. This is advancing the state of the art in file format specifications.

They say that imitation is the sincerest form of flattery. If so, the ODF community should be blushing with all of this flattery heaped on it. If it weren’t for the continual market pressure that our innovations bring, Microsoft would never have 1) issued a patent covenant for OOXML, 2) brought OOXML before a standards body, 3) started to document their spreadsheet formula language or 4) started to create an ODF Add-in for Office.

So, then what about the statement that ODF was rushed through the standardization process?

Let’s look at the numbers. Both ODF and OOXML are derived from pre-existing formats. This is not necessarily a bad thing. It is one source of “implementation experience”, and any standard benefits from having that. But only once the “base document” is submitted to a multi-vendor open standards development organization (SDO) does the true work of standardization begin, including deep technical review of the specification to confirm completeness, conciseness, lack of ambiguity, correct use of formal specification language, ensuring platform independence, encouraging flexibility and extensibility, etc. So, I’ll start the clock when the base specification is submitted to the SDO, and stop the clock when the SDO approves the standard.

The ODF numbers are clear enough since the 1.0 version is complete. The OOXML numbers require some estimation, since they are not complete, but I’ll justify my estimates this way:

  • The OOXML Working Draft 1.3 is currently 4,081 pages long. At the SC34 meeting in June we were told by the Ecma Secretary General that more material was coming and that this draft was only 2/3 complete. By my calculations, this gives a final size estimate of around 6,000 pages.
  • Predicting the completion date is harder. But we do know that Ecma specifications can only be approved twice a year, at the Ecma General Assembly meetings held in June and December. If I were Microsoft I’d really really really want OOXML approved in time for the Office 2007 launch, so I’m predicting Ecma approval will be sought at the December Ecma General Assembly.

Of course I could be wrong on either or both of those estimates, but let’s see where the logic takes us. The following table summarizes the time under standardization as well as the rate of standardization (pages/day) for each specification.

Standard | Submitted to SDO | Standard issued | Days elapsed | Standard length | Rate of work
ODF | 12 Dec 2002 | 1 May 2005 | 867 | 706 pages | 0.8 pages/day
OOXML | 15 Dec 2005 | 31 Dec 2006 (est) | 381 (est) | 6000 pages (est) | 15.6 pages/day (est)
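
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the table from its own dates and page counts, plus the size estimate from the 4,081 pages at 2/3 complete. The helper name pages_per_day is invented for this illustration. Computed strictly from the dates as listed, the ODF span comes out a few days longer than the 867 shown and the OOXML rate a shade higher than 15.6, so treat the table figures as approximate; neither difference changes the comparison.

```python
from datetime import date


def pages_per_day(submitted, issued, pages):
    """Return (days elapsed, pages per day) for one specification."""
    days = (issued - submitted).days
    return days, pages / days


# ODF: dates and page count as given in the table.
odf_days, odf_rate = pages_per_day(date(2002, 12, 12), date(2005, 5, 1), 706)

# OOXML: 4,081 pages said to be 2/3 complete, so scale by 3/2.
ooxml_projected_pages = 4081 * 3 // 2

# OOXML: estimated dates and the rounded 6,000 page estimate from the table.
ooxml_days, ooxml_rate = pages_per_day(date(2005, 12, 15), date(2006, 12, 31), 6000)

print(f"ODF:   {odf_days} days, {odf_rate:.1f} pages/day")
print(f"OOXML: {ooxml_days} days, {ooxml_rate:.1f} pages/day "
      f"(projected size about {ooxml_projected_pages} pages)")
```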

Now I ask you, who is rushing? ODF took 2 ½ years to standardize 700 pages. Microsoft is trying to standardize a 6,000 page behemoth in just 1 year. I think the argument that ODF was rushed through under political pressure just doesn’t stand up to even cursory examination. Honestly, I think this FUD is being spread around as a smoke screen to hide the fact that OOXML is the one that is really being rushed.

Filed Under: ODF, OOXML, Standards

Throwing stones at people in glass houses

2006/07/27 By Rob 9 Comments

I work in a house with glass walls. Not literally, of course. The cost to air-condition such a house would be prohibitive. I mean that working on a standard in OASIS is a public act, with process transparency and public visibility. The public doesn’t see merely the end-product, or quarterly drafts, they can see (if they are so inclined) every discussion, every disagreement and every decision made by the TC, in near real-time. Our meeting minutes for our TC calls are posted for public inspection. Our mailing list archives, where most of the real work occurs, are there for the public to view. The comments submitted by the public are also available for anyone to read. This information is all archived from when the TC first met back in 2002, all the way to the discussions we’re having today on spreadsheet formula namespaces.

One result of this openness is that it is very easy, trivial even, for our critics to simply read our mailing list, look for a disagreement or discussion of an issue, and repeat our words, usually out of context. Cut & Paste. This is certainly the most efficient way to criticize ODF since it minimizes the amount of thinking required. However, this is a bit tedious, especially when this is applied so asymmetrically, as I shall now explain.

Ecma TC45, the committee producing Office Open XML (OOXML), does not operate in such a transparent manner. They do not have a public mailing list archive. They have not published their meeting minutes. The comments they receive from the public are not open for the public to read. The public has no idea what exactly the TC is working on, what issues they think are critical, whether the TC is in unanimous agreement, whether there is spirited debate or whether Microsoft dominates and determines everything. The fact that they have not yet sent OOXML for an Ecma vote is proof that they believe the specification is not yet ready for standardization. But we know no details of what exactly is lacking, what problems are being fixed or, more importantly, what defects are being allowed to remain.

And in this way, the ODF-bashers take advantage of our openness, while holding their deliberations in obscurity. They throw rocks at our glass house while hiding in the shadows.

So this openness at OASIS has an apparent downside. But honestly, I wouldn’t trade it for any alternative. Making a standard, especially one this important, is a privilege, not a right. The public deserves to know how a standard is made, the same way and for the same reasons they deserve to know how legislation is made. I relish this scrutiny because I know it makes us stronger.

Sun’s Simon Phipps has posted his keynote from the recent OSCON conference. The topic was the “Zen of Free” and, among other goodies, Phipps lists 5 requirements for “full support for fully open standards”, of which I quote the 4th, since it states the point better than I have:

…the standard [Phipps here speaking generically and not about any specific standard] should have been created transparently. Just as an open source community looks with concern on a large, monolithic code contribution, so we should be wary of standards created without the opportunity for everyone to participate or, failing that, with a full explanation of every decision that was made in its construction. Without that there’s a chance that it’s designed to mesh with some facility or product that will be used to remove our freedom later.

Another way to attack openness is to do it with legal restrictions. For example, we’re seeing many references to a year-old performance evaluation of an atypical spreadsheet file, which is then used to make the ridiculous claim that the ODF format itself is too slow. I’d love to dispute that claim and show it for what it is. I’d love to show that for most common document sizes, ODF documents are actually smaller and faster to load and save than OOXML documents. I’d love to show you all this, but I can’t. Why? Because Microsoft won’t let me. The only implementation of OOXML is the Office 2007 beta, and the End User License Agreement (EULA) has this language:

7. SCOPE OF LICENSE. …You may not disclose the results of any benchmark tests of the software to any third party without Microsoft’s prior written approval

So, our critics can quote benchmark results about ODF running in OpenOffice, but we can’t quote numbers about OOXML running in Office. They can read our mailing lists and quote us discussing ODF issues as we address them, but we cannot even see what they are working on.

What should we make of all this? I suggest that no specification is perfect. That’s why we have version numbers. The question you need to ask yourself is: what leads to a better specification, full and open public discussion and scrutiny? Or something rushed through behind closed doors? You know what the issues with ODF are, and you’ll continue to hear the same small list over and over again. But this is a shrinking list, as the ODF TC experts address these issues. But do you know what the issues with OOXML are, the reasons why Ecma TC45 has not yet put forward their specification as an Ecma standard? What do their experts say when speaking candidly about their specification? The public simply does not know. Do we assume silence means perfection? I don’t think so.

Filed Under: OASIS, ODF, OOXML


Copyright © 2006-2026 Rob Weir · Site Policies