
A Demo: Mathematica, MathML and ODF

Here’s a short tutorial on exchanging MathML between Mathematica and OpenOffice, showing what is possible today, and offering some suggestions for closer integration.

First, start with a new ODF document in OpenOffice. It is often easier to modify an existing document, inheriting its structure and default styles, than to create a new document from scratch. So I believe that a lot of interesting projects with ODF will start with an existing document as a template, and then add or replace content in it.

So, here’s what I made: a simple file with a formula describing the Euclidean metric, our old friend the Pythagorean Theorem. Click the image to load the ODF file.

If you rename the ODF file to a .zip extension and unzip it, you can see the XML files it contains. Always start with the manifest.xml, here for your convenience. I draw your attention to the entry with the type “application/vnd.oasis.opendocument.formula”, which, according to Appendix C of the ODF 1.0 specification, is the registered MIME type of an ODF formula document. So that sounds like what we want. Let’s replace that equation with something else.
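The manifest lookup can also be done without unzipping by hand. Here is a minimal Python sketch, assuming the package follows the usual ODF layout with the manifest at META-INF/manifest.xml; the function name formula_entries is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ODF manifest files.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
FORMULA_TYPE = "application/vnd.oasis.opendocument.formula"


def formula_entries(odf_file):
    """Return the full-path of every manifest entry typed as an ODF formula.

    odf_file may be a filesystem path or a binary file-like object.
    (Hypothetical helper, not part of any ODF toolkit.)
    """
    with zipfile.ZipFile(odf_file) as package:
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))

    paths = []
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        if entry.get(f"{{{MANIFEST_NS}}}media-type") == FORMULA_TYPE:
            paths.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return paths
```

Each path returned identifies a formula in the package, and so tells you where to look for the MathML to replace.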

So into Mathematica we go. Suppose I want to calculate the indefinite double integral of the Euclidean metric. Why not? This is something I’d rather not do by hand, but I know Mathematica can quickly give me the answer:

Now I really don’t want to retype that result into OpenOffice. So, what can I do? I can use Mathematica’s ExpressionToMathML function to turn the above into MathML. When I do that I get MathML like this.

Let’s see now what happens if I simply drop that content in as a replacement for the original content.xml in the ODF file. Here’s what I get (click the image to open the ODF file):

So we got something, but it is not quite right. I’m seeing some little hollow boxes, usually an indication of an unprintable character. What’s up with this?

A closer look at the XML generated from Mathematica shows that these boxes are being displayed whenever the MathML uses the XML character entities corresponding to section 6.2.4 “Non-Marking Characters” of the MathML specification. This includes things like “InvisibleTimes” which handles cases where adjacency represents multiplication (xy == x*y). Using these characters provides hints to the application that can help it optimize its rendering and editing, but they should not be displayed.

In any case there appears to be a bug in OpenOffice 2.0.3 where it tries to display these characters and finds they don’t map to any printable Unicode character. No big deal, I will enter a bug report on that later. But for now I can easily clean this up by defining a new function in Mathematica, ExpressionToOO, defined as follows:

(Note I didn’t name this “ExpressionToODF”, since strictly speaking the ODF specification allows MathML 2.0, including the non-marking characters. This function is specifically to work around an OpenOffice bug. It outputs valid MathML, simply removing the non-marking characters which OO doesn’t understand.)
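To illustrate the same kind of cleanup outside Mathematica, here is a rough Python sketch. It is not the ExpressionToOO definition from the notebook. It assumes the MathML arrives as a string, and it only handles the three invisible operators (ApplyFunction, InvisibleTimes, InvisibleComma) rather than the whole non-marking set. The function name is made up for illustration.

```python
import re

# One invisible operator, written as a literal character, a decimal or hex
# character reference, or a MathML named entity.
_INVISIBLE = (
    r"(?:[\u2061-\u2063]"
    r"|&#(?:8289|8290|8291);"
    r"|&#[xX]0*206[123];"
    r"|&(?:ApplyFunction|af|InvisibleTimes|it|InvisibleComma|ic);)"
)

# An <mo> element (optionally namespace-prefixed, e.g. <math:mo>) whose only
# content is one invisible operator.
_INVISIBLE_MO = re.compile(
    r"<(?:[\w.-]+:)?mo\b[^>]*>\s*" + _INVISIBLE + r"\s*</(?:[\w.-]+:)?mo>"
)


def strip_invisible_operators(mathml: str) -> str:
    """Drop <mo> elements that contain nothing but an invisible operator.

    Hypothetical helper: a regex pass is enough for a demo, but a real
    tool would want a proper XML parser.
    """
    return _INVISIBLE_MO.sub("", mathml)
```

This removes the whole `<mo>` element rather than just the character, so no empty operator is left behind. The cost is that the hint about implied multiplication is lost, which is why it is only a workaround.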

So, back to Mathematica, I run ExpressionToOO, grab that XML and inject that XML into the ODF document, and we get the following (click to open the ODF file):

That’s what we want! For those who are interested, the complete Mathematica notebook is here: Session.nb.
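The “inject that XML into the ODF document” step was done by hand above, but it could be scripted. Below is a minimal Python sketch, assuming you already know which package member holds the formula (for example content.xml in a formula document). The function name replace_member is made up for illustration.

```python
import zipfile


def replace_member(src, dst, member, new_bytes):
    """Copy an ODF package from src to dst, swapping in new_bytes for one member.

    src and dst may be filesystem paths or binary file-like objects.
    (Hypothetical helper, not part of any ODF toolkit.)
    """
    replaced = False
    with zipfile.ZipFile(src) as package_in, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as package_out:
        # Iterating infolist() preserves the original member order, so the
        # "mimetype" entry stays first if it was first in the source.
        for info in package_in.infolist():
            if info.filename == member:
                data = new_bytes
                replaced = True
            else:
                data = package_in.read(info.filename)
            # Keep "mimetype" uncompressed, as ODF packages normally store it.
            if info.filename == "mimetype":
                compress = zipfile.ZIP_STORED
            else:
                compress = zipfile.ZIP_DEFLATED
            package_out.writestr(info.filename, data, compress_type=compress)
    if not replaced:
        raise KeyError(member)
    return dst
```

Writing a fresh archive instead of editing in place leaves the original document untouched if anything goes wrong.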

As you can see, this isn’t rocket science, though no doubt it may be useful to rocket scientists. Consider this a little “proof of concept”. Real end users will not be going around unzipping ODF documents and copying XML around. There needs to be some additional integration work to make this process simple and joyful. For example:

  1. A Mathematica function that automatically inserts a formula into an ODF document
  2. An OpenOffice add-in that lets the user browse formulas from Mathematica and insert them into the current working document.
  3. Clipboard level exchange of MathML between OpenOffice and Mathematica
  4. An export filter from OpenOffice to export to the XHTML+MathML+SVG profile defined by the W3C. This, combined with Firefox, would provide kickass scientific publishing using open standards and tools.

Note that I’m using Mathematica here just as an example. There are over 100 MathML-supporting applications out there, both commercial and open source. I’d be interested in hearing what other ideas people have for workflows involving ODF editors and other tools that work with the standards ODF includes: not just MathML, but SVG, XForms, etc. Let’s demonstrate the value of open standards working together.

{ 6 comments }
  • Anonymous 2006/08/21, 05:52

    I am not really sure this is a very good example of why Microsoft made a wrong choice with their implementation of Word 2007…

    I tried to replicate your experiment with Mathematica and Word 2007. Here are my steps:

    1. Type the math expression in Mathematica, and calculate the result
    2. Mark the resulting expression with the mouse, right click and pick Copy as->MathML
    3. Open Word 2007
    4. Click Paste

    Done, everything is formatted beautifully, no error, no nothing.

    The scenario you described is a valuable one, i.e. transferring equations from a mathematical app into a document writing app. It happens all the time in science when you write a paper or anything like that. But quite frankly, the implementation MS provides with Word 2007 does all I would hope for as a user. It works with MathML, it is quick and painless.

    And it is actually easier and more straightforward than doing the same with the leading implementation of the ODF standard. I know this is not a problem of the standard but with OpenOffice. But at the same time, as a user I don’t really care. At this point, the Word 2007 Beta 2 has a smoother workflow for me.

    Maybe you can come up with a better scenario showing why using MathML is important within the word processing file format?

  • Anonymous 2006/08/21, 05:58

    Oh, and isn’t there another problem with the example you posted? The expression in Mathematica looks quite different from what you seem to get in OpenOffice. Now, this is probably again a bug in OpenOffice, but this is actually a terrible example, since somehow what ends up in OpenOffice looks VERY different from what you had in Mathematica. Minimally it is swapping around parts of the expression, or is it changing more? Too lazy to try, actually ;)

  • Rob 2006/08/21, 07:31

    Sure you can cut and paste from Mathematica into Word, but what ends up in Word is not MathML, is it? Similarly, you could paste the formula directly into OpenOffice as a bitmap. Perfect fidelity, but it is not a format that lends itself to downstream processing by other tools.

    Remember that Word uses a proprietary XML vocabulary tied to Word for formulas which no one in the world uses except Microsoft, compared to 100+ applications for MathML.

    To the second post (same person?) if you look at the MathML exported by Mathematica, it is clear that OpenOffice correctly rendered the order of terms it was given. I have no idea why Mathematica ordered the terms differently in the export. I notice that copying MathML to the clipboard gives the StandardForm ordering of the terms which is what I’d expect. ExpressionToMathML[] does allow a number of optional “Conversion Option” parameters. Perhaps one of them controls this aspect of the export?

    I’d love to see MathML clipboard support in OpenOffice. That would be a good thing. I’d also like to see full support for MathML 2.0. This requires work, but only work. When it is done, then what? OpenOffice will then have the feature, but Word will still be locking users into proprietary formats on a single platform. I think time is on our side.

  • Jason Harris 2006/08/21, 14:15

    Hello,

    The rearrangement of terms is due to the fact that on export we choose to use “traditional form”. TraditionalForm uses more “traditional” notation for some of the mathematical terms. E.g., evaluate Abs[x] in Mathematica, then evaluate Abs[x] // TraditionalForm, and for the latter you will get something which displays like |x| instead of the former Abs[x]. Another byproduct of traditional form is that you sometimes get “prettier” term orderings.

    Things like InvisibleTimes are used to preserve more of the semantics of the underlying specification, and can also make the rendering of the mathematics slightly more faithful to the underlying expression. See http://www.w3.org/TR/MathML2/chapter3.html#id.3.2.5.5

    Yours,

    Jason Harris
    Wolfram Research
    jasonh at wolfram dot com

  • Anonymous 2006/08/22, 07:31

    Yes, both posts were from me.

    I see that the change of formatting is more an artefact of Mathematica and not related to either ODF or OpenOffice. Maybe you want to point that out in the blog posting? Others might get confused too.

    “Similarly, you could paste the formula directly into OpenOffice as a bitmap. Perfect fidelity, but it is not a format that lends itself to downstream processing by other tools.”

    No, this is not similar to a bitmap at all. The equation ends up as an equation in my Word document. I can edit it with Word’s built-in equation editor, for example. I can also select the equation in Word, copy it to the clipboard and then paste it into other apps that understand MathML. Well, actually I couldn’t get this copying to the clipboard to work with Beta 2, but that is how I understood Brian’s post, so I assume we will see that in later builds.

    I just feel that the scenario you picked to show the benefits of ODF and embedding MathML in it doesn’t convince me. The scenario “take equation from Mathematica and put it into Word” works just fine with Word 2007. You mention further down stream processing, but maybe you can spell such a scenario out and show how that works in detail? Because, honestly, as a scientist who is working on scientific papers all the time, I just can’t think of anything else I would want to process further down…

    And finally, I think I actually prefer exchanging equations via the clipboard over the scenario where the 100+ apps that support MathML open my word document and write into it directly. That seems to imply first a very complicated work flow, and second it seems ripe for instability.

    Why?

    On 1): So, when I want to exchange anything with another app, I would have to close down the document in my word editor, open the doc in say Mathematica, exchange the equation, close it again, and open it in my word editor. Doesn’t sound very convincing to me ;) Honestly, I believe for the “I write a scientific paper with equations” scenario, exchange via the clipboard is fine. And, as Word 2007 shows, one doesn’t need MathML in the doc format to enable that. So, as a user, I really don’t care.

    On 2): I would feel terribly insecure, if any program I use to manipulate my equations would read and write to the file that has my main document in it. ONE tiny little bug in any of these apps and my doc might be messed up. Not really a good idea. Exchange via the clipboard seems much more stable to me…

    So, essentially I don’t find the scenario you described convincing that MathML should be used within a document format. You mention benefits for further downstream processing, but I don’t really understand what you mean by that. Maybe you can write another piece where you outline a concrete example, with the same detailed step-by-step description you did for this post? That would help immensely!

    Best,
    David

  • Rob 2006/08/22, 08:41

    David,

    Fair enough, I’ll do a write-up on what I mean by “down-stream processing”. We’ve been locked into opaque, black-box proprietary formats for so long that it sometimes is not obvious to an end-user why open formats are beneficial, what additional modes of operation, workflows and collaboration they unlock. I’ll try to make that case.

    Your hesitation to let a small program manipulate your document, out of concern that it might corrupt the document, is a good example of the way we’ve all been taught to think about documents. Modifying a Word DOC file (binary, undocumented, obscure) is certainly a perilous undertaking. We need to discover, and relearn, that in the world of open, documented, standardized XML formats, this fear will usually be unfounded. I’ll explain why in a later post.

    There had been talk on another blog that ODF was not really using MathML, or that it was not “real” MathML, etc. The purpose of the original post was to demonstrate ODF in fact uses MathML and that real MathML from another application can be transferred over to it. My post in no way demonstrates how I think a real end-user would move MathML around. We’ll need to leave it to the application and tool vendors to make this a joyful experience for users.
