Open Source

First release of the Apache ODF Toolkit

2012/01/26 By Rob 2 Comments

The Apache ODF Toolkit 0.5 (incubating) release is now available for download. Detailed change notes are also posted. The ODF Toolkit is a Java library for reading, writing and creating ODF documents. It is entirely in Java and does not require that you install a desktop editor like OpenOffice. It operates directly on the file format and is suitable for server-side use, for tasks such as document automation, report generation, information extraction, etc.

As mentioned in a previous post, the Java components from the ODF Toolkit Union have moved over to Apache. Since this open source project was already using the Apache 2.0 license, the work required to achieve our first Apache release was relatively straightforward. The major task was to take the various components of the Toolkit, which were treated as independent projects at the ODF Toolkit Union, and get them to work better together as a single Toolkit, for example by building them with the same version of the JDK and packaging them into a consolidated release bundle. Not rocket science, but it did require some iteration.

We’re starting now to put together a plan for the next release and future releases. Some of the items under consideration include:

Add document encryption/decryption support
Add digital signature support
Update to the final published ODF 1.2 schema
Update the demo applications
Concurrency testing
Add support for ODF 1.2’s RDFa/RDF XML semantic metadata feature
Implement ODF 1.2’s OpenFormula spreadsheet formula language
Add a high-performance, event-driven streaming API for the subset of tasks that can be done efficiently that way
More cookbook examples
More testing and bug fixing

If you are interested in learning more about the ODF Toolkit, you should visit our website. If you have further questions, we have a users list and a development list that you are welcome to join.

If you know some Java and are interested in ODF, I’d encourage you to take a look at this project and consider participating. We are a small, international, welcoming group working on this project, with a strong focus on quality. Come, take a look.

An Invitation to the Apache ODF Toolkit

2011/08/15 By Rob 3 Comments

Perhaps overlooked in all the excitement generated by the move of OpenOffice.org to Apache was the fact that a parallel move is occurring with the ODF Toolkit. A few weeks ago we submitted a proposal to Apache to start a new project based on the Java components that were until then hosted by the ODF Toolkit Union. This was done after consulting with the ODF Toolkit community and getting approval from the ODF Toolkit Union’s Steering Committee. This proposal was recently reviewed, voted on and approved by Apache. So now we have the Apache ODF Toolkit project in the Apache Incubator.

So what is this project and what is it good for?

This project consists of Java libraries and tools for working with ODF documents. Not editors, not viewers, not anything with a user interface. These are not end-user tools. These are tools for developers who need to write programs that read, write or manipulate ODF documents. These tools do not require that you have any ODF editor installed. They operate directly on the files. So they are ideal for running on a server, for things like report generation, information extraction, document validation, conversion, etc. We have a page of demos that gives a good idea of the range of things possible with the ODF Toolkit.

The ODF Toolkit is important because it enables innovation on top of ODF. By analogy, look at HTML. At one point, the web consisted mainly of hand-authored documents at a handful of academic and government websites. If that was all there was to the web, it would not have been very interesting. What made the web the platform it is today has been the technologies that enable server-side generation of web pages from database queries, or services that analyze web pages and extract and aggregate information. Google was made possible because HTML was an open standard that could be programmatically understood. PHP was possible because HTML was an open standard that could be written.

ODF, unlike the previous generation of binary document formats, is also an open standard. You can read and write ODF documents freely. But writing the code to understand the nitty-gritty of the ODF format is a considerable task. The ODF Toolkit makes this easy for Java programmers. How easy? Here is a “hello world” text document:

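import org.odftoolkit.simple.TextDocument;

// Create an empty text document, add one paragraph, and save it.
TextDocument doc = TextDocument.newTextDocument();
doc.addParagraph("Hello world!");
doc.save("hello.odt");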

Other tasks, like changing styles, combining presentation slide decks, searching and replacing text in a document, and extracting text from a document, are also simple. More examples that give a flavor of the ODF Toolkit are in the “cookbook”.

But along with the “Simple API” the ODF Toolkit has the ODFDOM layer. This layer allows you to get to every part of an ODF document, at the finest-grained level. Some tools out there give you only a high-level API but then leave you hanging if you want to do something more complicated. Not so with the ODF Toolkit. If you want to drill down and adjust the line spacing of a bullet list in a footnote, then you can do it.
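To make "operate directly on the files" concrete, here is a minimal sketch of what sits underneath both layers. An ODF document is a ZIP package, and the document body lives in an entry named content.xml. The sketch below uses only JDK classes and is not the ODF Toolkit or ODFDOM API: the class name OdfTextPeek and its methods are hypothetical, and the sample package it builds is a stand-in that holds only content.xml, not a complete ODF file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, NOT part of the ODF Toolkit: it pulls paragraph text
// straight out of an ODF package using only JDK classes (Java 9 or later).
class OdfTextPeek {

    // Namespace URI that ODF assigns to its text: elements.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // Returns the text content of every text:p element found in content.xml.
    static List<String> paragraphs(InputStream odfPackage) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("content.xml")) {
                    continue; // skip styles.xml, meta.xml, images, and so on
                }
                // Copy the entry into memory so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList nodes = dom.getElementsByTagNameNS(TEXT_NS, "p");
                for (int i = 0; i < nodes.getLength(); i++) {
                    result.add(nodes.item(i).getTextContent());
                }
                break; // content.xml appears once per package
            }
        }
        return result;
    }

    // Builds a tiny in-memory stand-in package containing only content.xml.
    static byte[] samplePackage() throws Exception {
        String xml = "<office:document-content xmlns:office=\"" + OFFICE_NS + "\""
                + " xmlns:text=\"" + TEXT_NS + "\">"
                + "<office:body><office:text>"
                + "<text:p>Hello world!</text:p>"
                + "<text:p>Second paragraph</text:p>"
                + "</office:text></office:body>"
                + "</office:document-content>";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        for (String p : paragraphs(new ByteArrayInputStream(samplePackage()))) {
            System.out.println(p);
        }
    }
}
```

Even this small task needs ZIP handling, namespace-aware XML parsing and knowledge of the ODF element names, and it ignores styles, lists, tables and everything else in the format. That gap is what the Simple API and ODFDOM layers are meant to close.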

These components enable innovation on top of ODF, innovation that thinks “outside the editors” and “beyond office”.

So how do you get involved? If you want to help with the project then I invite you to sign up on the project’s development mailing list. And if you have questions about using the ODF Toolkit, but don’t want the additional email traffic from the dev list, then you can sign up for the users list. Of course, I’ve signed up for both lists. I hope I’ll see you there!

OpenOffice, LibreOffice and the Scarcity Fallacy

2011/06/13 By Rob

As you’ve probably heard, the proposal to move OpenOffice.org to the Apache Software Foundation was approved by a wide margin. Volunteers interested in helping with this project continued to sign up, even during the 72-hour ballot, giving the project 87 members, as well as 8 experienced Apache mentors, at the end of the vote. The volunteers signed up included an impressive number of programmers from OpenOffice.org, RedOffice and Symphony, as well as QA engineers, translators, education project experts, OOo user forum moderators and admins, marketing project members, documentation leads, etc. The broad range of support for this new project, from volunteers as well as voters, was very encouraging.

Of course, this is not the end of our recruitment effort. In some sense it marks only the beginning. What I wrote in my previous notes about the Apache meritocracy remains true. However, now that the proposal has advanced and an Apache “Podling” (a probationary project) has been created, the way to sign up has changed. You should now sign up to the project’s mailing lists directly. For example, an email to ooo-dev-subscribe@incubator.apache.org will get you onto the project’s main dev mailing list. Anyone interested in participating needs to get onto this list, including those who earlier expressed interest as “proposed committers” as well as new volunteers.

I would be negligent if, in mentioning the successful approval of the Apache OpenOffice proposal, I did not acknowledge that there were other, dissenting, opinions expressed. That is fine and indeed welcome. It is good that we don’t all think the same. However, in order to have a plurality of views, and to give users a plurality of applications to choose from, we also need a plurality of projects in the open source world. So it was disappointing to witness a small but vocal minority of non-Apache members who disagreed with the proposal and who attempted to derail it. The day closed-minded open source advocates decide to smother a new project in its crib, because they personally favor a different project, is the day that FOSS dies.

I believe that one unstated assumption in their reasoning was that there is a scarcity of developers and a scarcity of users in the personal productivity application area, and that the success of a new project can only come at the expense of another project, in this case at the expense of LibreOffice. The assumption was that we’re playing a zero-sum game, and like junkyard dogs we’re fighting to the death over scraps. In this view (which I believe to be false), as illustrated below, LibreOffice supporters see Apache OpenOffice as a mortal threat to their project, since its gain comes only at their expense.

Of course, this is inaccurate in many ways. For example, the market share of LibreOffice, although strong on Linux, is actually quite low on the much larger Windows platform, where OpenOffice is still the leading open source office suite. So overall, OpenOffice has greater market share than LibreOffice has today.

And in the real world, outside of FOSS blogs, the world runs predominantly Microsoft Office, a proprietary set of applications. The other proprietary applications, like Corel WordPerfect and Google Docs and Apple iWork, combined with Microsoft Office represent well over 90% of the market. Open source, of all varieties, including LibreOffice, is rather small.

So rather than fighting over the remaining 5%, I think we should set our sights on a more transformative engagement with the market. This need not be a zero-sum, I-Win/You-Lose situation. OpenOffice and LibreOffice can both win. OpenOffice and LibreOffice and Calligra Suite and AbiWord and Gnumeric can all gain users at the same time. And this can happen at the same time that mixed-source applications based upon OpenOffice also grow and gain users.

There is no scarcity but scarcity in vision.

Apache OpenOffice, with its permissive license, is an excellent basis now for open source as well as mixed source business models, business models that drive investment back into the ecosystem. The mixed source segment will grow the most, I believe. But so will the pure open source version, because of the increased investment. We’ve had LGPL with OpenOffice for 10 years now. We’ve seen the modest success with which business models based on LGPL advanced in this segment of the market. Do we think another 10 years of the same will do much better? Personally, I think it is time, after a decade, to try enabling additional options, things that have not been tried yet.

So rather than the scarcity fallacy, the impact of Apache OpenOffice will be more like the following diagram:

So let’s stop this nonsense, this fallacy of scarcity. Let’s stop fighting over that little 5% box. Instead, let’s look toward how we restore the choice and diversity that we had in this market segment back in 1990, but do it better. We have something now we didn’t have back then, and that is an International Standard for document exchange, ODF. This can and should be the basis for interoperability among competing application suites.

PJ, Goodbye and Good Luck

2011/05/10 By Rob 7 Comments

There was a time when daggers were drawn on Linux and its demise was plotted in dark detail. At that hour stepped out a shieldmaiden with a blog, and that blog was Groklaw. Eight years later, we hear the news that Groklaw will cease new postings after May 16th. My sadness in hearing this news is more than equaled by my gratitude to PJ and her community of researchers and commentators, for their enormous effort and unparalleled achievement over these years. The world is a better place because of PJ. Who can hope to say better?

As a retrospective of a different kind, I’ve taken the titles from every Groklaw article since its start and created a “word cloud” from them, using Wordle. This shows, at a glance, the issues that have dominated the attention of Groklaw over the years.

The Legacy of OpenOffice.org

2010/11/07 By Rob 18 Comments

When I hear the word “fork”, I reach for my gun. OK. Maybe it is not that bad. But in the open source world, “fork” is a loaded term. It can, of course, be an expression of a basic open source freedom. But it can also represent “fighting words”. It is like the way we use the term “regime” for a government we don’t like, or “cult” for a religion we disapprove of. Calling something a “fork” is rarely intended as a compliment.

So I’ll avoid the term “fork” for the remainder of this post and instead talk about the legacy of one notable open source project, OpenOffice.org, which has over the last decade spawned numerous derivative products, some open source, some proprietary, some which fully coordinate with the main project, others which have diverged, some which have prospered and endured for many years, others which did not, some which tried to offer more than OpenOffice, and others which attempted, intentionally, to offer less, some which changed the core code and others which simply added extensions.

If one just read the headlines over the past month one would get the mistaken notion that LibreOffice was the first attempt to take the OpenOffice.org open source code and make a different product from it, or even a separate open source project. This is far from true. There have been many spin-off products/projects, including:

StarOffice (with a history that goes back even further, pre-Sun, to StarDivision)
Symphony
EuroOffice
RedOffice
NeoOffice
PlusOffice
OxygenOffice
Go-OO
Portable OpenOffice

and, of course, LibreOffice. I’ve tracked down some dates of various releases of these projects and placed them on a timeline above.

So before we ring the death knell for OpenOffice, let’s recognize the potency of this code base, in terms of its ability to spawn new projects. LibreOffice is the latest, but likely not the last example we will see. This is a market where “one size fits all” does not ring true. I’d expect to see different variations on these editors, just as there are different kinds of users, and different markets which use these kinds of tools. Whether you call it a “distribution” or a “fork”, I really don’t care. But I do believe that the only kind of open source project that does not spawn off additional projects like this is a dead project.