Archives for November 2010

Invitation: Join the “openstandards” sub-reddit

2010/11/10 By Rob

For a couple of years I’ve been trying to find a good way to share and discuss news, articles, blog posts, etc., about open standards. I’m not very pleased with the results.

I’ve tried putting out weekly links of relevant articles on my blog. Although this is semi-automated, it is severely limited, since it only has links that I know about. But I know that my readers, collectively, know far more.

I’ve tried a “planet” aggregator of stories related to ODF, from blogs, news articles and Twitter. However, the signal/noise ratio for Planet ODF is rather low. There are limits to what can be easily collected by regular expression searches, and “ODF” is a particularly hard one, since along with meaning Open Document Format, it also means Organ Donation Foundation, Oregon Department of Forestry, Open Defecation Free, and Ordem da Fénix (the name of a Harry Potter novel, in Portuguese). I’ve supplemented the logic with custom rules to, for example, reject content that mentioned both ODF and “brush fires”, but that is a fragile approach. Short of applying Bayesian learning techniques or NLP, I don’t think this dog will hunt.
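To make the fragility concrete, here is a minimal Java sketch of the kind of rule-based filter described above. It is not the actual Planet ODF code; the whole-word pattern and the exclusion phrases are hypothetical stand-ins. Each rule rejects one known kind of false positive, but any off-topic story that avoids the listed phrases still gets through.

```java
import java.util.List;
import java.util.regex.Pattern;

public class OdfFilter
{
    // Match "ODF" as a whole word (case-sensitive), so "ODFS" or "odfx" do not count.
    private static final Pattern ODF = Pattern.compile("\\bODF\\b");

    // Hypothetical exclusion phrases, each hinting at a different "ODF".
    private static final List<String> EXCLUDE = List.of("brush fire", "organ donation", "forestry");

    public static boolean accept(String text)
    {
        if (!ODF.matcher(text).find())
        {
            return false; // no mention of ODF at all
        }
        String lower = text.toLowerCase();
        for (String phrase : EXCLUDE)
        {
            if (lower.contains(phrase))
            {
                return false; // looks like one of the other ODFs: reject
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println(accept("ODF 1.2 adds OpenFormula"));
        System.out.println(accept("ODF crews contain brush fires near Bend"));
        System.out.println(accept("ODF warns of fire danger this weekend"));
    }
}
```

The third call returns true even though that story is about forestry: every new kind of false positive needs yet another hand-written rule, which is why this approach does not scale.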

Now, I suppose I could just give up and go over to TalkStandards.com and be instructed on open standards by the European counsel for the Business Software Alliance and read commentary by other members of the Microsoft claque. But if you know me, you know that won’t happen.

So, I’m trying something new, a “sub-reddit” dedicated to sharing and discussing links related to open standards. This includes areas touching on open standards from the policy, adoption, economic and legal angles, as well as news reports, technical discussions, etc.

http://www.reddit.com/r/openstandards/

I assume most of you are familiar with reddit. It is a social-bookmarking service where you share links and can have a threaded discussion for each link as well as vote each link up or down. Anyone can post links. Anyone can vote links up or down. Anyone can comment on links. With sufficient participation the most interesting links and discussions rise to the top. However, if you are new to reddit, I’d recommend that you:

Go to the reddit page and register for a free account.
Go to the openstandards sub-reddit and click the “+frontpage” link to add this sub-reddit to your reddit front page.
Install a bookmarklet to make it easier to submit new links.

So please take a look and share a few of your favorite links. They could be new stories or old favorites that you think are worth another look. These could be your own original content or links to good content from elsewhere. I’ve seeded it with a few articles that have caught my eye in recent days. I’m hoping this approach will give us something with a higher signal/noise ratio that is also more timely, open and interactive. Enjoy!

The Legacy of OpenOffice.org

2010/11/07 By Rob

When I hear the word “fork”, I reach for my gun. OK. Maybe it is not that bad. But in the open source world, “fork” is a loaded term. It can, of course, be an expression of a basic open source freedom. But it can also represent “fighting words”. It is like the way we use the term “regime” for a government we don’t like, or “cult” for a religion we disapprove of. Calling something a “fork” is rarely intended as a compliment.

So I’ll avoid the term “fork” for the remainder of this post and instead talk about the legacy of one notable open source project, OpenOffice.org, which over the last decade has spawned numerous derivative products: some open source, some proprietary; some fully coordinated with the main project, others divergent; some that prospered and endured for many years, others that did not; some that tried to offer more than OpenOffice, others that intentionally offered less; some that changed the core code and others that simply added extensions.

If one just read the headlines over the past month one would get the mistaken notion that LibreOffice was the first attempt to take the OpenOffice.org open source code and make a different product from it, or even a separate open source project. This is far from true. There have been many spin-off products/projects, including:

StarOffice (with a history that goes back even further, pre-Sun, to StarDivision)
Symphony
EuroOffice
RedOffice
NeoOffice
PlusOffice
OxygenOffice
Go-OO
Portable OpenOffice

and, of course, LibreOffice. I’ve tracked down the dates of some releases of these projects and placed them on the timeline above.

So before we ring the death knell for OpenOffice, let’s recognize the potency of this code base in terms of its ability to spawn new projects. LibreOffice is the latest, but likely not the last, example we will see. This is a market where “one size fits all” does not ring true. I’d expect to see different variations on these editors, just as there are different kinds of users and different markets that use these kinds of tools. Whether you call it a “distribution” or a “fork”, I really don’t care. But I do believe that the only kind of open source project that does not spawn off additional projects like this is a dead project.

Introducing: the Simple Java API for ODF

2010/11/01 By Rob

The Announcement

The first public release of the new Simple Java API for ODF is now available for download. This API radically simplifies common document automation tasks, allowing you to perform tasks in a few lines of code that would require hundreds if you were manipulating the ODF XML directly.

The Simple API is part of the ODF Toolkit Union open source community and is available under the Apache 2.0 license. JavaDoc, demonstration code and a “Cookbook” are also available on the project’s website.

The Background

I first proposed an ODF Toolkit back in 2006, shortly after I got involved with ODF. It was clear then that one of the big advantages of ODF, compared to proprietary binary formats, was that ODF lent itself to manipulation using common, high-level tools. I made a list of the top 20 document-based “patterns of use”, but the key ones are in the following areas:

Mail merge style field replacement
Combining document fragments/document assembly
Data-driven document generation
Information extraction

The hope was that we could make it easy to write such applications using ODF.

So why wouldn’t this be easy? In the end ODF is just ZIP and XML and every programming platform knows how to deal with these formats, right?

Yes, this is true. However, there clearly are a lot of details to worry about. Although ZIP and XML are relatively simple technologies, defining exactly how ODF works requires over a thousand pages. This level of detail is necessary if you are writing a word processor or a spreadsheet. But you really don’t need to know ODF at this level in order to accomplish typical document automation tasks.
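For a sense of the “just ZIP and XML” point, here is a minimal sketch that uses only the JDK’s java.util.zip classes to list the entries of a package and read content.xml. The in-memory package built by toyPackage is a two-entry toy, not a valid ODF document; a real package normally also contains META-INF/manifest.xml and stores its mimetype entry first and uncompressed, and an actual .odt would be read through a FileInputStream. Getting the bytes out is the easy part; interpreting the XML is where the thousand pages come in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class OdfPeek
{
    // List the names of all entries in an ODF package, which is an ordinary ZIP file.
    public static List<String> entryNames(InputStream in) throws IOException
    {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in))
        {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null)
            {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Return one entry (for example "content.xml") as UTF-8 text, or null if absent.
    public static String readEntry(InputStream in, String name) throws IOException
    {
        try (ZipInputStream zip = new ZipInputStream(in))
        {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null)
            {
                if (entry.getName().equals(name))
                {
                    // readAllBytes stops at the end of the current ZIP entry
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    // Build a toy two-entry package in memory (hypothetical, not a complete ODF file).
    public static byte[] toyPackage() throws IOException
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf))
        {
            addEntry(out, "mimetype", "application/vnd.oasis.opendocument.text");
            addEntry(out, "content.xml", "<office:document-content/>");
        }
        return buf.toByteArray();
    }

    private static void addEntry(ZipOutputStream out, String name, String text) throws IOException
    {
        out.putNextEntry(new ZipEntry(name));
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] pkg = toyPackage();
        System.out.println(entryNames(new ByteArrayInputStream(pkg)));
        System.out.println(readEntry(new ByteArrayInputStream(pkg), "content.xml"));
    }
}
```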

There have been several other attempts at writing toolkits in this space. Some, such as the ODF Toolkit Union’s ODFDOM project, have aimed for a low-level Java API, with a 1-to-1 correspondence with ODF’s elements and attributes. Others, like lpOD’s Python API, have taken a higher-level view of ODF. You can make a good argument for either approach. Each has its advantages and disadvantages.

The advantage of the low-level API is that if you want to manipulate an existing ODF document, which in general can contain any legal ODF markup, then you need an API that understands 100% of ODF. But understanding that API would require understanding the entire ODF standard, which is too complicated for most application developers.

If you write a high level API, then it may be easy to use, but how can you then guarantee that it can losslessly manipulate an arbitrary ODF document?

I think the best approach might be a blended approach. Build a low-level API that does 100% of ODF, and then on top of that have a layer that provides higher-level functions that do the most common tasks. This gives you the benefits of completeness and simplicity. This is the approach we have taken with the Simple Java API for ODF. It is built upon the schema-driven ODFDOM API, to give it a solid low-level foundation. And on top of that it adds high-level functions. How high? The aim is to provide operations that are similar to what you as an end-user would have available in the UI, or what you as an application developer would have with VBA or UNO macros. So: adding high-level content, like tables or images; search-and-replace operations; cut and paste. Simple, but still powerful.
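The layering idea can be illustrated without the Simple API itself. The sketch below is a hypothetical example, not ODFDOM or Simple API code: the JDK’s general-purpose DOM plays the role of the complete low-level layer, and replaceText is a single high-level call built on top of it. Because the DOM keeps every node, markup the high-level call knows nothing about (the made-up x:unknown element) survives the round trip untouched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Layered
{
    // Low-level layer: a namespace-aware DOM keeps every element and attribute,
    // whether or not the higher layer understands it.
    public static Document parse(String xml) throws Exception
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Serialize the DOM with an identity transform, so only our edits differ.
    public static String serialize(Document doc) throws Exception
    {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // High-level layer: one call that rewrites text content and leaves markup alone.
    // Returns the number of text nodes that were changed.
    public static int replaceText(Node node, String find, String replacement)
    {
        int changed = 0;
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().contains(find))
        {
            node.setNodeValue(node.getNodeValue().replace(find, replacement));
            changed++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        {
            changed += replaceText(child, find, replacement);
        }
        return changed;
    }

    public static void main(String[] args) throws Exception
    {
        // x:unknown stands in for markup the high-level layer has never heard of.
        String xml = "<doc xmlns:x=\"urn:example\"><p>Hello NAME</p><x:unknown a=\"1\"/></doc>";
        Document doc = parse(xml);
        replaceText(doc, "NAME", "World");
        System.out.println(serialize(doc));
    }
}
```

One limitation of this sketch: it only finds a match that sits inside a single text node, so text split across differently styled spans would be missed. A real high-level layer has to handle that case, which is part of what makes such an API worth having.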

A Quick Example

As a quick illustration of the level of abstraction provided by the Simple Java API for ODF, let’s write a simple app. We want to load ODF documents, search for stock ticker symbols and add a hyperlink for each one to the company’s home page.

So, start with a document that looks like this:

We want to take that, find each instance of “FOO”, add a hyperlink to “http://www.foo.com”, and so on for the other symbols. If you tried this operation on the ODF XML directly, it could certainly be done, but it would require a good deal of familiarity with the ODF standard. Using the Simple API, you can do this without touching XML directly.

Let’s see how this is done.


// basic Java core libraries that any Java developer knows about
import java.net.URL;
import java.io.File;

// Simple API classes for text documents, selections and text navigation
import org.odftoolkit.simple.TextDocument;
import org.odftoolkit.simple.text.search.TextSelection;
import org.odftoolkit.simple.text.search.TextNavigation;

public class Linkify
{
    public static void main(String[] args)
    {
        try
        {
            // load text document (ODT) from disk.
            // could also load from URL or stream
            TextDocument document=(TextDocument)TextDocument.loadDocument("foobar.odt");

            // initialize a search for "FOO".
            // we'll be adding regular expression support as well
            TextNavigation search = new TextNavigation("FOO", document);

            // iterate through the search results
            while (search.hasNext())
            {
                // for each match, add a hyperlink to it
                TextSelection item = (TextSelection) search.getCurrentItem();
                item.addHref(new URL("http://www.foo.com"));
            }

            // save the modified document back to a new file
            document.save(new File("foobar_out.odt"));
        }

        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

}

Run the code and you get a new document, with the hyperlinks added, like this:
Simple enough? I think so.
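The example above links only “FOO”. For the “and so on” part, one way to find several ticker symbols at once is a single regular expression with word boundaries, so that “FOO” is not matched inside “FOOD”. This is plain JDK code, not part of the Simple API; the ticker table and the second URL are hypothetical. Each reported position could then be turned into a hyperlink, the way addHref does for a TextSelection in the listing above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TickerMatcher
{
    // One match: where the ticker sits in the text and which URL it should link to.
    public static final class Link
    {
        public final int start;
        public final int end;
        public final String url;

        Link(int start, int end, String url)
        {
            this.start = start;
            this.end = end;
            this.url = url;
        }

        @Override
        public String toString()
        {
            return start + "-" + end + " " + url;
        }
    }

    public static List<Link> find(String text, Map<String, String> tickers)
    {
        List<Link> links = new ArrayList<>();
        if (tickers.isEmpty())
        {
            return links; // an empty alternation would match empty strings
        }
        // Build one pattern like \b(?:FOO|BAR)\b; Pattern.quote guards unusual symbols.
        StringBuilder alternatives = new StringBuilder();
        for (String ticker : tickers.keySet())
        {
            if (alternatives.length() > 0)
            {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(ticker));
        }
        Matcher m = Pattern.compile("\\b(?:" + alternatives + ")\\b").matcher(text);
        while (m.find())
        {
            links.add(new Link(m.start(), m.end(), tickers.get(m.group())));
        }
        return links;
    }

    public static void main(String[] args)
    {
        Map<String, String> tickers = new LinkedHashMap<>();
        tickers.put("FOO", "http://www.foo.com");
        tickers.put("BAR", "http://www.bar.com"); // hypothetical second ticker
        // "FOOD" is skipped thanks to the word boundaries.
        System.out.println(find("FOO rose, BAR fell, FOOD is flat", tickers));
    }
}
```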

How to get involved

We really want your help with this API. This is not one of those faux-open source projects, where all the code is developed by one company. We want to have a real community around this project. So if you are at all interested in ODF and Java, I invite you to take a look:

Download the 0.2 release of the Simple Java API for ODF. The wiki also has important info on install pre-reqs.
Work through some of the cookbook to get an idea of how the API works.
Sign up and join the ODF Toolkit Union project.
Join the users mailing list and ask questions. Defect reports can go to our Bugzilla tracker.
If you want to contribute patches, see the wiki for more info on how to access our repository.