The Announcement
The first public release of the new Simple Java API for ODF is now available for download. This API radically simplifies common document automation tasks, allowing you to perform tasks in a few lines of code that would require hundreds if you were manipulating the ODF XML directly.
The Simple API is part of the ODF Toolkit Union open source community and is available under the Apache 2.0 license. JavaDoc, demonstration code and a “Cookbook” are also available on the project’s website.
The Background
I first proposed an ODF Toolkit back in 2006, shortly after I got involved with ODF. It was clear then that one of the big advantages of ODF, compared to proprietary binary formats, is that it lends itself to manipulation using common, high-level tools. I made a list of the top 20 document-based “patterns of use”; the key ones fall into the following areas:
- Mail merge style field replacement
- Combining document fragments/document assembly
- Data-driven document generation
- Information extraction
The hope was that we could make it easy to write such applications using ODF.
So why wouldn’t this be easy? In the end ODF is just ZIP and XML and every programming platform knows how to deal with these formats, right?
Yes, this is true. However, there are clearly a lot of details to worry about. Although ZIP and XML are relatively simple technologies, defining exactly how ODF works requires over a thousand pages. This level of detail is necessary if you are writing a word processor or a spreadsheet. But you really don’t need to know ODF at this level in order to accomplish typical document automation tasks.
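To make the “just ZIP and XML” point concrete, here is a minimal sketch that uses only the standard Java library, not the Simple API. It pulls content.xml out of an ODF package and counts the text:p elements in it. The class and method names (RawOdfPeek, readContentXml, countParagraphs) are made up for this illustration. Getting this far is easy; the hard part is everything after it (styles, lists, tables, spans), and that is where the thousand pages come in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only; not part of any ODF toolkit.
public class RawOdfPeek {
    // Namespace URI of ODF's text: elements
    public static final String TEXT_NS =
        "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns the bytes of content.xml from an ODF package, or null if absent.
    public static byte[] readContentXml(InputStream odf) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("content.xml")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    // read() returns -1 at the end of the current ZIP entry
                    while ((n = zip.read(chunk)) != -1) {
                        out.write(chunk, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // Counts text:p elements anywhere in the given content.xml bytes.
    public static int countParagraphs(byte[] contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for namespace-based lookup
        Document dom = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(contentXml));
        return dom.getElementsByTagNameNS(TEXT_NS, "p").getLength();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java RawOdfPeek <file.odt>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] xml = readContentXml(in);
            if (xml == null) {
                System.out.println("no content.xml found in package");
            } else {
                System.out.println("paragraphs: " + countParagraphs(xml));
            }
        }
    }
}
```

Note that this only reads. Writing changes back means re-creating the whole package with the other entries kept intact, which is one more detail a toolkit can take off your hands.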
There have been several other attempts at writing toolkits in this space. Some, such as the ODF Toolkit Union’s ODFDOM project, have aimed for a low-level Java API with a 1-to-1 correspondence with ODF’s elements and attributes. Others, like lpOD’s Python API, have taken a higher-level view of ODF. You can make a good argument for either approach. Each has its advantages and disadvantages.
The advantage of a low-level API is completeness: if you want to manipulate an existing ODF document, which in general can contain any legal ODF markup, then you need an API that understands 100% of ODF. But understanding that API would require understanding the entire ODF standard, which is too complicated for most application developers.
If you write a high-level API, then it may be easy to use, but how can you then guarantee that it can losslessly manipulate an arbitrary ODF document?
I think the best approach might be a blended approach. Build a low-level API that does 100% of ODF, and then on top of that have a layer that provides higher-level functions that do the most common tasks. This gives you the benefits of both completeness and simplicity. This is the approach we have taken with the Simple Java API for ODF. It is built upon the schema-driven ODFDOM API, to give it a solid low-level foundation. And on top of that it adds high-level functions. How high? The aim is to provide operations that are similar to what you as an end-user would have available in the UI, or what you as an application developer would have with VBA or UNO macros. So adding high-level content, like tables or images. Search and replace operations. Cut and paste. Simple, but still powerful.
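Here is a rough sketch of what that layering means in code. This is not how the Simple API is actually implemented. It uses the JDK’s plain W3C DOM as a stand-in for the low-level layer (the real one is ODFDOM), and the class LayeredSketch and its methods are invented for this illustration. The point is only that a single high-level call like addParagraph hides the element names and namespaces, while the complete low-level tree stays underneath, so any markup the high-level layer does not know about is left untouched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of a high-level layer over a low-level DOM.
public class LayeredSketch {
    public static final String OFFICE_NS =
        "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    public static final String TEXT_NS =
        "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // A tiny content.xml-like fragment, just enough for the demo.
    public static final String SAMPLE =
        "<office:document-content xmlns:office=\"" + OFFICE_NS + "\""
        + " xmlns:text=\"" + TEXT_NS + "\">"
        + "<office:body><office:text>"
        + "<text:p>Existing paragraph</text:p>"
        + "</office:text></office:body>"
        + "</office:document-content>";

    // Low-level layer: a namespace-aware DOM of the XML.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // High-level layer: one call that appends a paragraph to the text body.
    public static Element addParagraph(Document dom, String text) {
        NodeList bodies = dom.getElementsByTagNameNS(OFFICE_NS, "text");
        if (bodies.getLength() == 0) {
            throw new IllegalStateException("no office:text element found");
        }
        Element p = dom.createElementNS(TEXT_NS, "text:p");
        p.setTextContent(text);
        bodies.item(0).appendChild(p);
        return p;
    }

    // Counts text:p elements in the document.
    public static int paragraphCount(Document dom) {
        return dom.getElementsByTagNameNS(TEXT_NS, "p").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(SAMPLE);
        addParagraph(dom, "Added through the high-level call");
        System.out.println("paragraphs: " + paragraphCount(dom));
    }
}
```

The caller of addParagraph never sees a namespace URI, but nothing stops them from dropping down to the DOM when they need something the high-level layer does not offer.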
A Quick Example
As a quick illustration of the level of abstraction provided by the Simple Java API for ODF, let’s write a simple application. We want to load ODF documents, search for stock ticker symbols and add a hyperlink for each one to the company’s home page.
So, start with a document that looks like this:
We want to take that and find each instance of “FOO” and add a hyperlink to “http://www.foo.com” and so on. This could certainly be done by working on the ODF document directly, but it would require a good deal of familiarity with the ODF standard. Using the Simple API, you can do it without touching XML directly.
Let’s see how this is done.
    // basic Java core libraries that any Java developer knows about
    import java.net.URL;
    import java.io.File;

    // Simple API classes for text documents, selections and text navigation
    import org.odftoolkit.simple.TextDocument;
    import org.odftoolkit.simple.text.search.TextSelection;
    import org.odftoolkit.simple.text.search.TextNavigation;

    public class Linkify
    {
        public static void main(String[] args)
        {
            try
            {
                // load text document (ODT) from disk.
                // could also load from URL or stream
                TextDocument document = (TextDocument) TextDocument.loadDocument("foobar.odt");

                // initialize a search for "FOO".
                // we'll be adding regular expression support as well
                TextNavigation search = new TextNavigation("FOO", document);

                // iterate through the search results
                while (search.hasNext())
                {
                    // for each match, add a hyperlink to it
                    TextSelection item = (TextSelection) search.getCurrentItem();
                    item.addHref(new URL("http://www.foo.com"));
                }

                // save the modified document back to a new file
                document.save(new File("foobar_out.odt"));
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }
Run the code and you get a new document, with the hyperlinks added, like this:
Simple enough? I think so.
How to get involved
We really want your help with this API. This is not one of those faux-open source projects, where all the code is developed by one company. We want to have a real community around this project. So if you are at all interested in ODF and Java, I invite you to take a look:
- Download the 0.2 release of the Simple Java API for ODF. The wiki also has important info on install pre-reqs.
- Work through some of the cookbook to get an idea on how the API works.
- Sign up and join the ODF Toolkit Union project.
- Join the users mailing list and ask questions. Defect reports can go to our Bugzilla tracker.
- If you want to contribute patches, the wiki has more info on how to access our repository.