ODFDOM is an open source (Apache 2.0) Java library for reading, writing and modifying ODF documents. It runs standalone, not requiring OpenOffice.org or any other editor to be installed. It operates directly on the ODF document itself.
One of the things we’re focusing on in the next release of ODFDOM (the 0.9 release) is performance: getting ODFDOM to read and write ODF documents as fast as possible, with as small a memory footprint as possible. The aim is to make it well suited to concurrent use, for example in a Java servlet.
Some of the things we’re finding as we profile ODFDOM are worth sharing, since they are not specific to this library. They are tips and techniques that apply more broadly, potentially to any application that works with ODF documents. I’ll do a series of posts on these ideas. I hope you find them useful, and maybe you can share some tricks of your own as well.
The first thing I’ll point out concerns documents with many image resources, such as large presentation files with a lot of graphics. We found that writing these documents was rather slow. The problem was in how the images were stored in the ZIP archive. As you may know, ZIP allows a file to be compressed (most commonly using the DEFLATE algorithm). Most ZIP libraries will, by default, compress every file you add to the archive. However, for many common media types, like PNG and JPG images, the data has already been compressed as part of the image encoding. So if your ZIP library tries to compress the images a second time, you will typically waste time for very little additional savings in storage.
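To see why, here is a small, self-contained Java sketch (not taken from ODFDOM) that runs java.util.zip.Deflater over two kinds of input. Random bytes are used as an assumed stand-in for PNG or JPEG payloads, which are already compressed and therefore look nearly random to DEFLATE; a run of identical bytes stands in for highly redundant data. The class name DoubleCompressionDemo is hypothetical. On random input you should expect the output to be about the same size as the input, or even slightly larger, while the redundant input shrinks dramatically.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class DoubleCompressionDemo {

    // Returns how many bytes DEFLATE produces for the given input
    // (including the small zlib header and checksum Deflater adds by default).
    public static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            // Drain the compressor; we only count bytes, we don't keep them.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB of each kind of input

        // Random bytes: an assumed stand-in for already-compressed image data.
        byte[] noisy = new byte[size];
        new Random(42).nextBytes(noisy);

        // Identical bytes: an extreme case of redundant data.
        byte[] repetitive = new byte[size];
        Arrays.fill(repetitive, (byte) 'a');

        System.out.printf("random:     %d -> %d bytes%n", size, deflatedSize(noisy));
        System.out.printf("repetitive: %d -> %d bytes%n", size, deflatedSize(repetitive));
    }
}
```

The CPU time spent on the random input buys essentially nothing, which is the situation a ZIP writer is in when it deflates PNG or JPG entries.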
Most ZIP libraries have an alternative way to store files in their original, uncompressed form, a method called STORE. What we found in ODFDOM was that when we stored images rather than compressing them, the time needed to save our large presentation dropped by 20%, while the size of the archive grew by only 3%. That is a good trade-off.
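As an illustration of the idea, here is a minimal sketch using the standard java.util.zip package rather than ODFDOM’s own code. The class name SelectiveZipWriter, its methods, and the list of "already compressed" extensions are hypothetical choices for this example. In Java the STORE method is ZipEntry.STORED, and a stored entry must have its size and CRC-32 set before putNextEntry is called, otherwise ZipOutputStream throws a ZipException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SelectiveZipWriter {

    // Assumed list of extensions whose content is already compressed.
    private static final Set<String> PRECOMPRESSED = Set.of("png", "jpg", "jpeg", "gif");

    public static boolean isPrecompressed(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return PRECOMPRESSED.contains(ext);
    }

    // Adds one file, using STORED for precompressed media and DEFLATED otherwise.
    public static void addEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        if (isPrecompressed(name)) {
            // STORED entries must declare size and CRC-32 up front.
            CRC32 crc = new CRC32();
            crc.update(data);
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(data.length);
            entry.setCompressedSize(data.length);
            entry.setCrc(crc.getValue());
        } else {
            entry.setMethod(ZipEntry.DEFLATED);
        }
        zip.putNextEntry(entry);
        zip.write(data);
        zip.closeEntry();
    }

    // Writes all files (in iteration order) into an in-memory ZIP archive.
    public static byte[] writeArchive(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                addEntry(zip, file.getKey(), file.getValue());
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("content.xml", "<office:document-content/>".getBytes(StandardCharsets.UTF_8));
        files.put("Pictures/image1.png", new byte[4096]); // placeholder bytes, not a real PNG

        byte[] archive = writeArchive(files);

        // Read the archive back and report how each entry was written.
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e;
            while ((e = in.getNextEntry()) != null) {
                String method = e.getMethod() == ZipEntry.STORED ? "STORED" : "DEFLATED";
                System.out.println(e.getName() + " -> " + method);
            }
        }
    }
}
```

Choosing by file extension is the simplest heuristic; an ODF-aware writer might instead consult the media type recorded for each file in the package manifest.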
I think this technique applies just as well to other libraries and editors.