ODFDOM is an open source (Apache 2.0) Java library for reading, writing and modifying ODF documents. It runs standalone, not requiring OpenOffice.org or any other editor to be installed. It operates directly on the ODF document itself.
One of the things we’re focusing on in the next release of ODFDOM (the 0.9 release) is performance: getting ODFDOM to read and write ODF documents as fast as possible, with as small a memory footprint as possible. The aim is to make it well suited to concurrent use, say in a Java servlet.
Some of the things we’re finding as we profile ODFDOM are worth sharing, since they are not specific to this library. They are tips and techniques that apply more broadly, potentially to all applications that work with ODF documents. I’ll do a series of posts on these ideas. Hopefully you will find them useful, and maybe you can share your own tricks as well.
The first thing I’ll point out concerns documents with many image resources, such as large presentation files with a lot of graphics. We found that writing these documents was rather slow. The problem was in how the images were stored in the ZIP archive. As you may know, ZIP allows a file to be compressed (most commonly using the DEFLATE algorithm). Most ZIP libraries will, by default, compress every file you add to the archive. However, for many common media types, like PNG and JPG images, the data has already been compressed, at the level of the image encoding. So if you have your ZIP library try to compress the images a second time, you will typically waste time with very little incremental savings in storage.
Most ZIP libraries have an alternative way to store files in their original, uncompressed form, a method called STORE. What we found in ODFDOM was that if we store images rather than compress them, the time needed to save our large presentation was reduced by 20%, while the size of the archive increased only 3%. So this was a good trade-off.
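To make the store-versus-compress choice concrete, here is a minimal sketch using only the standard java.util.zip classes. It is not ODFDOM's actual code: the class name OdfPackageSketch, the helper methods, and the idea of picking the method from the file extension are all assumptions made for illustration. One detail worth knowing is that java.util.zip requires the size and CRC-32 of a STORED entry to be set before the entry is written.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not part of the ODFDOM API.
final class OdfPackageSketch {

    // Assumption: the file extension is enough to recognize media that is
    // already compressed by its own encoding (PNG, JPEG).
    static boolean isAlreadyCompressed(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".png") || lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }

    // Adds one file to the archive, using STORE for already-compressed
    // images and DEFLATE for everything else.
    static void addEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        if (isAlreadyCompressed(name)) {
            // For STORED entries the size and CRC-32 must be known up front.
            CRC32 crc = new CRC32();
            crc.update(data);
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(data.length);
            entry.setCompressedSize(data.length);
            entry.setCrc(crc.getValue());
        } else {
            entry.setMethod(ZipEntry.DEFLATED);
        }
        zip.putNextEntry(entry);
        zip.write(data);
        zip.closeEntry();
    }
}
```

Calling addEntry(zip, "Pictures/photo.png", bytes) would then write the image uncompressed, while addEntry(zip, "content.xml", bytes) would still be deflated. A real implementation would more likely decide based on the media type recorded in the package manifest than on the file name.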
I think this technique would be applicable to other libraries and editors.
I realize it is an outlier case, but ODF encryption requires that files be compressed before they are encrypted. That leads to an odd situation where images have to be DEFLATEd before encryption, and the package is then stored with that encrypted content. Depending on what is done with the ODF document later, the DEFLATEd images may end up staying that way from then on. In the case of ODFDOM, it may depend on how parts that are not modified are carried forward into an updated version of the document.
I think signing of the document should work regardless, so long as signing is always against the unencrypted forms of the Zipped files, regardless of their being compressed and even encrypted in later operations.
I’m guessing that the overhead of encryption and decryption swamps any added cost of image compression and decompression.
> ZIP allows a file to be compressed (most commonly using the DEFLATE algorithm)
Does ODF make any restrictions with respect to the algorithms used? I couldn’t find anything related to that in the ODF 1.2 Part 3 Draft (Package specification). In terms of interoperability and long-term archiving it would make sense in my opinion to limit the algorithms to the most commonly used ones (e.g. DEFLATE and STORE), especially since many ZIP libraries only support DEFLATE.
Has this ever been discussed or is the algorithm maybe even specified?
@divo, this is an issue we are tracking for ODF 1.2. The current proposal is to allow only DEFLATE for compression.
http://tools.oasis-open.org/issues/browse/OFFICE-2532