No, this has nothing to do with getting discounted parking if you use ODF, though that is an intriguing idea…
Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog, and with it comes news of a new ODF tool: an ODF Validator Service, written by Alex Hudson as part of the Fellowship’s ODF Tools project.
It is in the spirit of the W3C’s Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.
It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents at various levels. Think of a SAX-like callback parser, but with multiple levels of detail. The framework would know how to fully parse an ODF document and identify features at the Zip and XML levels, and plugins to the framework could subscribe to various parse events. So, maybe a ZipListener interface that simply has onFile() and onDirectory() methods. Then a ManifestListener interface that lets you subscribe to notifications of the data in the manifest. Then, within a document, you could have listeners at the structural and content levels: onWorksheet() and onCell() in a spreadsheet, or onTable(), onImage(), etc. in a word processor document.
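To make the callback idea a little more concrete, here is a minimal sketch in Python, using snake_case versions of the method names. ZipListener, ManifestListener, OdfScanner, Inventory and the on_* methods are all hypothetical names for illustration, not part of any existing ODF toolkit, and only the Zip and manifest levels are shown.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


class ZipListener:
    """Hypothetical Zip-level listener; override only the events you need."""

    def on_directory(self, name):
        pass

    def on_file(self, name, data):
        pass


class ManifestListener:
    """Hypothetical listener for file entries in META-INF/manifest.xml."""

    def on_manifest_entry(self, full_path, media_type):
        pass


class OdfScanner:
    """Reads an ODF package once and dispatches events to subscribers."""

    def __init__(self):
        self.zip_listeners = []
        self.manifest_listeners = []

    def subscribe(self, listener):
        # A plugin may implement several listener interfaces at once.
        if isinstance(listener, ZipListener):
            self.zip_listeners.append(listener)
        if isinstance(listener, ManifestListener):
            self.manifest_listeners.append(listener)

    def scan(self, source):
        # source: a file path or a binary file-like object.
        with zipfile.ZipFile(source) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    for listener in self.zip_listeners:
                        listener.on_directory(info.filename)
                    continue
                data = zf.read(info)
                for listener in self.zip_listeners:
                    listener.on_file(info.filename, data)
                if info.filename == "META-INF/manifest.xml":
                    self._dispatch_manifest(data)

    def _dispatch_manifest(self, data):
        # Attributes in the manifest are namespaced (manifest:full-path etc.).
        root = ET.fromstring(data)
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
            for listener in self.manifest_listeners:
                listener.on_manifest_entry(
                    entry.get(f"{{{MANIFEST_NS}}}full-path"),
                    entry.get(f"{{{MANIFEST_NS}}}media-type"),
                )


class Inventory(ZipListener, ManifestListener):
    """Example plugin: records file names and manifest media types."""

    def __init__(self):
        self.files = []
        self.media_types = {}

    def on_file(self, name, data):
        self.files.append(name)

    def on_manifest_entry(self, full_path, media_type):
        self.media_types[full_path] = media_type
```

A plugin subscribes once and only hears the events it cares about, e.g. scanner = OdfScanner(); inventory = Inventory(); scanner.subscribe(inventory); scanner.scan("example.odt"), where "example.odt" stands in for any ODF file. Content-level events like on_cell() would need a further pass over content.xml, which this sketch leaves out.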
A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.
- A validation service would operate at several levels, validating the structure of the Zip file and the manifest, as well as each of the XML streams.
- You could also do a cross-platform checker, looking for embedded images and other media, OLE links, etc., and reporting on whether any of these have platform dependencies.
- An accessibility scanner would be able to fit into this framework as well.
- A full text indexer could work here.
- Any number of content scraping applications could work well here.
- If there is some query language interface, this could be useful from a test-generation perspective. If you have a large collection of ODF documents, a developer working on a feature can instantly bring up a set of test documents that can be used to test the code he just changed. Give me a list of word processor documents that have Arabic Bidi text which also have tables. Give me a list of spreadsheets that use pie charts with more than 10 slices.
- With the metadata framework coming in ODF 1.2, there will be even more interesting uses of such a framework.
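As one example of the Zip-level checks a validation plugin might run, here is a hedged Python sketch of the packaging rule that the "mimetype" entry comes first in the Zip file, is stored uncompressed, and has no extra field in its header. The function name check_mimetype_entry is hypothetical, and the exact wording of this rule differs between ODF versions, so the sketch reports findings rather than rejecting the file.

```python
import zipfile

ODF_MEDIA_PREFIX = "application/vnd.oasis.opendocument."


def check_mimetype_entry(source):
    """Return a list of problems with the package's 'mimetype' entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("'mimetype' is not the first entry in the Zip file")
        try:
            info = zf.getinfo("mimetype")
        except KeyError:
            problems.append("no 'mimetype' entry")
            return problems
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        # zipfile fills ZipInfo.extra from the central directory record.
        if info.extra:
            problems.append("'mimetype' has an extra field in its header")
        media_type = zf.read(info).decode("ascii", errors="replace")
        if not media_type.startswith(ODF_MEDIA_PREFIX):
            problems.append(f"unexpected media type: {media_type!r}")
    return problems
```

Returning a list instead of raising lets a validation service collect every Zip-level finding into one report, in the spirit of the W3C validator's output.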
The benefit of the framework is the reduction in code required to get directly to the info in the ODF document you want, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.
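At the content level, extraction often comes down to walking content.xml for the elements of interest. Here is a minimal Python sketch of a helper that a full text indexer or content scraper might use: it yields the text of each paragraph and heading (text:p and text:h), optionally filtered by style name. iter_paragraphs is a hypothetical name. Assumptions worth stating: text is flattened with itertext(), so whitespace encoded as text:s, text:tab or text:line-break elements is dropped rather than expanded; automatic styles (P1, P2 and so on) are not resolved to their parent styles; and paragraphs nested inside notes or text boxes are reported both on their own and as part of the enclosing paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"


def iter_paragraphs(source, style_name=None):
    """Yield the plain text of each text:p / text:h in content.xml, in document order.

    If style_name is given, only elements whose text:style-name equals it are
    yielded; automatic styles are not resolved to their parent styles.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    for elem in root.iter():
        if elem.tag not in PARAGRAPH_TAGS:
            continue
        if style_name is not None and elem.get(STYLE_ATTR) != style_name:
            continue
        # Concatenates text from child spans, links, etc.
        yield "".join(elem.itertext())
```

In the framework described above, the same loop would fire an on_paragraph-style event instead of yielding, so an indexer and a style-based processor could share one parse.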
Change Log
4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.
orcmid says
Interesting thoughts.
The validator looks like a nice project and the ODF Tools will be welcome. It looks like these can only get richer, as you surmise.
D. says
Not a bad idea. I know I wrote a C# library for parsing ODF files, mainly so I could find all the paragraphs of a certain style (“Command”), process them, and replace each paragraph with the results of the internal program. It ended up making a nice little toolchain for my documents, more so when I could build arbitrary tables and styles based on a database.
That and a little tool for converting spreadsheet sheets into tab-delimited files.
One reason I love ODF is that it is such a nice clean interface, fairly easy to understand, and doesn’t really have excess cruft of history.
The Wraith says
Why would anyone except an ODF implementor validate ODF files online?
An implementor seems more likely to use the ODF XML schema files directly to validate the documents he creates.
What does this tool add to ordinary schema validation?
Rob says
You could ask the same question about the W3C’s HTML/XHTML/CSS validators. But they seem to get a lot of use. You don’t need to be an HTML editor vendor to have a need for validating HTML.
If you look around the site I linked to, you can see that this is more than just schema validation. For example, ODF specifies other constraints, such as referential integrity constraints between the manifest and the content XML files, that are not expressible in RELAX NG.
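To illustrate the kind of cross-file constraint meant here, a hedged Python sketch of a manifest integrity check: every file entry in META-INF/manifest.xml should exist in the Zip, and every stored file other than mimetype and the META-INF/ entries should be listed in the manifest. check_manifest_integrity is a hypothetical name, the exact requirement varies between ODF versions, and the function assumes a manifest is present (zipfile raises KeyError otherwise). Directory entries (paths ending in "/", including "/" for the package itself) are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def check_manifest_integrity(source):
    """Return (listed_but_missing, present_but_unlisted) as sorted lists of paths."""
    with zipfile.ZipFile(source) as zf:
        names = {i.filename for i in zf.infolist() if not i.is_dir()}
        # Raises KeyError if the package has no manifest.
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    listed = set()
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        # "/" is the package itself; other paths ending in "/" are directories.
        if path and not path.endswith("/"):
            listed.add(path)
    # mimetype and META-INF/ contents are not expected in the manifest.
    payload = {n for n in names if n != "mimetype" and not n.startswith("META-INF/")}
    return sorted(listed - names), sorted(payload - listed)
```

Neither direction of this check can be expressed in a RELAX NG schema for a single XML stream, because it relates the manifest to the Zip directory itself.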
Daniel says
Hi Rob. I just saw your post. I talked to Cyclone3 and we just put together a privacy policy. Please drop me an email if we missed anything.
http://opendocumentfellowship.org/validator