{"id":1323,"date":"2010-11-01T15:50:00","date_gmt":"2010-11-01T19:50:00","guid":{"rendered":"http:\/\/2d823b65bb.nxcli.io\/?p=1323"},"modified":"2010-11-06T10:57:59","modified_gmt":"2010-11-06T14:57:59","slug":"simple-java-api-odf","status":"publish","type":"post","link":"https:\/\/www.robweir.com\/blog\/2010\/11\/simple-java-api-odf.html","title":{"rendered":"Introducing: the Simple Java API for ODF"},"content":{"rendered":"<h3>The Announcement<\/h3>\n<p>The first public release of the new <a href=\"http:\/\/odftoolkit.org\/projects\/simple\/pages\/Home\">Simple Java API for ODF<\/a> is now available for <a href=\"http:\/\/odftoolkit.org\/projects\/simple\/downloads\">download<\/a>. This API radically simplifies common document automation tasks, allowing you to perform tasks in a few lines of code that would require hundreds if you were manipulating the ODF XML directly.<\/p>\n<p>The Simple API is part of the <a href=\"http:\/\/odftoolkit.org\/\">ODF Toolkit Union<\/a> open source community and is available under the Apache 2.0 license.\u00a0\u00a0 <a href=\"http:\/\/simple.odftoolkit.org\/javadoc\/index.html\">JavaDoc<\/a>, <a href=\"http:\/\/simple.odftoolkit.org\/sample\/index.html\">demonstration code<\/a> and a &#8220;<a href=\"http:\/\/simple.odftoolkit.org\/cookbook\/index.html\">Cookbook<\/a>&#8221; are also available on the project&#8217;s website.<\/p>\n<h3>The Background<\/h3>\n<p>I <a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/publications\/ODF_Toolkit_Proposal.pdf\">first proposed<\/a> an ODF Toolkit back in 2006, shortly after I got involved with ODF.\u00a0 It was clear then that one of the big advantages of ODF, compared to proprietary binary formats, is that ODF lent itself to manipulation using common, high level tools. 
I made a list of the top 20 document-based &#8220;<a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/2006\/09\/odf-twenty-patterns-of-use.html\">patterns of use<\/a>&#8221;, but the key ones are in the following areas:<\/p>\n<ul>\n<li>Mail merge style field replacement<\/li>\n<li>Combining document fragments\/document assembly<\/li>\n<li>Data-driven document generation<\/li>\n<li>Information extraction<\/li>\n<\/ul>\n<p>The hope was that we could make it easy to write such applications using ODF.<\/p>\n<p>So why wouldn&#8217;t this be easy?\u00a0\u00a0 In the end ODF is just ZIP and XML and every programming platform knows how to deal with these formats, right?<\/p>\n<p>Yes, this is true.\u00a0 However, there clearly are a lot of details to worry about.\u00a0 Although ZIP and XML are relatively simple technologies, defining exactly how ODF works requires over a thousand pages.\u00a0 This level of detail is necessary if you are writing a word processor or a spreadsheet.\u00a0 But you really don&#8217;t need to know ODF at this level in order to accomplish typical document automation tasks.<\/p>\n<p>There have been several other attempts at writing toolkits in this space.\u00a0 Some, such as the <a href=\"http:\/\/odftoolkit.org\/\">ODF Toolkit Union&#8217;s<\/a> <a href=\"http:\/\/odftoolkit.org\/projects\/odfdom\/pages\/Home\">ODFDOM<\/a> project, have aimed for a low-level Java API, with a 1-to-1 correspondence with ODF&#8217;s elements and attributes.\u00a0 Others, like <a href=\"http:\/\/lpod-project.org\/odf-library\">lpOD&#8217;s Python API<\/a>, have taken a higher-level view of ODF.\u00a0 You can make a good argument for either approach.\u00a0 Each has its advantages and disadvantages.<\/p>\n<p>The advantage of the low-level API is that if you want to manipulate an existing ODF document, which in general can contain any legal ODF markup, then you need an API that understands 100% of ODF.\u00a0 But understanding that API would require understanding the entire ODF 
standard.\u00a0 So that is too complicated for most application developers.<\/p>\n<p>If you write a high level API, then it may be easy to use, but how can you then guarantee that it can losslessly manipulate an arbitrary ODF document?<\/p>\n<p>I think the best approach might be a blended approach.\u00a0 Build a low-level API that does 100% of ODF, and then on top of that have a layer that provides higher-level functions that do the most-common tasks.\u00a0 This gives you the benefits of completeness and simplicity.\u00a0 This is the approach we have taken with the <a href=\"http:\/\/odftoolkit.org\/projects\/simple\/pages\/Home\">Simple Java API for ODF<\/a>.\u00a0\u00a0 It is built upon the schema-driven ODFDOM API, to give it a solid low-level foundation.\u00a0 And on top of that it adds high-level functions.\u00a0 How high?\u00a0 The aim is to provide operations that are similar to what you as an end-user would have available in the UI, or what you as an application developer would have with VBA or UNO macros.\u00a0 So adding high level content, like tables or images.\u00a0 Search and replace operations.\u00a0 Cut and paste.\u00a0\u00a0 Simple, but still powerful.<\/p>\n<h3>A Quick Example<\/h3>\n<p>As a quick illustration of the level of abstraction provided by the Simple Java API for ODF, let&#8217;s build a simple app.\u00a0 We want to load ODF documents, search for stock ticker symbols and add a hyperlink for each one to the company&#8217;s home page.<\/p>\n<p>So, start with a document that looks like this:<\/p>\n<p><a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/wp-content\/uploads\/2010\/11\/foo-corp.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1337\" title=\"Press Release\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/wp-content\/uploads\/2010\/11\/foo-corp.jpg\" alt=\"\" width=\"725\" height=\"299\" srcset=\"https:\/\/www.robweir.com\/blog\/wp-content\/uploads\/2010\/11\/foo-corp.jpg 725w, 
https:\/\/www.robweir.com\/blog\/wp-content\/uploads\/2010\/11\/foo-corp-300x123.jpg 300w\" sizes=\"auto, (max-width: 725px) 100vw, 725px\" \/><\/a>We want to take that and find each instance of &#8220;FOO&#8221; and add a hyperlink to &#8220;http:\/\/www.foo.com&#8221; and so on.\u00a0 You could certainly do this by manipulating the ODF XML directly, but it would require a good deal of familiarity with the ODF standard.\u00a0 Using the Simple API, you can do this without touching XML directly.<\/p>\n<p>Let&#8217;s see how this is done.<\/p>\n<pre class=\"brush: java; auto-links: false; title: ; notranslate\" title=\"\">\r\n\r\n\/\/ basic Java core libraries that any Java developer knows about\r\nimport java.net.URL;\r\nimport java.io.File;\r\n\r\n\/\/ Simple API classes for text documents, selections and text navigation\r\nimport org.odftoolkit.simple.TextDocument;\r\nimport org.odftoolkit.simple.text.search.TextSelection;\r\nimport org.odftoolkit.simple.text.search.TextNavigation;\r\n\r\npublic class Linkify\r\n{\r\n    public static void main(String&#x5B;] args)\r\n    {\r\n        try\r\n        {\r\n            \/\/ load text document (ODT) from disk.\r\n            \/\/ could also load from URL or stream\r\n            TextDocument document=(TextDocument)TextDocument.loadDocument(&quot;foobar.odt&quot;);\r\n\r\n            \/\/ initialize a search for &quot;FOO&quot;.\r\n            \/\/ we'll be adding regular expression support as well\r\n            TextNavigation search = new TextNavigation(&quot;FOO&quot;, document);\r\n\r\n            \/\/ iterate through the search results\r\n            while (search.hasNext())\r\n            {\r\n                \/\/ for each match, add a hyperlink to it\r\n                TextSelection item = (TextSelection) search.getCurrentItem();\r\n                item.addHref(new URL(&quot;http:\/\/www.foo.com&quot;));\r\n            }\r\n\r\n            \/\/ save the modified document back to a 
new file\r\n            document.save(new File(&quot;foobar_out.odt&quot;));\r\n        }\r\n\r\n        catch (Exception e)\r\n        {\r\n            e.printStackTrace();\r\n        }\r\n    }\r\n\r\n}\r\n\r\n<\/pre>\n<p>Run the code and you get a new document, with the hyperlinks added, like this:<br \/>\n<a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/wp-content\/uploads\/2010\/11\/foobar.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1341\" title=\"Processed Press Release\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/wp-content\/uploads\/2010\/11\/foobar.jpg\" alt=\"\" width=\"706\" height=\"322\" srcset=\"https:\/\/www.robweir.com\/blog\/wp-content\/uploads\/2010\/11\/foobar.jpg 706w, https:\/\/www.robweir.com\/blog\/wp-content\/uploads\/2010\/11\/foobar-300x136.jpg 300w\" sizes=\"auto, (max-width: 706px) 100vw, 706px\" \/><\/a>Simple enough?  I think so.<\/p>\n<h3>How to get involved<\/h3>\n<p>We really want your help with this API.\u00a0 This is not one of those faux-open source projects, where all the code is developed by one company.\u00a0 We want to have a real community around this project.\u00a0 So if you are at all interested in ODF and Java, I invite you to take a look:<\/p>\n<ol>\n<li><a href=\"http:\/\/odftoolkit.org\/projects\/simple\/downloads\">Download<\/a> the 0.2 release of the Simple Java API for ODF.\u00a0 The <a href=\"http:\/\/odftoolkit.org\/projects\/simple\/pages\/Home\">wiki<\/a> also has important info on install pre-reqs.<\/li>\n<li>Work through some of the <a href=\"http:\/\/simple.odftoolkit.org\/cookbook\/index.html\">cookbook<\/a> to get an idea of how the API works.<\/li>\n<li>Sign up and <a href=\"http:\/\/odftoolkit.org\/projects\/odftoolkit\/pages\/SignUp\">join the ODF Toolkit Union<\/a> project.<\/li>\n<li>Join the <a href=\"http:\/\/odftoolkit.org\/projects\/simple\/lists\">users mailing list<\/a> and ask questions.\u00a0 Defect reports can go to our <a 
href=\"http:\/\/odftoolkit.org\/bugzilla\/buglist.cgi?product=simple&amp;order=Importance&amp;limit=25\">Bugzilla tracker<\/a>.<\/li>\n<li>If you want to contribute patches, there is more info on the wiki on how to access our repository.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>The Announcement The first public release of the new Simple Java API for ODF is now available for download. This API radically simplifies common document automation tasks, allowing you to perform tasks in a few lines of code that would require hundreds if you were manipulating the ODF XML directly. The Simple API is part [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[9],"tags":[],"class_list":{"0":"post-1323","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-odf","7":"entry"},"_links":{"self":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1323","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/comments?post=1323"}],"version-history":[{"count":18,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1323\/revisions"}],"predecessor-version":[{"id":1345,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1323\/revisions\/1345"}],"wp:attachment":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/media?
parent=1323"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/categories?post=1323"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/tags?post=1323"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}