<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-11236681</atom:id><lastBuildDate>Wed, 14 May 2008 02:32:26 +0000</lastBuildDate><title>An Antic Disposition</title><description/><link>http://www.robweir.com/blog/</link><managingEditor>noreply@blogger.com (Rob)</managingEditor><generator>Blogger</generator><openSearch:totalResults>167</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-4912814743730565869</guid><pubDate>Tue, 13 May 2008 19:30:00 +0000</pubDate><atom:updated>2008-05-13T17:28:09.084-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><category domain='http://www.blogger.com/atom/ns#'>ODF</category><category domain='http://www.blogger.com/atom/ns#'>Performance</category><title>Spreadsheet file format performance</title><description>I've been doing some performance timings of file format support, comparing MS Office and OpenOffice.  Most of the results are as expected, but some are surprising, and one in particular is quite disappointing.&lt;br /&gt;&lt;br /&gt;But first, a few details of my setup.  All timings, done by stopwatch, were from Office 2003 and OpenOffice 2.4.0 running on Windows XP, with all current service packs and patches.  The machine is a Lenovo T60p with a dual-core 2.16 GHz Intel CPU and 2 GB of RAM.  I took all the standard precautions -- disk was defragmented, and test files were confirmed as defragmented using &lt;a href="http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx"&gt;contig&lt;/a&gt;.
No other applications were running and background tasks were all shut down.&lt;br /&gt;&lt;br /&gt;For test files, I went back to an old favorite, George Ou's (at the time with ZDNet) monster 50MB XLS file from his &lt;a href="http://blogs.zdnet.com/Ou/?p=119"&gt;series of tests&lt;/a&gt; back in 2005.  This file, although very large, is very simple.  There are no formulas, indeed no formatting or styles.  It is just text and numbers, treating a spreadsheet like a giant data table.  So tests of this file will emphasize the raw throughput of the applications.  Real-world spreadsheets will typically fare worse than this due to the additional overhead of processing styles, formulas, etc.&lt;br /&gt;&lt;br /&gt;A test of a single file is not really that interesting.  We want to see trends, see patterns.  So I made a set of variations on George's original file, converting it into ODF, XLS and OOXML formats, as well as making scaled-down versions of it.  In total I made 12 different-sized subsets of the original file, ranging down to a 437KB version, and created each file in all three formats.  I then tested how long it took to load each file in each of the applications.  In the case of MS Office, I installed the current versions of the translators for those formats: the Compatibility Pack for OOXML, and the ODF Add-in for ODF support.&lt;br /&gt;&lt;br /&gt;I find it convenient to report numbers per 100,000 spreadsheet cells.  You could equally well normalize by the original XLS spreadsheet size, the number of rows of data, or any other correlated variable, but values per 100K cells are simple for anyone to understand.&lt;br /&gt;&lt;br /&gt;I'll spare you all the pretty pictures.  If you want to make some, here is the &lt;a href="http://www.robweir.com/blog/attachments/FileFormatPerf.csv"&gt;raw data&lt;/a&gt; (CSV format).
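The normalization itself is simple arithmetic. The following minimal Python sketch (standard library only) shows how one might rescale a timing or file size to a per-100K-cell rate. It also computes the per-part compression ratios (original/zipped) that come up below for ODF's content.xml and OOXML's sheet1.xml. The CSV column names are hypothetical placeholders, not taken from FileFormatPerf.csv. This is also not the script behind these numbers, since the timings were done by stopwatch.

```python
import csv
import io
import zipfile


def per_100k_cells(value, cells):
    """Scale a raw measurement (seconds or KB) to a per-100,000-cell rate."""
    return value * 100_000 / cells


def normalize_csv(csv_text, value_column, cells_column):
    """Return one per-100K-cell rate per CSV row.

    The column names are hypothetical placeholders; check the header of
    FileFormatPerf.csv and pass the real names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        per_100k_cells(float(row[value_column]), float(row[cells_column]))
        for row in reader
    ]


def zip_member_ratios(package):
    """Map each member of a Zip package (ODF or OOXML) to original/zipped size.

    `package` may be a file path or a binary file-like object.
    Members with a zero compressed size (e.g. directory entries) are skipped.
    """
    with zipfile.ZipFile(package) as zf:
        return {
            info.filename: info.file_size / info.compress_size
            for info in zf.infolist()
            if info.compress_size
        }
```

For example, per_100k_cells(8.28, 600_000) gives about 1.38, which is the OOXML load time of the 600K-cell test file expressed per 100K cells. You can point zip_member_ratios at an .ods file and look up "content.xml", or at an .xlsx file and look up "xl/worksheets/sheet1.xml", to get the two ratios discussed below.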
But I will give some summary observations.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For document sizes, the results are as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Binary XLS format = 1,503 KB per 100K cells&lt;/li&gt;&lt;li&gt;OOXML format = 491 KB per 100K cells&lt;/li&gt;&lt;li&gt;ODF format = 117 KB per 100K cells&lt;/li&gt;&lt;/ul&gt;So the XML formats are far smaller than the legacy binary format.  This is due to the added Zip compression that both XML formats use.  Also, note that the ODF files are significantly smaller than the OOXML files, less than 1/4 the size on average.  Upon further examination, the XML document representing the ODF content is larger than the corresponding XML in OOXML, as expected, due to its use of longer, more descriptive markup tags.   However, the ODF XML compresses far better than the OOXML version, enough to overcome its greater verbosity and result in files smaller than OOXML.   The compression ratio (original/zipped) for ODF's content.xml is 87, whereas the compression ratio for OOXML's sheet1.xml is only 12.  We could just mumble something about entropy and walk away, but I think this area could bear further investigation.&lt;br /&gt;&lt;br /&gt;Any ideas?&lt;br /&gt;&lt;br /&gt;For load time, the times for processing the binary XLS files were:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Microsoft Office 2003 = 0.03 seconds per 100K cells&lt;/li&gt;&lt;li&gt;OpenOffice 2.4.0 = 0.4 seconds per 100K cells&lt;/li&gt;&lt;/ul&gt;Not too surprising.  These binary formats are optimized for the guts of MS Office.  We would expect them to load faster in their native application.&lt;br /&gt;&lt;br /&gt;So what about the new XML formats?  There has been recent talk about the "&lt;a href="http://www.codinghorror.com/blog/archives/001114.html"&gt;Angle Bracket Tax&lt;/a&gt;" for XML formats.
How bad is it?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Microsoft Office 2003 with OOXML = 1.5 seconds per 100K cells&lt;/li&gt;&lt;li&gt;OpenOffice 2.4.0 with ODF = 2.7 seconds per 100K cells&lt;/li&gt;&lt;/ul&gt;For typical-sized documents, you probably will not notice the difference.  However, with the largest documents, like the 16-page, 3-million-cell monster sheet, the OOXML document took 40 seconds to load in Office and the ODF sheet took 90 seconds to load in OpenOffice, whereas the XLS binary took less than 2 seconds to load in MS Office.&lt;br /&gt;&lt;br /&gt;OK.  So what are we missing?  Ah, yes, ODF format in MS Office, using their ODF Add-in.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Microsoft Office 2003 with ODF, using the ODF Add-in = 74.6 seconds per 100K cells&lt;/li&gt;&lt;/ul&gt;Yup.  You read that right.  To put this in perspective, let's look at a single test file, a 600K-cell file, as we load it in the various formats and editors:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Microsoft Office 2003 in XLS format = 0.75 seconds&lt;/li&gt;&lt;li&gt;OpenOffice 2.4.0 in XLS format = 3.03 seconds&lt;/li&gt;&lt;li&gt;Microsoft Office 2003 in OOXML format = 8.28 seconds&lt;/li&gt;&lt;li&gt;OpenOffice 2.4.0 in ODF format = 14.09 seconds&lt;/li&gt;&lt;li&gt;Microsoft Office 2003 in ODF format = 515.60 seconds&lt;/li&gt;&lt;/ul&gt;Can someone explain to me why Microsoft Office needs nearly 9 minutes to load an ODF file that OpenOffice can load in 14 seconds?&lt;br /&gt;&lt;br /&gt;(I was not able to test files larger than this using the ODF Add-in since they all crashed.)&lt;br /&gt;&lt;br /&gt;(Update:  Since everyone wants to know, the beta OpenOffice 3.0 opens the OOXML version of that file in 49.4 seconds, over 10x faster than MS Office loads the ODF document.)&lt;br /&gt;&lt;br /&gt;This is one reason why I think file format translation is a poor engineering approach to interoperability.
When OpenOffice wants to read a legacy XLS file, it does not approach the problem by translating the XLS into an ODF document and then loading the ODF file.  Instead, it simply loads the XLS file, via a file filter, into the internal memory model of OpenOffice.&lt;br /&gt;&lt;br /&gt;What is a file filter?  It is like 1/2 of a translator.  Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate on directly.  This is far more efficient than translation.  This is the untold truth that the layperson does not know.  But this is how everyone does it.  That is how we support formats in SmartSuite.  That is how OpenOffice does it.  And that is how MS Office does it for the file formats they care about.  In fact, that is the way Novell is now doing it, since they discovered that the Microsoft approach is doomed to performance hell.&lt;br /&gt;&lt;br /&gt;So it is with some amusement that I watch Microsoft and others propose translation as a solution to interoperability, creating reports about translation, even a proposal for a new work item in JTC1/SC34 concerning file format translation, when the single concrete attempt at translation is such an abysmal failure.  It may look great on paper, but it is an engineering disaster.  What customers need is direct, internal support for ODF in MS Office, via native code, in a file filter, not a translator that takes nearly 9 minutes to load a file.&lt;br /&gt;&lt;br /&gt;The astute engineer will agree with the above, but will also feel some discomfort at the numbers.  There is more here than can be explained simply by the use of translators versus import filters.  That choice might explain a 2x difference in performance.  A particularly poor implementation might explain a 5x difference.  But none of this explains why MS Office is almost 40x slower in processing ODF files.
Being that much slower is hard to do accidentally. Other forces must be at play.&lt;br /&gt;&lt;br /&gt;Any ideas?</description><link>http://www.robweir.com/blog/2008/05/spreadsheet-file-format-performance.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-254536812231836970</guid><pubDate>Thu, 08 May 2008 17:30:00 +0000</pubDate><atom:updated>2008-05-08T13:43:21.360-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Berries</category><title>Berry Good, Berry Bad</title><description>It has been an interesting Spring here in Westford, weather-wise.  April dipped below freezing on the 3rd, 8th, 9th, 15th and 16th.   Then we got a warm spell, a week of days that reached 75 °F (24 °C) and even one day that reached 87.4 °F (30.8 °C)  (April 23rd).  Then it struck, on May 1st, an overnight low of 28.7 °F (-1.8 °C).&lt;br /&gt;&lt;br /&gt;The vulnerability, when a late frost like this occurs, is in bud development.  If the plant, by warm sunny days, has been tricked into bud development, and then a freeze occurs, the bud will be injured or killed.  Strawberries are particularly prone to this problem.&lt;br /&gt;&lt;br /&gt;Because of interactions of thermal inversions at the ground, humidity levels, etc., a simple temperature reading is not an accurate indicator of whether damage actually occurred.  For example, if humidity is high, the temperature can dip to freezing, but as that moisture freezes into frost, energy is released (latent heat, as the chemists call it).  So you have a few degrees of tolerance if humidity is high, if you have a fog, etc.  In fact, commercial strawberry growers will handle this problem by running sprinklers when a freeze threatens, to increase the amount of water around the plants available to freeze, as a buffer to protect the plants.  Every degree helps.&lt;br /&gt;&lt;br /&gt;But I wasn't so lucky.
The extent of damage was not clear until the strawberry plants started blooming this week.  Here's what I am seeing.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.robweir.com/blog/images/berry_good.png" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Above is an example of a normal, healthy strawberry blossom.  You see the ring of stamens, the male organs of the flower, each with a filament stalk tipped with an anther containing the pollen.  In the center is the receptacle with the many carpels, which are the female side of the equation.&lt;br /&gt;&lt;br /&gt;But in the picture below, we see a blossom from my garden that shows injury.  Although the plant is sound, and it did flower, the carpels are dead.  This blossom will not yield a berry.&lt;br /&gt;&lt;br /&gt;From the looks of it, 40-50% of buds are damaged in this way.  So no strawberry wine this year.  I'll only have enough for fresh eating and ice cream.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.robweir.com/blog/images/berry_bad.png" /&gt;</description><link>http://www.robweir.com/blog/2008/05/berry-good-berry-bad.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-8188271329911077254</guid><pubDate>Wed, 07 May 2008 22:45:00 +0000</pubDate><atom:updated>2008-05-07T18:46:31.786-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ODF</category><title>Achieving the impossible</title><description>&lt;img src="http://www.robweir.com/blog/images/valid.jpg" /&gt;&lt;br /&gt;&lt;br /&gt;Unadulterated copy of James Clark's Relax NG validator &lt;a href="http://www.thaiopensource.com/relaxng/jing.html"&gt;jing&lt;/a&gt;.  Unadulterated copy of Kohsuke Kawaguchi's Sun Multi-Schema Validator &lt;a href="https://msv.dev.java.net/"&gt;msv&lt;/a&gt;.
Unadulterated copy of the &lt;a href="http://www.oasis-open.org/committees/download.php/12571/OpenDocument-schema-v1.0-os.rng"&gt;ODF 1.0&lt;/a&gt; Relax NG schema.  Unadulterated copy of the &lt;a href="http://www.oasis-open.org/committees/download.php/19275/OpenDocument-v1.0ed2-cs1.odt"&gt;ODF 1.0 Standard&lt;/a&gt;, in ODF format.&lt;br /&gt;&lt;br /&gt;No errors from either validator.  &lt;br /&gt;&lt;br /&gt;msv is so good as to tell us "the document is valid".   But jing indicates success with only silence.  So will I.</description><link>http://www.robweir.com/blog/2008/05/achieving-impossible.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-4323369540440163138</guid><pubDate>Tue, 06 May 2008 19:15:00 +0000</pubDate><atom:updated>2008-05-06T18:22:38.639-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Standards</category><title>Standards Words</title><description>&lt;h3&gt;Introduction&lt;/h3&gt;There are several words, more widely used than understood, that recur frequently when discussing standards.  Specification and standardization require us to describe technology precisely, in such a way that practitioners in that field can achieve the goals set out in the standard.   But this precision is only perfectly intelligible to those who share the same code words.  What follows is a handful of the more important ones, what they mean, and how they are unintentionally confused or intentionally misused.
You are at a distinct disadvantage when reading (or writing) a news article, a blog post, or evaluating an argument if you do not know the correct meaning of the following words.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Standard&lt;/h3&gt;Take the definition from ISO/IEC Guide 2:2004, definition 3.2:&lt;blockquote&gt;[A] document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context.&lt;br /&gt;&lt;br /&gt;NOTE Standards should be based on the consolidated results of science, technology and experience, and aimed at the promotion of optimum community benefits.&lt;/blockquote&gt;&lt;br /&gt;So, it is a document, a written description, not an embodiment in the form of a product, that is standardized.  Its aims are the "achievement of optimum degree of order" and "promotion of optimum community benefits", and it is achieved through consensus and consolidation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;international standard&lt;/h3&gt;According to ISO/IEC Guide 2:2004, definition 3.2.1.1:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;[A] standard that is adopted by an international standardizing/standards organization and made available to the public.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;International Standard&lt;/h3&gt;&lt;br /&gt;&lt;blockquote&gt;[An] international standard where the international standards organization is ISO or IEC&lt;/blockquote&gt;&lt;br /&gt;Note the distinction.  With capital letters only ISO or IEC standards apply.  With lowercase, other standards are included.  This is a bit self-serving.  ISO and IEC Standards are the only International Standards, because ISO says so.  
Sorry ITU, sorry CEN, sorry W3C.&lt;br /&gt;&lt;br /&gt;So think of "International Standards" as a controlled mark of ISO, like "parmigiano reggiano" is a controlled mark of the Northern Italian Cheese Consorzio.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Normative&lt;/h3&gt;The normative parts of a standard are those which set out the scope and provisions of the standard.  See ISO Directives, Part 2, section 3.8.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Provisions&lt;/h3&gt;The provisions of a  standard consist of:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Requirements that must be met for conformance&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Recommendations&lt;/li&gt;&lt;li&gt;Statements of permissible, possible or capable actions&lt;/li&gt;&lt;/ol&gt;See ISO Directives, Part 2, section 3.12.&lt;br /&gt;&lt;br /&gt;Note that standards have specific words which denote and distinguish requirements, recommendations and capabilities.  Different standards organizations have different vocabulary for this, so a W3C Recommendation, an IETF RFC and an ISO Standard may have different ways of stating the same provision.  For ISO Standards, the conventions are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"shall" and "shall not" are the normal terms for expressing requirements.&lt;/li&gt;&lt;li&gt;"should" and "should not" are the normal terms for expressing recommendations.&lt;/li&gt;&lt;li&gt;"may" and "need not" are the normal terms for expressing permission.&lt;/li&gt;&lt;li&gt;"can" and "cannot" are the normal terms for expressing possibility and capability.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;This is necessary because of the extreme ambiguity of the English language in the area of modality.  Consider the following sentences, using the word "must":&lt;br /&gt;&lt;ul&gt;&lt;li&gt;(On hearing the doorbell ring), "Oh, that must be the mailman!" 
[expressing likelihood]&lt;br /&gt;&lt;/li&gt;&lt;li&gt;(To a misbehaving child) "You must obey your mother" [expressing obligation]&lt;/li&gt;&lt;/ul&gt;Or the following exchange with a teenage daughter:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Teen:  "I shall return by 11pm" [simple future]&lt;/li&gt;&lt;li&gt;Parent: "No, you shall return by 10pm" [expressing a command]&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;We can be loose and still be understood, in context, in normal conversation, but in standards work we try to be precise and uniform in the use of our control vocabulary.&lt;br /&gt;&lt;h3&gt;Conformance&lt;/h3&gt;This is simply a question of whether something meets the requirements of the standard.  However, for many standards, there are multiple levels, perhaps even multiple classes of conformance. So you need to be very specific about what you are saying.&lt;br /&gt;&lt;br /&gt;For example, you should not ask "Does Excel 2007 conform to OOXML?"  You should ask "Is Excel 2007 a conforming transitional class SpreadsheetML Producer?"  If you count it all up, OOXML probably has at least 18 distinct conformance classes, by various combinations of applications, documents, readers/writers and transitional/strict conformance classes.&lt;br /&gt;&lt;br /&gt;Note in particular that conformance does not mean that an application implements the entire standard.&lt;br /&gt;&lt;br /&gt;[My definition above is not very satisfactory.  Anyone have something better?  Is there an ISO definition of conformance?]&lt;br /&gt;&lt;h3&gt;Compliance&lt;/h3&gt;This is not a typical standards term.  The more typical term is "conformance".  Best to avoid it unless you are talking in a regulatory or legal context.  See ISO Directives, Part 2, section 6.6.1.1:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;A document does not in itself impose any obligation upon anyone to follow it. However, such an obligation may be imposed, for example, by legislation or by a contract.
In order to be able to claim compliance with a document, the user needs to be able to identify the requirements he/she is obliged to satisfy. The user also needs to be able to distinguish these requirements from other provisions where there is a certain freedom of choice.&lt;/blockquote&gt;&lt;br /&gt;&lt;h3&gt;Validity&lt;/h3&gt;This is an XML term, referring to the relationship between an XML document instance (an XML file) and a schema (the definition of the syntax of the markup language).  Generally, an XML document instance is valid if it adheres to the constraints defined in the schema.   The precise definition of validity will depend on the schema definition language used.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I'd welcome any suggestions for other words or definitions that should be included here.</description><link>http://www.robweir.com/blog/2007/05/standards-words.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-4730659405797523991</guid><pubDate>Tue, 06 May 2008 01:30:00 +0000</pubDate><atom:updated>2008-05-05T23:31:54.161-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>XML</category><category domain='http://www.blogger.com/atom/ns#'>ODF</category><title>The Challenge</title><description>&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;br /&gt;&amp;lt;office:document-content&lt;br /&gt; xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"&lt;br /&gt; xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"&lt;br /&gt; office:version="1.0"&amp;gt;&lt;br /&gt; &amp;lt;office:body&amp;gt;&lt;br /&gt;     &amp;lt;office:text&amp;gt;&lt;br /&gt;         &amp;lt;text:p&amp;gt;Dear Alex Brown. Please prove that I am invalid ODF 1.0 (ISO 26300:2006). I do not think that I am. 
In fact I think that your statement that there are no valid ISO ODF documents in the world, and that there cannot be, is a brash, irresponsible and indefensible piece of bombast that you should retract.&amp;lt;/text:p&amp;gt;&lt;br /&gt;         &amp;lt;text:p&amp;gt;(Please note that this document contains no ID, IDREF or IDREFS attributes. Nor does it contain custom content.)&amp;lt;/text:p&amp;gt;        &lt;br /&gt;     &amp;lt;/office:text&amp;gt;&lt;br /&gt; &amp;lt;/office:body&amp;gt;&lt;br /&gt;&amp;lt;/office:document-content&amp;gt;</description><link>http://www.robweir.com/blog/2008/05/challenge.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-2310532601429454270</guid><pubDate>Sun, 04 May 2008 17:30:00 +0000</pubDate><atom:updated>2008-05-04T13:39:23.945-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><title>Release the OOXML final DIS text now !</title><description>The &lt;a href="http://www.jtc1sc34.org/repository/0856rev.pdf"&gt;&lt;cite&gt;JTC1 Directives&lt;/cite&gt;&lt;/a&gt; [pdf] are quite clear on this point.  After a Ballot Resolution Meeting (BRM), if the text is approved, the edited, final version of the text is to be distributed to NB's within 1 month.  This requirement is in the Fast Track part of &lt;cite&gt;JTC1 Directives&lt;/cite&gt;, specifically in 13.12:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;13.12 The time period for post ballot activities by the respective responsible parties shall be as follows:&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;In not more than one month after the ballot resolution group meeting the SC Secretariat shall distribute the final report of the meeting and final DIS text in case of acceptance.&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;The OOXML BRM ended on February 29th.  
One month after February 29th, if my course work in scientific computing does not fail me, is... let's see, carry the 3, multiply, convert to sidereal time, account for proper nutation of the solar mean, subtract the perihelion distance at first point of Aries, OK.  Got it.  Simple.  One month later is approximately March 29th +/- 3 days.&lt;br /&gt;&lt;br /&gt;So the SC34 Secretariat should have distributed the "final DIS text" by March 29th, or at the very least, when the final ballot results on OOXML were known a few days later.&lt;br /&gt;&lt;br /&gt;But that didn't happen.  Nothing.  Silence.   What is the hang-up?   I note that when NB's said that the Fast Track schedule did not give sufficient time to review OOXML, the response from ISO/IEC was "There is nothing we can do.  The Directives only permit 5 months".  And when NB's protested at the arbitrary 5-day length of the OOXML BRM, the response was similarly dismissive.  But when Microsoft needs more time to edit OOXML, well, that appears to be something entirely different.  "Directives, Schmerectives.  You don't worry yourself about no stinkin' Directives.  Take whatever time you need, Sir."&lt;br /&gt;&lt;br /&gt;It makes you wonder whom the ISO/IEC bureaucracy is working for.  The rights and prerogatives of NB's?  Or of large corporations?  Almost every decision they made in processing OOXML was to the detriment of NB prerogatives.&lt;br /&gt;&lt;br /&gt;This delay has practical implications as well.  Consider the following:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;We are currently approaching a two-month period during which NB's can lodge an appeal against OOXML.   Ordinarily, one of the grounds for appeal would be if the Project Editor did not faithfully carry out the editing instructions approved at the BRM.  For example, if he failed to make approved changes, made changes that were not authorized, or introduced new errors when applying the approved changes.
But with no final DIS text, the NB's are unable to make any appeals on those grounds.  By delaying the release of the final DIS text, JTC1 is preventing NB's from exercising their rights.&lt;/li&gt;&lt;li&gt;Lawsuits, such as the &lt;a href="http://www.theregister.co.uk/2008/05/01/bsi_ooxml_vote_high_court/"&gt;recent one&lt;/a&gt; in the UK, are alleging process irregularities, including (if I read it correctly) that BSI approved OOXML without seeing the final text.  I imagine that having the final DIS text in hand and being able to point to particular flaws in that text that should have justified disapproval would bolster their case.  But if JTC1 withholds the text, then they cannot make that point as effectively.&lt;/li&gt;&lt;li&gt;There are obvious anti-competitive effects at play here.  Microsoft has the final DIS version of the ISO/IEC 29500:2008 standard, and because JTC1 is delaying its release to NB's, Microsoft gets 2+ extra months, free of competition, to produce a fix pack to bring their products in line with the final standard, while competitors like Sun or Corel are left behind.  So much for transparency. So much for open standards.  How can this be considered open if some competitors are given a significant time and access advantage?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Note that I'm not talking about the publication of the IS here.  I'm talking about the requirements of 13.12 and the release of the final DIS text.  Obviously ITTF will have a lot of work to do prepping OOXML for publication.  For ODF it took 6 months.  For OOXML I would expect it to take at least that long.
But that does not prevent adherence to the Directives, in particular the requirement to distribute the final DIS text.&lt;br /&gt;&lt;br /&gt;JTC1/SC34, noticing the delay in the release of this text, adopted the &lt;a href="http://www.itscj.ipsj.or.jp/sc34/open/1025.htm"&gt;following Resolution&lt;/a&gt; at their Plenary in early April:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Resolution 8: Distribution of Final text of DIS 29500&lt;br /&gt;&lt;br /&gt;SC 34 requests the ITTF and the SC34 secretariat to distribute the already received final text of DIS 29500 to the SC 34 members in accordance with JTC 1 directives section 13.12 as soon as possible, but not later than May 1st 2008. Access to this document is important for the success of various ISO/IEC 29500 maintenance activities.&lt;/blockquote&gt;&lt;br /&gt;This indicates that the final DIS text had already been received by SC34 (but not distributed) as of that date (April 9th).&lt;br /&gt;&lt;br /&gt;Well, here we are, May 4th, more than two months after the BRM and over a month past the date the final DIS text was due, and past the date requested by the SC34 Plenary (who, by the way, have no authority to extend the deadline required by &lt;cite&gt;JTC1 Directives&lt;/cite&gt;, but that is another story).  We have nothing.&lt;br /&gt;&lt;br /&gt;So, I'll make my own personal appeal.  JTC1 has the text.  The Directives are clear.  The delay is unnecessary and harmful in the ways I outlined above.  Release the final DIS text now.  Not next month.  Not next week.
Release it now.</description><link>http://www.robweir.com/blog/2008/05/release-ooxml-final-dis-text-now.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-5018506032670087425</guid><pubDate>Fri, 02 May 2008 16:15:00 +0000</pubDate><atom:updated>2008-05-04T16:30:06.564-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ODF</category><title>ODF Validation for Dummies</title><description>[Updated 4 May 2008, with additional rebuttal at the end]&lt;br /&gt;&lt;br /&gt;Alex Brown has a problem.  He can't figure out how to validate ODF documents.  Unfortunately, when he couldn't figure it out, he didn't ask the OASIS ODF TC for help, which would have been the normal thing to do.   Indeed, the ODF TC passed a &lt;a href="http://lists.oasis-open.org/archives/office/200702/msg00076.html"&gt;resolution&lt;/a&gt; back in February 2007 that said, in part:&lt;blockquote&gt;That the ODF TC welcomes any questions from ISO/IEC JTC1/SC34 and&lt;br /&gt;member NB's regarding OpenDocument Format, the functionality it&lt;br /&gt;describes, the planned evolution of this standard, and its relationship&lt;br /&gt;to other work on the technical agenda of JTC1/SC34. Questions and&lt;br /&gt;comments can be directed to the TC chair and secretary whose email&lt;br /&gt;addresses are given at&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office"&gt;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;or through the comments facility at&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office"&gt;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office&lt;/a&gt;&lt;/blockquote&gt;&lt;br /&gt;So it is rather uncollegial of Alex to refuse such an open, transparent way of getting his questions answered.  But Alex didn't avail himself of that avenue.   
He just assumed that if he couldn't figure out how to validate ODF, then it simply couldn't be done, and that ODF was to blame.  This is presumptuous.  Does he think that, in the three years since ODF 1.0 became a standard, no one has tried to validate a document?&lt;br /&gt;&lt;br /&gt;Alex is so sure of himself that he &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=f0384bed-808b-49a8-8887-ea7cde5caace"&gt;publicly exults&lt;/a&gt; over the claimed significance of his findings:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;ul&gt;&lt;li&gt;For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.&lt;/li&gt;&lt;li&gt;Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;I think you'll agree that these are bold pronouncements, especially coming from someone so prominent in SC34, the Convenor of the ill-fated OOXML BRM, someone who is currently arguing that SC34 should own the maintenance of OOXML and ODF, indeed someone who would be well served if he could show that all consortia standards are junk, and that only SC34 (and he himself) could make them good.&lt;br /&gt;&lt;br /&gt;Of course, I've been known to pontificate as well.  There is nothing necessarily wrong with that.
The difference here is that Alex Brown is totally wrong.&lt;br /&gt;&lt;br /&gt;But let's see if we can help show Alex, or &lt;a href="http://osrin.net/2008/05/02/openofficeorg-240-and-is26300-conformance/"&gt;anyone&lt;/a&gt; &lt;a href="http://en.wikipedia.org/w/index.php?title=OpenDocument&amp;amp;diff=209431062&amp;amp;oldid=prev"&gt;else&lt;/a&gt; &lt;a href="http://blogs.msdn.com/dmahugh/archive/2008/04/30/odf-conformance-tests.aspx"&gt;similarly&lt;/a&gt; &lt;a href="http://idippedut.dk/post/2008/04/Conformance-of-ODF-documents.aspx"&gt;confused&lt;/a&gt;, the correct way to validate an ODF document.&lt;br /&gt;&lt;br /&gt;First, start with an ODF document.  When Alex tested OOXML, he used the Ecma-376 OOXML specification.  Let's do the analogous test and validate the ODF 1.0 text.  You can download it from the OASIS ODF &lt;a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office"&gt;web site&lt;/a&gt;.  You'll want &lt;a href="http://www.oasis-open.org/committees/download.php/19275/OpenDocument-v1.0ed2-cs1.odt"&gt;this version&lt;/a&gt; of the text, ODF 1.0 (second edition), which is the source document for the ISO version of ODF.&lt;br /&gt;&lt;br /&gt;You'll also need the Relax NG schema files for OASIS ODF 1.0, which come in two pieces:  the &lt;a href="http://www.oasis-open.org/committees/download.php/12571/OpenDocument-schema-v1.0-os.rng"&gt;main schema&lt;/a&gt;, and the &lt;a href="http://www.oasis-open.org/committees/download.php/12570/OpenDocument-manifest-schema-v1.0-os.rng"&gt;manifest schema&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Next you'll need to get a Relax NG validator.  Alex recommends James Clark's &lt;a href="http://www.thaiopensource.com/relaxng/jing.html"&gt;jing&lt;/a&gt;, so we'll use that.  I downloaded &lt;a href="http://www.thaiopensource.com/download/jing-20030619.zip"&gt;jing-20030619.zip&lt;/a&gt;, the main distribution for use with the Java Runtime Environment.  
Unzip that to a directory and we're almost there.&lt;br /&gt;&lt;br /&gt;Since jing operates on XML files and knows nothing about the Zip package structure of an ODF file, you'll need to extract the XML contents of the ODF file.  There are many ways to do this.  My preference, on Windows, is to associate WinZip with the ODF file extensions (ODT, ODS and ODP) so I can right-click on these files to unzip them.  When you unzip, you will have the following XML files, along with directories for image files and other non-XML resources that you can ignore:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;content.xml&lt;/li&gt;&lt;li&gt;styles.xml&lt;/li&gt;&lt;li&gt;meta.xml&lt;/li&gt;&lt;li&gt;settings.xml&lt;/li&gt;&lt;li&gt;META-INF/manifest.xml&lt;/li&gt;&lt;/ul&gt;So now we're ready to validate!  Let's start with content.xml.  The command line for me was:&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar OpenDocument-schema-v1.0-os.rng content.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;(Your command may vary, depending on where you put jing, the ODF schema files and the unzipped ODF files.)&lt;br /&gt;&lt;br /&gt;The result is a whole slew of error messages:&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;C:\temp\odf\OpenDocument-schema-v1.0-os.rng:17658:18: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"&lt;br /&gt;C:\temp\odf\OpenDocument-schema-v1.0-os.rng:10294:22: error: conflicting ID-types for attribute "targetElement" from namespace "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0" of element "command" from namespace "urn:oasis:names:tc:opendocument:xmlns:animation:1.0"&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Oh no!  Emergency, emergency, everyone to get from street!&lt;br /&gt;&lt;br /&gt;I wonder if this is one of the things that tripped Alex up?  Take a deep breath.  
These, in fact, are not Relax NG (ISO/IEC 19757-2) errors at all, but errors generated by jing's default validation of a different set of constraints, defined in the &lt;a href="http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html"&gt;Relax NG DTD Compatibility&lt;/a&gt; specification, which has the status of a Committee Specification in OASIS.  It is not part of ISO/IEC 19757-2.&lt;br /&gt;&lt;br /&gt;Relax NG DTD Compatibility provides three extensions to Relax NG:  default attribute values, ID/IDREF constraints and a documentation element.  The Relax NG DTD Compatibility specification is quite clear in section 2 that "Conformance is defined separately for each feature.  A conformant implementation can support any combination of features."  And in fact, ODF 1.0, in section 1.2, does just that: "The schema language used within this specification is Relax-NG (see [RNG]). The attribute default value feature specified in [RNG-Compat] is used to provide attribute default values".    &lt;br /&gt;&lt;br /&gt;It is best to simply disable the checking of Relax NG DTD Compatibility constraints by using the documented "-i" flag in jing.  If you want to validate ID/IDREF cross-references, then you'll need to do that in application code, not by using jing in Relax NG DTD Compatibility mode.  Note that jing was not complaining about any actual ID/IDREF problem in the ODF document. &lt;br /&gt;&lt;br /&gt;So, false alarm.  You can walk safely on the streets now.&lt;br /&gt;&lt;br /&gt;(That said, if we can make some simple changes to the ODF schemas that will allow them to work better with the default settings of jing, or other popular tools, then I'm certainly in favor of that.  
Alex's proposed changes to the schema are reasonable and should be considered.)&lt;br /&gt;&lt;br /&gt;So, let's repeat the validation with the -i flag:&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng content.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Zero errors, zero warnings.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng styles.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Zero errors, zero warnings.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng meta.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Zero errors, zero warnings.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar -i OpenDocument-schema-v1.0-os.rng settings.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Zero errors, zero warnings.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;java -jar c:/jing/bin/jing.jar -i OpenDocument-manifest-schema-v1.0-os.rng META-INF/manifest.xml&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Zero errors, zero warnings.&lt;br /&gt;&lt;br /&gt;So there you have it: an example showing that there is at least one document in the universe that is valid to the ODF 1.0 schema, disproving Alex's statement that "there are no XML documents in existence which are valid to ISO ODF."&lt;br /&gt;&lt;br /&gt;The directions are complete and should allow anyone to validate the ODF 1.0 specification, or any other ODF 1.0 document.  Now that we have the basics down, let's work on some more advanced topics.&lt;br /&gt;&lt;br /&gt;First, the reader should note that there are two versions of the ODF schema, the original 1.0 from 2005, and the updated 1.1 from 2007.   (There is also a third version underway, ODF 1.2, but that needn't concern us here.)&lt;br /&gt;&lt;br /&gt;An application, when it creates an ODF document, indicates which version of the ODF standard it is targeting.  
You can find this indication if you look at the &lt;tt&gt;office:version&lt;/tt&gt; attribute on the root element of any ODF XML file.  The only values I would expect to see in use today would be "1.0" and "1.1". Eventually we'll also see "1.2".&lt;br /&gt;&lt;br /&gt;It is important to use the appropriate version of the ODF schema to validate a particular document.   Our goal, as we evolve ODF,  is that an application that knows only about ODF 1.0 should be able to adapt and "degrade gracefully" when given an ODF 1.1 document, by ignoring the features it does not understand.   But an application written to understand ODF 1.1 should be able to fully understand ODF 1.0 documents without any additional accommodation.&lt;br /&gt;&lt;br /&gt;Put differently, from the document perspective, a document that conforms to ODF 1.0 should also conform to ODF 1.1.  But the reverse direction is not true.&lt;br /&gt;&lt;br /&gt;To accomplish this, as we evolve ODF, within the 1.x family of revisions, we try to limit ourselves to changes that widen the schema constraints, by adding new optional elements, or new attribute values, or expanding the range of values permitted.   Constraint changes that are logically narrowing, like removing elements, making optional elements mandatory, or reducing the range of allowed values, would break this kind of document compatibility.&lt;br /&gt;&lt;br /&gt;Now of course, at some point we may want to make bolder changes to the schema, but this would be in a major release, like a 2.0 version.  But within the ODF 1.x family we want this kind of compatibility.&lt;br /&gt;&lt;br /&gt;The net of this is, an ODF 1.1 document should only be expected to be valid to the ODF 1.1 schema, but an ODF 1.0 document should be valid to the ODF 1.0 and the ODF 1.1 schemas.&lt;br /&gt;&lt;br /&gt;That's enough theory!  Let's take a look now at the test that Alex actually ran.  
It is a rather curious, strangely biased kind of test, but the bad thinking is interesting enough to examine in some detail.&lt;br /&gt;&lt;br /&gt;When he &lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=3e2202cd-59a3-4356-8f30-b8eb79735e1a"&gt;earlier tested OOXML&lt;/a&gt;, Alex used the OOXML standard itself, a text on which Microsoft engineers had lavished many person-years of attention for the past 18 months, and he validated it with the current version of the OOXML schema.  That is pretty much the best case, testing a document that has never been out of Microsoft's sight for 18 months and testing it with the current version of the schema.  I would expect that this document would have been a regular test case for Microsoft internally, and that its validity has been repeatedly and exhaustively tested over the past 18 months.  I know that I personally tested it when Ecma-376 was first released, since it was the only significant OOXML document around.  So, essentially, Alex gave OOXML the softest of all soft pitches.&lt;br /&gt;&lt;br /&gt;I think Microsoft's response, that the validity errors detected by Alex are due to changes made to the schema at the BRM, is a reasonable and accurate explanation.  The real story on OOXML standardization is not how many changes were made that were incompatible with Office 2007, but how few.  It appears that very few changes, perhaps only one, will be required to make Office 2007's output valid OOXML.&lt;br /&gt;&lt;br /&gt;So when testing ODF, what did Alex do?  Did he use the ODF 1.0 specification as a test case, a document that the OASIS TC might have had the opportunity to give a similar level of attention to?  No, he did not, although that would have validated perfectly, as I've demonstrated above.  
Instead, Alex uses the OOXML specification, a document which by his own testing is not valid OOXML, then converts it into the proprietary .DOC binary format, then translates that binary format into ODF, then tries to validate the results with the ODF 1.0 schema (i.e., the wrong version of the ODF schema, since OpenOffice 2.4.0's output is clearly declared as ODF 1.1), and then applies a non-applicable, non-standard DTD Compatibility constraint test during the Relax NG validation.&lt;br /&gt;&lt;br /&gt;Does anyone see something else wrong with this testing methodology?&lt;br /&gt;&lt;br /&gt;Aside from the obvious bias of using an input document that Microsoft has spent 18 months perfecting, and using the wrong schemas and validator settings, there is another, more subtle problem.&lt;br /&gt;&lt;br /&gt;Alex's tests of OOXML and ODF test entirely different things.  With OOXML, he took a version N (Ecma-376) OOXML document and tried to validate it with the version N+1 (ISO/IEC 29500) OOXML schema.&lt;br /&gt;&lt;br /&gt;But with ODF, he took a version N+1 (ODF 1.1) document and tried to validate it with the version N (ODF 1.0) ODF schema.&lt;br /&gt;&lt;br /&gt;These are entirely different operations.  One test checks the backwards compatibility of the schema; the other checks the backwards compatibility of document instances.  It takes no genius to figure out that if ODF 1.1 adds new elements, then an ODF 1.1 document instance will not validate with the ODF 1.0 schema.  We don't ordinarily expect backwardly compatible validity of document instances.  Again, Alex's tests are biased in OOXML's favor, giving ODF a much more difficult, even impossible, task compared to the test he ran for OOXML.&lt;br /&gt;&lt;br /&gt;If we want to compare apples to apples, it is quite easy to perform the equivalent test with ODF.  
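&lt;br /&gt;&lt;br /&gt;Under the compatibility rule described above, choosing the right schema for a document is mechanical. You read &lt;tt&gt;office:version&lt;/tt&gt; from the root element. The document should then validate against that version's schema and every later 1.x schema, but not against earlier ones.&lt;br /&gt;&lt;br /&gt;The following Python snippet is a hypothetical sketch of that rule, not an OASIS tool. The 1.0 schema file name matches the download above. The 1.1 file name is an assumed placeholder, so adjust it to whatever you saved locally.&lt;pre&gt;
```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the office: prefix in ODF 1.0 and 1.1.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Local schema file names (assumed).  The 1.1 name is a placeholder.
SCHEMAS = {
    "1.0": "OpenDocument-schema-v1.0-os.rng",
    "1.1": "OpenDocument-schema-v1.1.rng",
}
VERSIONS = ["1.0", "1.1"]  # oldest first


def odf_version(xml_bytes):
    """Return the office:version value on the root element, or None."""
    root = ET.fromstring(xml_bytes)
    return root.get("{%s}version" % OFFICE_NS)


def schemas_to_try(version):
    """Schemas a document of this version is expected to validate against.

    Within ODF 1.x the schemas only widen, so a version N document should
    be valid to schema N and to every later 1.x schema.
    """
    if version not in VERSIONS:
        raise ValueError("unknown or missing ODF version: %r" % (version,))
    start = VERSIONS.index(version)
    return [SCHEMAS[v] for v in VERSIONS[start:]]
```
&lt;/pre&gt;Here &lt;tt&gt;schemas_to_try("1.0")&lt;/tt&gt; returns both schema names, while &lt;tt&gt;schemas_to_try("1.1")&lt;/tt&gt; returns only the 1.1 schema. That is why validating an ODF 1.1 document against the ODF 1.0 schema is the wrong test.&lt;br /&gt;&lt;br /&gt;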
I gave it a try, taking a version N document (the ODF 1.0 standard itself, per above) and validating it with the version N+1 schema (ODF 1.1 in this case).  It worked perfectly.  No warnings, no errors.&lt;br /&gt;&lt;br /&gt;In any case, in his backwards test Alex reports 7,525 errors, "mostly of the same type (use of an undeclared &lt;tt&gt;soft-page-break&lt;/tt&gt; element)" when validating the OOXML text with the ODF 1.0 schema.  Indeed, all but 39 of these errors are reports of &lt;tt&gt;soft-page-break&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Soft page breaks are a new feature introduced in ODF 1.1.  This feature has two primary advantages for accessibility.  First, it allows easier collaboration between people using different technologies to read a document.  Not all documents are deeply structured, with formal divisions like section 3.2.1, etc.  Most business documents are loosely structured, and collaboration occurs by referring to "2nd paragraph on page 23" or "the bottom of page 18".  But when using different assistive technologies, from larger fonts to braille to audio renderings, the page breaks (if the assistive technology even has the concept of a page break) are usually located differently from the page breaks in the original authoring tool.  This makes collaboration difficult.  So, ODF 1.1 added the ability for applications to write out "soft" page breaks, indicating where the page breaks occurred when the original source document was saved.&lt;br /&gt;&lt;br /&gt;Although this feature was added for accessibility reasons, like &lt;a href="http://en.wikipedia.org/wiki/Curb_cuts"&gt;curb cuts&lt;/a&gt;, its likely future applications are more general.  We will all benefit.  For example, a convertor for translating from ODF to HTML would ordinarily only be able to calculate the original page breaks by undertaking complex layout calculations.  
But with soft page breaks recorded, even a simple XSLT script can use this information to insert indications of page breaks, or to generate accurate page numbering, etc. Although the addition of this feature hinders Alex's idiosyncratic attempt to validate ODF 1.1 documents with the ODF 1.0 schema, I think the fact that this feature helps blind and visually impaired users, and generally improves collaboration, makes it a fair trade-off.&lt;br /&gt;&lt;br /&gt;Wouldn't you agree?&lt;br /&gt;&lt;br /&gt;That leaves 39 validation errors in Alex's test.  Twelve of them are reports of invalid values in an &lt;tt&gt;xlink:href&lt;/tt&gt; attribute.  This appears to be an error in the original DOCX file.  Garbage In, Garbage Out.  For example, in one case the original document has a HYPERLINK field that contains a link to content in Microsoft's proprietary CHM format (Compiled HTML).  The link provided in the original document does not match the syntax rules required for an XML Schema &lt;tt&gt;anyURI&lt;/tt&gt; (the URL ends with "##" rather than "#").  Arguably it is correct for markup like this, with non-standard, non-interoperable URIs, to give validation errors.  This is not the first time that OOXML has been found &lt;a href="http://www.robweir.com/blog/2008/03/ooxmls-out-of-control-characters.html"&gt;polluting XML&lt;/a&gt; with proprietary extensions.  But realize that OpenOffice 2.4.0 did not create this error.  OpenOffice is just passing the error along, as Office 2007 saved it.  It is interesting to note that this error was not caught in MS Office, and indeed is undetectable with OOXML's lax schema.  But the error was caught with the ODF schema.  This is a good thing, yes?  It might be a good idea for OpenOffice to add an optional validation step after importing Microsoft Office documents, to filter out such data pollution.&lt;br /&gt;&lt;br /&gt;The remaining 27 validation errors are all instances of &lt;tt&gt;style:with-tab&lt;/tt&gt;.  
Honestly, I have no explanation for this.  This attribute does not exist in ODF 1.0 or ODF 1.1.  That it is written out appears to be a bug in OpenOffice.  Maybe someone there can tell us what the story is on this?  But I don't see this problem in all documents, or even in most documents.&lt;br /&gt;&lt;br /&gt;For fun, I tried processing this OOXML document another way.  Instead of the multi-hop OOXML-to-DOC-to-ODF conversion Alex did, why not go directly from OOXML to ODF in one step, using the convertor that Microsoft/CleverAge created?  This should be much cleaner, since it avoids the messiness of the binary formats and the legacy application code.  It is just a mapping from one markup to another markup, written from scratch.  Getting the output to be valid should be trivial.&lt;br /&gt;&lt;br /&gt;So I &lt;a href="http://odf-converter.sourceforge.net/download.html#command-line"&gt;downloaded&lt;/a&gt; the "OpenXML/ODF Translator Command Line Tools" from SourceForge.  According to their web page, this tool targets ODF 1.0, so we'll be validating against the ODF 1.0 schemas.&lt;br /&gt;&lt;br /&gt;This tool is very easy to use once you have the .NET prerequisites installed.  The command line was:&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;odfconvertor /I "Office Open XML Part 4 - Markup Language Reference.docx"&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;The convertor then chugs along for a long, long, long time.  I mean a long time.  The conversion from OOXML to ODF eventually finished, after 11 hours, 10 minutes and 41 seconds!  And this was on a Thinkpad T60p with a dual-core Intel 2.16 GHz processor and 2.0 GB of RAM.&lt;br /&gt;&lt;br /&gt;I then ran jing, using the validation command lines from above.  
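&lt;br /&gt;&lt;br /&gt;Typing the five jing command lines by hand for every converted file gets tedious. The Python sketch below is a hypothetical helper, not part of jing. It uses the standard &lt;tt&gt;zipfile&lt;/tt&gt; module to pull the five XML parts listed earlier out of the ODF package. It then builds the same jing command lines, including the &lt;tt&gt;-i&lt;/tt&gt; flag.&lt;br /&gt;&lt;br /&gt;The jar path and schema names are the ones used above, so change them to match your setup. For output that declares ODF 1.1, you would substitute the 1.1 schemas.&lt;pre&gt;
```python
import os
import zipfile

MAIN_SCHEMA = "OpenDocument-schema-v1.0-os.rng"
MANIFEST_SCHEMA = "OpenDocument-manifest-schema-v1.0-os.rng"

# The XML parts of an ODF package and the schema each is validated against.
PARTS = {
    "content.xml": MAIN_SCHEMA,
    "styles.xml": MAIN_SCHEMA,
    "meta.xml": MAIN_SCHEMA,
    "settings.xml": MAIN_SCHEMA,
    "META-INF/manifest.xml": MANIFEST_SCHEMA,
}


def extract_parts(odf_path, out_dir):
    """Unzip only the XML parts jing needs; return the part names found."""
    found = []
    with zipfile.ZipFile(odf_path) as z:
        for name in z.namelist():
            if name in PARTS:
                z.extract(name, out_dir)  # creates META-INF/ as needed
                found.append(name)
    return found


def jing_commands(found, out_dir, jing_jar="c:/jing/bin/jing.jar"):
    """One jing command per part; -i turns off the DTD Compatibility
    ID/IDREF checks, as discussed above."""
    return [
        ["java", "-jar", jing_jar, "-i", PARTS[name], os.path.join(out_dir, name)]
        for name in found
    ]
```
&lt;/pre&gt;Each returned list can be handed to &lt;tt&gt;subprocess.run()&lt;/tt&gt; to invoke jing. Keeping command construction separate from execution lets you check the helper without Java installed.&lt;br /&gt;&lt;br /&gt;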
Jing reported 376 validation errors, which fell into several categories:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;text:s element not allowed in this context&lt;/li&gt;&lt;li&gt;bad value for text:style-name&lt;/li&gt;&lt;li&gt;bad value for text:outline-level&lt;/li&gt;&lt;li&gt;bad value for svg:x&lt;/li&gt;&lt;li&gt;bad value for svg:y&lt;/li&gt;&lt;li&gt;element text:tracked-changes not allowed in this context&lt;/li&gt;&lt;li&gt;"text not allowed here"&lt;/li&gt;&lt;/ul&gt;So, not many distinct kinds of error, just a handful repeated many times.  But it is surprising to see that this single-purpose tool, written from scratch, produced more validation errors than OpenOffice 2.4.0 does.&lt;br /&gt;&lt;br /&gt;In the end, we should put this in perspective.   Can OpenOffice produce valid ODF documents?  Yes, it can, and I have given an example.  Can OpenOffice produce invalid documents?  Yes, of course.  For example, when it writes out a .DOC binary file, the result is not even well-formed XML.  And we've seen one example where, via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation.  But conformance for an application does not require that it is incapable of writing out an invalid document.  Conformance requires that it is capable of writing out a valid document.  And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.&lt;br /&gt;&lt;br /&gt;It is interesting to recall the study that &lt;a href="http://elsewhat.com/thesis/"&gt;Dagfinn Parnas&lt;/a&gt; did a few years ago.  He analyzed 2.5 million web pages.  He found that only 0.7% of them were valid markup.  Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems.  
Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards.  In fact I promise you that the page you are reading now is not valid, and neither is &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.griffinbrown.co.uk%2Fblog%2FPermaLink.aspx%3Fguid%3Df0384bed-808b-49a8-8887-ea7cde5caace&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;Alex Brown's&lt;/a&gt;, nor &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.itscj.ipsj.or.jp%2Fsc34%2F&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;SC34's&lt;/a&gt;, nor &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fisotc.iso.org%2Flivelink%2Flivelink%2Ffetch%2F2000%2F2122%2F327993%2Fcustomview.html%3Ffunc%3Dll%26objId%3D327993&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;JTC1's&lt;/a&gt;, nor &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ecma-international.org%2F&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;Ecma's&lt;/a&gt;, nor &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.iso.org%2Fiso%2Fhome.htm&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;ISO's&lt;/a&gt;, nor the &lt;a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.iec.ch%2F&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;group=0"&gt;IEC's&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement.   
In any case, Alex Brown's dire pronouncements on ODF validity have been weighed in the balance and found wanting.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;4 May 2008&lt;br /&gt;&lt;br /&gt;Alex has responded on his blog with "&lt;a href="http://www.griffinbrown.co.uk/blog/PermaLink.aspx?guid=ace3b1c6-7ce8-49c7-8485-1ff8c34b7038"&gt;ODF validation for cognoscenti&lt;/a&gt;".  He deals purely with the ID/IDREF/IDREFS questions in XML.  He does not justify his biased and faulty testing methodology, nor does he reiterate his bold claims that there are no valid ODF 1.0 documents in existence.&lt;br /&gt;&lt;br /&gt;Since Alex's blog does not seem to be allowing me to comment, I'll put here what I would have put there.  I'll be brief because I have other fish to fry today.&lt;br /&gt;&lt;br /&gt;Alex, no one doubts that ID/IDREF/IDREFS constraints must be respected by valid ODF document instances.  I never suggested otherwise.  But what I do state is that this is not a concern of a Relax NG validator.  You can read James Clark saying the same thing in his 2001 "&lt;a href="http://relaxng.org/xsd-20010907.html"&gt;Guidelines for using W3C XML Schema Datatypes with RELAX NG&lt;/a&gt;", which says in part:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The semantics defined by [W3C XML Schema Datatypes] for the ID, IDREF and IDREFS datatypes are purely lexical and do not include the cross-reference semantics of the corresponding [XML 1.0] datatypes. The cross-reference semantics of these datatypes in XML Schema comes from XML Schema Part 1. Furthermore, the [XML 1.0] cross-reference semantics of these datatypes do not fit into the RELAX NG model of what a datatype is. Therefore, RELAX NG validation will only validate the lexical aspects of these datatypes as defined in [W3C XML Schema Datatypes]. &lt;/blockquote&gt;&lt;br /&gt;Validation of ID/IDREF/IDREFS cross-reference semantics is not the job of Relax NG, and you are incorrect to suggest otherwise.  
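&lt;br /&gt;&lt;br /&gt;To make the division of labor concrete, here is a rough Python sketch of the kind of application-level check meant here. It runs after ordinary Relax NG validation has passed. It enforces the two XML 1.0 cross-reference rules: ID values must be unique, and every IDREF or IDREFS token must match some ID.&lt;br /&gt;&lt;br /&gt;Which ODF attributes are typed ID or IDREF has to be read from the ODF schema. The sketch takes them as parameters rather than guessing, so any attribute names used with it are hypothetical.&lt;pre&gt;
```python
import xml.etree.ElementTree as ET


def check_id_refs(root, id_attrs, idref_attrs):
    """Check XML 1.0 ID/IDREF cross-reference rules on a parsed tree.

    id_attrs and idref_attrs are sets of attribute names in ElementTree's
    "{namespace-uri}local-name" form, supplied by the caller.
    Returns (duplicate_ids, dangling_refs).
    """
    ids = set()
    duplicates = []
    refs = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if name in id_attrs:
                if value in ids:
                    duplicates.append(value)  # ID values must be unique
                ids.add(value)
            elif name in idref_attrs:
                # IDREFS is a whitespace-separated list; a single IDREF
                # simply yields one token.
                refs.extend(value.split())
    # Checked after the full pass, so forward references are fine.
    dangling = [r for r in refs if r not in ids]
    return duplicates, dangling


def check_file(path, id_attrs, idref_attrs):
    """Convenience wrapper for one unzipped XML part, e.g. content.xml."""
    return check_id_refs(ET.parse(path).getroot(), id_attrs, idref_attrs)
```
&lt;/pre&gt;Two empty lists mean the part's cross-references are consistent. A non-empty list points to a genuine ID/IDREF problem, which is exactly what the DTD Compatibility errors from jing were not reporting.&lt;br /&gt;&lt;br /&gt;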
Your logic is also deficient when you take my statement of that fact and derive the false statement that I believe that ID/IDREF semantics do not apply to ODF.  One does not follow from the other.&lt;br /&gt;&lt;br /&gt;You know, as much as anyone, that conformance is a complex topic.  One does not ordinarily expect, except in trivial XML formats, that the complete set of conformance constraints will be expressed in the schema.  Typically a multi-layered approach is used, with some syntax and structural constraints expressed in XML Schema or Relax NG, some business constraints in Schematron, and maybe even some deeper semantic constraints that are expressed only in the text of the standard and can only be tested by application logic.&lt;br /&gt;&lt;br /&gt;For example, a document that defines a cryptographic algorithm might need to store a prime number.  The schema might define this as an integer.  The fact that the schema does not state or guarantee that it is a prime number is not the fault of the schema.  And the inability of a Relax NG validator to test primality is not a defect in Relax NG.  The primality test would simply need to be carried out at another level, with application logic.  But the requirement for primality in document instances can still be a conformance requirement and it is still testable, albeit with some computational effort, in application logic.&lt;br /&gt;&lt;br /&gt;I believe that is the source of your confusion.  The initial errors you saw when running jing with the Relax NG DTD Compatibility flag enabled were not errors in the ODF document instances.  What you saw was jing reporting that it could not apply the Relax NG DTD Compatibility ID/IDREF/IDREFS constraint checks using the ODF 1.0 schema.  That in no way means that the constraints defined in XML 1.0 are not required on ODF document instances.  
It simply indicates that you would need to verify these constraints using means other than Relax NG DTD Compatibility.&lt;br /&gt;&lt;br /&gt;So I wonder: have you actually found ODF document instances, say written by OpenOffice 2.4.0, whose ID/IDREF/IDREFS usage violates the constraints expressed in ODF 1.0?&lt;br /&gt;&lt;br /&gt;Finally, in your professional judgment, do you maintain that this is an accurate statement:  "For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF."</description><link>http://www.robweir.com/blog/2008/05/odf-validation-for-dummies.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-6576696771147761498</guid><pubDate>Wed, 30 Apr 2008 15:14:00 +0000</pubDate><atom:updated>2008-04-30T13:59:50.565-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Standards</category><category domain='http://www.blogger.com/atom/ns#'>Microsoft</category><title>Embrace the Reality and Logic of Choice</title><description>Another neo-colonialist &lt;a href="http://pr.euractiv.com/index.php?q=node/2465"&gt;press release&lt;/a&gt; from Microsoft's CompTIA lobbying arm, this time inveighing against South Africa's adoption of ODF as a national standard.  
One way to point out the absurdity of their logic is to replace the reference to ODF with references to any other useful standard that a government might adopt, like electrical standards.&lt;br /&gt;&lt;br /&gt;When we do this, we end up with the following.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;South Africa Electrical Current Adoption Outdated&lt;/h2&gt;&lt;br /&gt;&lt;br /&gt;South Africa’s recent adoption of the 230V/50Hz residential electrical standard represents a tact that will blunt innovation, much needed for their developing economy. The policy choice – which actually reduces electrical current choice – runs contrary to worldwide policy trends, where multiple electrical standards rule, thus threatening to separate South Africa from the wealth creating abilities of the global electrical industry.&lt;br /&gt;&lt;br /&gt;For MonPrevAss, the Monopoly Preservation Association, the overall concern for the global electrical industry is to ensure that lawmakers adopt flexible policies and set policy targets rather than deciding on fixed rules, technologies and different national standards to achieve these targets. Such rigid approaches pull the global electrical market apart rather than getting markets to work together and boost innovation for consumers and taxpayers. “The adoption sends a negative signal to a highly innovative sector” says I.M. Atool, MonPrevAss's Group Director, Public Policy EMEA.&lt;br /&gt;&lt;br /&gt;The “South African Bureau of Standards” (SABS) approved the 230V/50Hz residential electrical standard on Friday 18 April as an official national standard. This adoption, if implemented, will reduce choice, decrease the benefits of open competition and thwart innovation. 
The irony here is that South Africa is moving in a direction which stands in stark relief to the reality of the highly dynamic market, with some 40 different electrical current conventions available today.&lt;br /&gt;&lt;br /&gt;“Multiple co-existing electrical standards as opposed to only one standard should be favoured in the interest of users. The markets are the most efficient in creating electrical standards and it should stay within the exclusive hands of the market”, I.M. Atool explains.&lt;br /&gt;&lt;br /&gt;In light of the recent ISO/IEC adoption of the Microsoft 240V/55Hz electrical standard, the South African decision will not lead to improvements in the electrical sector. MonPrevAss urges Governments to allow consumers and users to decide which electrical standards are best. We fear that the choice of just one electrical standard runs the risk of being outdated before it is even implemented, as well as being prohibitively costly to public budgets and taxpayers.&lt;br /&gt;&lt;br /&gt;Governments should not restrict themselves to working with one electrical standard, and should urge legislators to refrain from any kind of mandatory regulation and discriminatory interventions in the market. The global electrical industry recommends Governments to embrace the reality and logic of choice and to devote their energies to ensuring interoperability through this choice.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;Of course, this is just a rehash of a logical fallacy related to the old "&lt;a href="http://www.robweir.com/blog/2007/01/broken-windows-and-ghost-of-keynes.html"&gt;Broken Windows&lt;/a&gt;" fallacy.  It is like saying heart disease is a good thing because you have such a wide choice of therapies to treat it.  
We would all agree that it is far preferable to be healthy and have a wide choice of activities that you want to do, rather than a wide choice of solutions to a problem that you never asked for and don't want.&lt;br /&gt;&lt;br /&gt;Consumers don't want a bag of adapters to convert between different formats and protocols.  That is giving consumers a choice of solutions to an interoperability problem they didn't ask for and don't want.  Consumers want a &lt;a href="http://www.robweir.com/blog/2007/02/fiat-lux.html"&gt;choice of goods and services&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Observe the recent standards war between Blu-ray and HD DVD.  Ask yourself:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Did consumers want a choice in formats, or did they want a wider choice in players and high definition movies?&lt;/li&gt;&lt;li&gt;Did movie studios want a choice in formats and either the uncertainty over choosing the winner, or the expense of supporting both formats?  Or did they really just want a single format that would allow them to reach all consumers?&lt;/li&gt;&lt;li&gt;Did the uncertainty around the existence of two competing high definition formats help or hurt the adoption of high definition technologies in general?&lt;/li&gt;&lt;li&gt;Did consumers who made the early choice to go with HD DVD, say Microsoft Xbox owners, benefit from having this choice?&lt;/li&gt;&lt;/ol&gt;If every private individual and every private business has the right to adopt technology standards according to their needs, why should governments be denied that same right?  Why should they be forced to take the only certain losing side of every standards war -- implementing all standards indiscriminately -- a choice that no rational business owner would make?&lt;br /&gt;&lt;br /&gt;How many spreadsheet formats does Microsoft use internally to run its own business?  
Why should governments be denied choice in the same field in which Microsoft itself exercises its right to choose?</description><link>http://www.robweir.com/blog/2008/04/embrace-reality-and-logic-of-choice.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-608868443427550312</guid><pubDate>Fri, 18 Apr 2008 04:15:00 +0000</pubDate><atom:updated>2008-04-22T18:23:43.545-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Standards</category><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><category domain='http://www.blogger.com/atom/ns#'>ISO</category><title>Sinclair's Syndrome</title><description>A curious &lt;a href="http://www.iso.org/iso/pressrelease/faqs_isoiec29500.htm"&gt;FAQ&lt;/a&gt; put up by an unnamed ISO staffer on MS-OOXML.  Question #1 expresses concerns about Fast Tracking a 6,000 page specification, a concern which a large number of NB's also expressed during the DIS process.  Rather than deal honestly with this question, the ISO FAQ says:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The number of pages of a document is not a criterion cited in the &lt;em&gt;JTC 1 Directives&lt;/em&gt; for refusal. It should be noted that it is not unusual for IT standards to run to several hundred, or even several thousand pages.&lt;/blockquote&gt;&lt;br /&gt;Now certainly there are standards that are several thousand pages long.  For example, Microsoft likes to bring up the example of ISO 14496, MPEG 4, at over 4,000 pages in length.  But that wasn't a Fast Track.  And as &lt;a href="http://lehors.wordpress.com/2008/02/22/can-anyone-be-more-disingenuous/"&gt;Arnaud Lehors&lt;/a&gt; reminded us earlier, MPEG 4 was standardized in 17 parts over 6 years.&lt;br /&gt;&lt;br /&gt;So any answer in the FAQ which attempts to consider what is usual and what is unusual must take account of past practice with JTC1 Fast Track submissions.  
That, after all, was the question the FAQ purports to address.&lt;br /&gt;&lt;br /&gt;Ecma claims (PowerPoint presentation &lt;a href="http://www.ecma-international.org/activities/General/presentingecma.ppt"&gt;here&lt;/a&gt;) that there have been around 300 Fast Tracked standards since 1987 and that Ecma has handled around 80% of them.  So Ecma's Fast Tracks are a reasonable sample.  Luckily Ecma has &lt;a href="http://www.ecma-international.org/publications/standards/Standard.htm"&gt;posted all of their standards&lt;/a&gt;, from 1991 at least, in a nice table that allows us to examine this question more closely.  Since we're only concerned with JTC1 Fast Tracks, not ISO Fast Tracks or standards that received no approval beyond Ecma, we should look only at those which have ISO/IEC designations.  "ISO/IEC" indicates that the standard was approved by JTC1.&lt;br /&gt;&lt;br /&gt;So where did things stand on the eve of Microsoft's submission of OOXML to Ecma?&lt;br /&gt;&lt;br /&gt;At that point there had been 187 JTC1 Fast Tracks from Ecma since 1991, with basic descriptive statistics as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;mean = 103 pages&lt;/li&gt;&lt;li&gt;median = 82 pages&lt;/li&gt;&lt;li&gt;min = 12 pages&lt;br /&gt;&lt;/li&gt;&lt;li&gt;max = 767 pages&lt;/li&gt;&lt;li&gt;standard deviation = 102 pages&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;A histogram of the page lengths looks like this:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.robweir.com/blog/images/ecma-hist.png" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So the ISO statement that "it is not unusual for IT standards to run to several hundred, or even several thousand pages" does not ring true in the case of JTC1 Fast Tracks.  A good question to ask anyone who says otherwise is, "In the time since JTC1 was founded, how many JTC1 Fast Tracks greater than 1,000 pages in length have been submitted?"  
Let me know if you get a straight answer.&lt;br /&gt;&lt;br /&gt;Let's look at one more chart.  This shows the length of Ecma Fast Tracks over time, from the 28-page Ecma-6 in 1991 to the 6,045-page Ecma-376 in 2006.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.robweir.com/blog/images/Ecma-2.png" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Let's consider the question of usual and unusual again, the question that ISO is trying to inform the public on.  Do you see anything unusual in the above chart?  Take a few minutes.  It is a little tricky to spot at first, but with some study you will see that one of the standards plotted in the above chart is atypical.  Keep looking for it.  Focus on the center of the chart, let your eyes relax, clear your mind of extraneous thoughts.&lt;br /&gt;&lt;br /&gt;If you don't see it after 10 minutes or so, don't feel bad.  Some people and even whole companies are not capable of seeing this anomaly.  As best as I can tell it is a novel cognitive disorder caused by taking money from Microsoft.  I call it "Sinclair's Syndrome" after Upton Sinclair, who gave an early description of the condition, writing in 1935: "It is difficult to get a man to understand something when his salary depends upon his not understanding it."&lt;br /&gt;&lt;br /&gt;To put it in more approachable terms, observe that Ecma-376, OOXML, at 6,045 pages in length, was 58 standard deviations above the mean for Ecma Fast Tracks.  Consider also that the average adult American male is 5′ 9″ (175 cm) tall, with a standard deviation of 3″ (8 cm).  
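&lt;br /&gt;&lt;br /&gt;The arithmetic behind this comparison can be checked with a minimal Python sketch.  It uses only figures quoted in this post: a mean of 103 pages and a standard deviation of 102 pages for the 187 Ecma Fast Tracks, 6,045 pages for Ecma-376, and 5′ 9″ (69 in) with a 3 in standard deviation.  To match the text, the sketch rounds the z-score to 58 before scaling the height; that rounding step is an assumption, and the function names are illustrative only.

```python
# Recomputes the post's comparison using only the figures it quotes.
MEAN_PAGES = 103      # mean length of the 187 Ecma JTC1 Fast Tracks since 1991
SD_PAGES = 102        # standard deviation of those lengths
OOXML_PAGES = 6045    # length of Ecma-376

MEAN_HEIGHT_IN = 69   # 5 ft 9 in, average adult American male
SD_HEIGHT_IN = 3      # standard deviation of that height, in inches


def z_score(x, mean, sd):
    """How many standard deviations x lies above the mean."""
    return (x - mean) / sd


def feet_inches(inches):
    """Split a height in inches into (whole feet, remaining inches)."""
    return divmod(round(inches), 12)


z = z_score(OOXML_PAGES, MEAN_PAGES, SD_PAGES)
# Round to whole standard deviations, as the text does ("58 standard deviations").
equivalent_height = MEAN_HEIGHT_IN + round(z) * SD_HEIGHT_IN

print(f"z = {z:.1f}")                                          # z = 58.3
print("height: %d ft %d in" % feet_inches(equivalent_height))  # height: 20 ft 3 in
print(f"height: {equivalent_height * 2.54 / 100:.1f} m")       # height: 6.2 m
```

The sketch works in inches and converts to metres only at the end, which is how 243 in becomes roughly 6.2 m.&lt;br /&gt;&lt;br /&gt;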
For a man to be as tall, relative to the average height, as OOXML is to the average Fast Track, he would need to be 20′ 3″ (6.2 m) tall!&lt;br /&gt;&lt;br /&gt;For ISO, in a public relations pitch, to blithely suggest that several-thousand-page Fast Tracks are "not unusual" shows an audacious disregard for the truth and a lack of respect for a public that is looking for ISO to correct its errors, not blow smoke at them in a revisionist attempt to portray the DIS 29500 approval process as normal, acceptable or even legitimate.  We should expect better from ISO and we should express disappointment in them when they let us down in our reasonable expectations of honesty.  We don't expect this from Ecma.  We don't expect this from Microsoft.  But we should expect this from ISO.</description><link>http://www.robweir.com/blog/2008/04/sinclairs-syndrome.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-6938555814477330990</guid><pubDate>Wed, 16 Apr 2008 18:00:00 +0000</pubDate><atom:updated>2008-04-17T11:09:23.124-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ODF</category><title>Suggesting ODF Enhancements</title><description>There is &lt;a href="http://blogs.sun.com/GullFOSS/entry/odf_enhancements_for_openoffice_org1"&gt;a good post&lt;/a&gt; by Mathias Bauer on Sun Hamburg's &lt;a href="http://blogs.sun.com/GullFOSS/"&gt;GullFOSS blog&lt;/a&gt;.  He deals with the practical importance of OASIS's "&lt;a href="http://www.oasis-open.org/who/intellectualproperty.php#appendixa"&gt;Feedback License&lt;/a&gt;" that governs any public feedback OASIS receives from non-TC members.&lt;br /&gt;&lt;br /&gt;The ODF TC receives ideas for new features from many places.  
Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.&lt;br /&gt;&lt;br /&gt;Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or from individual members of the public.&lt;br /&gt;&lt;br /&gt;Contributions from OASIS TC members are already covered by the &lt;a href="http://www.oasis-open.org/who/intellectualproperty.php"&gt;OASIS IPR Policy&lt;/a&gt;.  The TC member who contributes written proposals to the TC is obliged from the time of contribution.  And other TC members are obliged if they have been TC members for at least 60 days and remain members 7 days after approval of any Committee Draft.  You can see the participation status of TC members &lt;a href="http://www.oasis-open.org/committees/daycount/tc/office.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our &lt;a href="http://lists.oasis-open.org/archives/office-comment/"&gt;comment mailing list&lt;/a&gt;.  But before you can post to the comment list you must first accept the terms of the &lt;a href="http://www.oasis-open.org/committees/comments/index.php?wg_abbrev=office"&gt;Feedback License&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Is this extra step annoying?  Yes, it is.  But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF.  It is part of the price we pay for open standards.  Free does not mean free from vigilance.&lt;br /&gt;&lt;br /&gt;One of my responsibilities on the ODF TC is to monitor and process the public comments we receive.  Regrettably, this is a duty which I've neglected for too long.  
So I spent some time this week getting caught up on the comments, entering them all into a &lt;a href="http://www.oasis-open.org/committees/document.php?document_id=27963&amp;amp;wg_abbrev=office"&gt;tracking spreadsheet&lt;/a&gt;.  We have received a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.&lt;br /&gt;&lt;br /&gt;The largest single source of comments is the Japanese JTC1/SC34 mirror committee, which has been translating the ODF 1.0 standard into Japanese.  As you know, you will get no closer reading of a text than when attempting translation, so we're glad to receive this scrutiny.  I look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.&lt;br /&gt;&lt;br /&gt;For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing.  
For comments that are in the nature of a new feature proposal, we will discuss them on a TC call and decide whether or not to include them in ODF 1.2.&lt;br /&gt;&lt;br /&gt;Some of the feature proposals from the comment list are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A request to support embedded fonts in ODF documents&lt;/li&gt;&lt;li&gt;A request to support multiple versions of the same document in the same file&lt;/li&gt;&lt;li&gt;A request to allow vertical text justification&lt;/li&gt;&lt;li&gt;A proposal for enhanced string processing spreadsheet functions&lt;/li&gt;&lt;li&gt;A proposal for spreadsheet values to allow units, which would help prevent calculation errors due to mixing units, e.g., adding mm to kg would be flagged as an error.&lt;/li&gt;&lt;li&gt;A proposal for allowing spreadsheet named ranges to have namespaces, with each sheet in a workbook having its own namespace.&lt;/li&gt;&lt;li&gt;A proposal to allow a document to have a "portable" flag to allow it to self-identify that it contains only portable ODF content with no proprietary extensions.&lt;/li&gt;&lt;li&gt;A proposal to add FFT support to spreadsheets&lt;/li&gt;&lt;li&gt;A proposal to add an overline text attribute&lt;/li&gt;&lt;/ul&gt;If you have any other ideas for ODF enhancements, or thoughts on the above proposals, please don't post a response to this blog!  
Remember, you need to use the &lt;a href="http://www.oasis-open.org/committees/comments/index.php?wg_abbrev=office"&gt;comment list&lt;/a&gt; for your feedback to be considered by the OASIS ODF TC.&lt;br /&gt;&lt;br /&gt;Of course, general comments are always welcome on this blog.</description><link>http://www.robweir.com/blog/2008/04/suggesting-odf-enhancements.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-7968290702831863841</guid><pubDate>Wed, 02 Apr 2008 13:00:00 +0000</pubDate><atom:updated>2008-04-02T09:03:17.926-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Standards</category><title>New Paths in Standardization</title><description>The world should be pleased to note, that with the approval of ISO/IEC 29500, Microsoft's Vector Markup Language (VML), after failing to be approved by the W3C in 1998 and after being neglected for the better part of a decade,  is now also ISO-approved.   Thus VML becomes the first and only standard that Microsoft Internet Explorer fully supports.&lt;br /&gt;&lt;br /&gt;Congratulations are due to the Internet Explorer team for reaching this milestone!&lt;br /&gt;&lt;br /&gt;Now that it has been demonstrated that pushing proprietary interfaces, protocols and formats through ISO is cheaper and faster than writing code to implement existing open standards, one assumes that the future is bright for more such boutique standards from Redmond.  
Open HTML, anyone?</description><link>http://www.robweir.com/blog/2008/04/new-paths-in-standardization.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-407104104210282061</guid><pubDate>Wed, 26 Mar 2008 03:45:00 +0000</pubDate><atom:updated>2008-03-28T09:49:03.944-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Standards</category><title>Seeking Open Standards Activists</title><description>Some thoughts for &lt;a href="http://documentfreedom.org/"&gt;Document Freedom Day 2008&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A few weeks ago in Geneva, &lt;a href="http://www.openforumeurope.org/"&gt;OpenForum Europe&lt;/a&gt; hosted an evening of mini-talks and a discussion panel with various well-known personalities in our field: Vint Cerf, Bob Sutor, Andy Updegrove and Håkon Lie.  I wasn't able to comment on the event at the time, due to my self-imposed blog silence that week, but I'd like to take the opportunity today to carry forward one of the topics discussed then.&lt;br /&gt;&lt;br /&gt;I'd like to take as my launching point the theme of Andy Updegrove's talk, which was "Civil ICT Standards".  
Andy treats this subject &lt;a href="http://www.consortiuminfo.org/standardsblog/article.php?story=20080224143425160"&gt;more fully&lt;/a&gt; on his blog, and also speaks to the topic in his taped interview with Groklaw's  &lt;a href="http://www.groklaw.net/articlebasic.php?story=20080229171250199"&gt;Sean Daly&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Thus spake Updegrove:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;But as the world becomes more interconnected, more virtual, and more dependent on ICT, public policy relating to ICT will become as important, if not more, than existing policies that relate to freedom of travel (often now being replaced by virtual experiences), freedom of speech (increasingly expressed on line), freedom of access (affordable broadband or otherwise), and freedom to create (open versus closed systems, the ability to create mashups under Creative Commons licenses, and so on.&lt;br /&gt;&lt;br /&gt;This is where standards enter the picture, because standards are where policy and technology touch at the most intimate level.&lt;br /&gt;&lt;br /&gt;Much as a constitution establishes and balances the basic rights of an individual in civil society, standards codify the points where proprietary technologies touch each other, and where the passage of information is negotiated.&lt;br /&gt;&lt;br /&gt;In this way, standards can protect – or not – the rights of the individual to fully participate in the highly technical environment into which the world is now evolving. 
Among other rights, standards can guarantee:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;That any citizen can use any product or service, proprietary or open, that she desires when interacting with her government.&lt;/li&gt;&lt;li&gt;That any citizen can use any product or service when interacting with any other citizen, and to exercise every civil right.&lt;/li&gt;&lt;li&gt;That any entrepreneur can have equal access to marketplace opportunities at the technical level, independent of the market power of existing incumbents.&lt;/li&gt;&lt;li&gt;That any person, advantaged or disadvantaged, and anywhere in the world, can have equal access to the Internet and the Web in the most available and inexpensive method possible.&lt;/li&gt;&lt;li&gt;That any owner of data can have the freedom to create, store, and move that data anywhere, any time, throughout her lifetime, without risk of capture, abandonment or loss due to dependence upon a single vendor.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Let us call these “Civil ICT Rights,” and pause a moment to ask: what will life be like in the future if Civil ICT Rights are not recognized and protected, as paper and other fixed media disappear, as information becomes available exclusively on line, and as history itself becomes hostage to technology?&lt;/blockquote&gt;&lt;br /&gt;This rings true to me.  Technology, computer technology in particular, now permeates our lives.  We interact with it daily, from the moment the internet-radio alarm clock goes off until day's end, when we check our email "one last time" before going to bed.&lt;br /&gt;&lt;br /&gt;Similarly, the standards that define the interfaces between these devices are also of increasing importance.  
There was once a time when standards dealt only with the "infrastructure": the stuff in the walls and under the panel floor, or behind that funny little locked door off the hallway, with all the cables and flashing lights, where strange men with clipboards would occasionally emerge, accompanied by a poof of cold air and the buzzing of machines.&lt;br /&gt;&lt;br /&gt;But today, the technology and the standards that mediate the technology are directly in front of your face.  Think MP3 players.  Think DVD's.  Think DRM.  Think cellular phones.  Think web pages.  Think encryption.  Think privacy.  Think documents.  Think documents-privacy-security-DRM, your data and what you are allowed to do with it, and what others are allowed to do with it, and whether you control any bit of this in this mad world of ours.&lt;br /&gt;&lt;br /&gt;Between you and the tasks that you want to do today stand technology and the standards that mediate that technology.  Standards are damn important.&lt;br /&gt;&lt;br /&gt;Now, although the reach of technology and ICT standards has progressed over the years, the organizations and the processes that create these standards have not always kept up.  In many cases standardization remains the creature of big industry with little or no consumer input.  It is a world of back-room discussions, where companies connive to see how many patents from their own portfolios they can encumber the standard with.  A successful standard is one where no major company is left hungry.  Consensus means everyone at the table has been fed.  That is the traditional world of technology standards.  It brings to mind the famous line from Adam Smith:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices — &lt;cite&gt;The Wealth of Nations&lt;/cite&gt; (I.x.c.27)&lt;/blockquote&gt;&lt;br /&gt;Luckily, there is some hope. 
 The proponents of "open standards" seek standards based on principles of open participation, consensus decision making, non-profit stewardship, royalty-free IP, and free access to standards.  The web itself, with its underlying network protocol stack and the HTML family of formats with DOM and scripting API's, is a shining example of what open standards can accomplish.  Tim Berners-Lee says it best, in his &lt;a href="http://www.w3.org/People/Berners-Lee/FAQ.html"&gt;FAQ's&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;Q: Do you have mixed emotions about "cashing in" on the Web?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A: Not really.  It was simply that had the technology been proprietary, and in my total control, it would probably not have taken off.  The decision to make the Web an open system was necessary for it to be universal.  You can't propose that something be a universal space and at the same time keep control of it.&lt;/blockquote&gt;&lt;br /&gt;But it is important to realize that "control" mechanisms in standards go well beyond IP and organization issues.  There are other important factors at play, and we need to address these as well.  Knut Blind discusses some of these issues in a section called "Anti-Competitive Effects of Standards" of his &lt;cite&gt;The Economics of Standards&lt;/cite&gt; (2004).&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The negative impact of standards for competition are mostly caused by a biased endowment with resources available for the standardization process itself. 
Therefore, even when the consensus rule is applied, dominant large companies are able to manipulate the outcomes of the process, the specification of the standard, into a direction which leads to skewed distribution of benefits or costs in favor of their interests.&lt;/blockquote&gt;&lt;br /&gt;In other words, participation in standardization activities is time-consuming and expensive, and large companies are much more able to make this kind of commitment than small companies, organizations or individuals.  So, large companies rule the world.&lt;br /&gt;&lt;br /&gt;This is especially true with standardization at the international level, where decisions are often made at meetings in very expensive international locations.  JTC1 is still discussing what technologies would be required to allow participation in meetings without travel.  (Hint: it's called a "telephone".)  To put this in perspective, my week in Geneva cost $3,687.52.  I flew coach, ate most of my meals on the cheap, often just grabbing hors d'oeuvres at receptions, and I received negotiated IBM corporate rates for air and hotel.  This is one JTC1 meeting.  What if I wanted to be really active?  Add in two SC34 Plenary meetings (Norway/Kyoto).  Add in JTC1 Plenary meetings.  Add in US NB meetings.  Add in US NB membership fees, consortium fees, conferences, etc.  This starts adding up: around $40,000/year to participate actively in tech standards, and this doesn't include the cost of my time.&lt;br /&gt;&lt;br /&gt;How many small companies are going to pay this amount?  How many non-profit organizations?  How many individuals?  Not many.&lt;br /&gt;&lt;br /&gt;But in spite of the expense, in spite of the large-company bias of the international standardization system, I saw reason for hope at the Geneva BRM.  I saw younger participants, with fire in their bellies.  I saw FOSS supporters from developing countries.  I saw Linux on laptops.  
I saw participants from FOSSFA, &lt;a href="http://www.siug.ch/"&gt;SIUG&lt;/a&gt;, &lt;a href="http://www.effi.org/"&gt;EFFI&lt;/a&gt;, ODF Alliance Brazil, &lt;a href="http://directories.coss.fi/en/"&gt;COSS&lt;/a&gt;, etc.  They joined their NB's, participated in their NB debates and were appointed to represent their countries in the BRM.&lt;br /&gt;&lt;br /&gt;Sure, it is only a foot in the door.  One in five BRM participants were Microsoft employees.  But it was a hopeful sign.  We've planted some seeds.  We must plant more.  And we must see that they grow.&lt;br /&gt;&lt;br /&gt;Strength in standards participation comes with time, with participation, with networking, with learning the rules (written and unwritten), with learning from others, etc.  Just as we have FOSS experts in software engineering, in law, in business, in training/education, we also need experts in standardization.  Certainly the bread-and-butter participation will be from individual engineers, participating for the duration of a particular proposal or group of proposals.  But we also need the institutional linchpin participants, those who have taken on leadership positions within standards organizations, and whose influence is broad and deep.&lt;br /&gt;&lt;br /&gt;FOSS also needs a standards agenda.  In a world of patent-encumbered standards controlling the central networks, open source software dies, and dies quickly.  We must protect and grow the open standards, for without them we cease to exist.&lt;br /&gt;&lt;br /&gt;What standards are important?  Which demand FOSS representation?  Remember just a few weeks ago, when there was a lot of concern about how the DIS 29500 BRM added explicit mention of the patent-encumbered MP3 standard, but failed to mention Ogg Vorbis at all?  Although I sympathize with this concern, the fact is the BRM could not have added Ogg Vorbis at all, because it is not a standard.  Are we willing to do more than lament about this?  
I tell you that if Ogg Vorbis had been an ISO standard, it would have been explicitly added to OOXML at the BRM.  Are we willing to do something about it?&lt;br /&gt;&lt;br /&gt;What are the standards critical to FOSS, and what are we doing about it?  What standards, existing or potential, should we be focusing on?  I suggest the following for a start:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Ogg Vorbis&lt;/li&gt;&lt;li&gt;Ogg Theora&lt;/li&gt;&lt;li&gt;PNG, ISO/IEC 15948&lt;br /&gt;&lt;/li&gt;&lt;li&gt;ODF, ISO/IEC 26300&lt;br /&gt;&lt;/li&gt;&lt;li&gt;PDF, ISO 32000&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Linux Standard Base (LSB), ISO/IEC 23360&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Most of the W3C Recommendations&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Most of the IETF RFC's&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;I'm sure you can suggest many others.&lt;br /&gt;&lt;br /&gt;Let's put it all together.  Some ICT standards directly impact what we can do with our data and our digital lives.  These are the Civil ICT Standards.  We need to ensure that these standards remain open standards, so anyone can implement them freely.  However, the standardization system, at both the national and international levels, is biased in favor of those large corporations best able to afford dedicated staff to work within those organizations and develop personal effectiveness and influence in the process.  Showing up once a year is not going to work.  If FOSS is going to maintain any level of influence in the formal standardization world, especially at the high-stakes international level, it needs to find a way to identify, nurture and support the participation of "Open Standards Activists".  The GNOME Foundation's joining of Ecma, or KDE's membership in OASIS, are examples of how this could work.  Umbrella organizations like &lt;a href="http://www.digistan.org/"&gt;Digistan&lt;/a&gt; are also critical and can be a nucleus for standards activists.  But what about taking this to the next level, to NB membership?  
Another example is the Linux Foundation's &lt;a href="http://www.linux-foundation.org/en/Travelfund"&gt;Travel Fund&lt;/a&gt;, designed to sponsor attendance of FOSS developers at technical conferences.  Imagine what could be done with a similar fund for attendance at standards meetings.&lt;br /&gt;&lt;br /&gt;So that is my challenge to you on this first Document Freedom Day.  We're near the end of what promises to be one of many battles.  The virtual networks of the future are just as lucrative as the railroad and telephone networks of the last century were.  These include the network of compatible audio formats, the network of IM users using a compatible protocol, and the network of users using a single open document format.  If FOSS projects and organizations want to secure for their users the value that comes from being part of these networks, then FOSS projects must encourage the use of open standards, and must also encourage and nurture new talent for the next generation of open standards activists.&lt;br /&gt;&lt;br /&gt;I'm looking forward to the day, soon, when I can search Google for "open standards activist" and not find a paid Microsoft shill among the listings on the first page.</description><link>http://www.robweir.com/blog/2008/03/seeking-open-standards-activists.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-4949663472099773441</guid><pubDate>Mon, 24 Mar 2008 20:25:00 +0000</pubDate><atom:updated>2008-04-02T21:26:28.351-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><title>OOXML's (Out of) Control Characters</title><description>Let's start with the concepts of “lexical” and “value” spaces in XML, as well as the mechanism of “derivation by restriction” in XML Schema.  
Any engineer can understand the basics here, even if you don't eat and drink XML for breakfast.&lt;br /&gt;&lt;br /&gt;The value space for an XML data item comprises the set of all allowed values.  So the value space for the “float” data type would be all floating point numbers, such as &lt;b&gt;12.34&lt;/b&gt; or &lt;b&gt;43.21&lt;/b&gt;.  The lexical space comprises all ways of expressing these values in the character stream of an XML document.  So lexical representations of the value &lt;b&gt;12.34&lt;/b&gt; include “12.34”, “12.340” and “1.234E1”.  For ease of illustration I will indicate value space items in bold, and lexical space items in quotes.  In general there are multiple lexical representations that may represent the same value.&lt;br /&gt;&lt;br /&gt;Character data in XML also permits more than one lexical representation of the same value.  For example, “A” and “&amp;amp;#65;” both represent the value &lt;span style="font-weight: bold;"&gt;A&lt;/span&gt;.  The “numerical character reference” approach allows an XML author to easily encode the occasional Unicode character which is not part of the author's native editing environment, e.g., adding the copyright character or an occasional foreign character.  The value space allowed by XML includes most of Unicode, including all of the major writing systems of the world, current and historical.&lt;br /&gt;&lt;br /&gt;My concern with DIS 29500 is Ecma's introduction of an ST_Xstring (Escaped String) datatype.  This new type is defined via the following XML Schema definition:&lt;br /&gt;&lt;br /&gt;&amp;lt;simpleType name="ST_Xstring"&amp;gt;&lt;br /&gt;&amp;lt;restriction base="xsd:string"/&amp;gt;&lt;br /&gt;&amp;lt;/simpleType&amp;gt;&lt;br /&gt;&lt;br /&gt;This uses the “derivation by restriction” facility of XML Schema to define a new type, derived from the standard xsd:string schema type.  
The xsd:string type is defined to allow only character values that are also allowed in the XML standard.&lt;br /&gt;&lt;br /&gt;The use of derivation by restriction implies a clear relationship between the ST_Xstring type and the base type xsd:string.  This is stated in XML Schema Part 1, &lt;a href="http://www.w3.org/TR/xmlschema-1/#Type_Definition_Summary"&gt;clause 2.2.1.1&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;A type definition whose declarations or facets are in a one-to-one relation with those of another specified type definition, with each in turn restricting the possibilities of the one it corresponds to, is said to be a &lt;b&gt;restriction&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;The specific restrictions might include narrowed ranges or reduced alternatives. Members of a type, A, whose definition is a restriction of the definition of another type, B, are always members of type B as well.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;The last sentence can be taken as a restatement of the &lt;a href="http://en.wikipedia.org/wiki/Liskov_substitution_principle"&gt;Liskov Substitution Principle&lt;/a&gt;, a fundamental principle of interface design which holds that a subtype should be usable (substitutable) wherever a base type is usable.  It is this principle that ensures interoperability.  
A type derived by restriction limits, restricts, constrains or reduces the permitted value space of its base type, but it cannot increase the value space beyond that permitted by its base type.&lt;br /&gt;&lt;br /&gt;So, with that background, let's now look at how OOXML defines the semantics of its ST_Xstring type:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;ST_Xstring (Escaped String)&lt;br /&gt;&lt;br /&gt;String of characters with support for escaped invalid-XML characters.&lt;br /&gt;&lt;br /&gt;For all characters which cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is invalid in an XML 1.0 document, so it shall be escaped as _x0008_. end example]&lt;br /&gt;&lt;br /&gt;This simple type's contents are a restriction of the XML Schema string datatype.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;In other words, although ST_Xstring is declared to be a restriction of xsd:string, it is, via a proprietary escape notation, in fact expanding the semantics of xsd:string to create a value space that includes additional characters, including characters that are invalid in XML.&lt;br /&gt;&lt;br /&gt;Let's review some of the problems it introduces.&lt;br /&gt;&lt;br /&gt;First, the semantics of XML strings that contain invalid XML characters are undefined by this or any other standard.  For example, OOXML uses ST_Xstring in Part 4, Clause 3.3.1.30 to store the error message which should be displayed when a data validation formula fails.  
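&lt;br /&gt;&lt;br /&gt;A minimal Python sketch makes the gap between the declared restriction and the defined semantics concrete.  It assumes the escape is exactly “_x” followed by four hexadecimal digits and a closing “_”, as in the quoted definition, and it checks characters against the Char production of XML 1.0.  The error message and helper names are hypothetical.  The escaped (lexical) form passes as an ordinary xsd:string, but the decoded value contains U+0007 BELL, which no member of the xsd:string value space may contain.

```python
import re

# Char production of XML 1.0:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# expressed as a set of single code points plus half-open ranges.
XML10_SINGLE_CHARS = {0x9, 0xA, 0xD}
XML10_CHAR_RANGES = (
    range(0x20, 0xD800),
    range(0xE000, 0xFFFE),
    range(0x10000, 0x110000),
)

# Assumed reading of the quoted ST_Xstring text: "_x", four hex digits, "_".
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def is_xml10_char(ch):
    """True if ch may appear in an XML 1.0 document, and so in an xsd:string."""
    cp = ord(ch)
    return cp in XML10_SINGLE_CHARS or any(cp in r for r in XML10_CHAR_RANGES)


def decode_st_xstring(s):
    """Expand every _xHHHH_ escape into the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def invalid_code_points(s):
    """List the characters of s that fall outside the XML 1.0 Char production."""
    return [f"U+{ord(c):04X}" for c in s if not is_xml10_char(c)]


# A hypothetical data-validation error message as it might be stored in a file.
lexical = "Value out of range_x0007_"
value = decode_st_xstring(lexical)

print(invalid_code_points(lexical))  # []  (a perfectly ordinary xsd:string)
print(invalid_code_points(value))    # ['U+0007']  (not an xsd:string value)
```

If ST_Xstring were a true restriction of xsd:string, every decoded value would pass the same check as its escaped form.  The example shows that it need not.&lt;br /&gt;&lt;br /&gt;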
But what should an OOXML-supporting application do when given a display string which contains control characters from the C0 control range, characters forbidden in XML 1.0?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;U+0004 END OF TRANSMISSION&lt;br /&gt;&lt;/li&gt;&lt;li&gt;U+0006 ACKNOWLEDGE&lt;/li&gt;&lt;li&gt;U+0007 BELL&lt;br /&gt;&lt;/li&gt;&lt;li&gt;U+0008 BACKSPACE&lt;/li&gt;&lt;li&gt;U+0016 SYNCHRONOUS IDLE&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;How should these characters be displayed?&lt;br /&gt;&lt;br /&gt;There is a reason XML excludes these dumb-terminal control codes.  They are neither desired nor necessary in XML.&lt;br /&gt;&lt;br /&gt;Elliotte Rusty Harold explains the rationale for this prohibition in his book &lt;a href="http://www.cafeconleche.org/books/effectivexml/"&gt;&lt;i&gt;Effective XML&lt;/i&gt;&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;The first 32 Unicode characters with code points 0 to 31 are known as the C0 controls.  They were originally defined in ASCII to control teletypes and other monospace dumb terminals.  Aside from the tab, carriage return, and line feed they have no obvious meaning in text.  Since XML is text, it does not include binary characters such as NULL  (#x00), BEL (#x07), DC1 (#x11) through DC4 (#x14), and so forth.  These noncharacters are historic relics.  XML 1.0 does not allow them.&lt;br /&gt;&lt;br /&gt;This is a good thing.  Although dumb terminals and binary-hostile gateways are far less common today than they were twenty years ago, they are still used, and passing these characters through equipment that expects to see plain text can have nasty consequences, including disabling the screen.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Further, since these characters are undefined in XML, they are unlikely to work well with existing accessibility interfaces and devices.  At best these characters will be ignored and introduce subtle errors.  
For example, what does “$10,[BS]000” become if one system processes the backspace and another does not? In the worst case, an accessibility interface expecting the range of characters defined by the xsd:string type will crash when presented with values beyond that range.&lt;br /&gt;&lt;br /&gt;Interfaces with existing programming languages are also harmed by ST_Xstring.  How does a C or C++ XML parser deal with XML that can now contain a U+0000 (NULL) character in the middle of a string, when that very character marks the end of a conventional C string?&lt;br /&gt;&lt;br /&gt;What about XML database interfaces that take XML data and store it in relational tables?  If they are schema-aware and see that ST_Xstring is merely a restriction of xsd:string, they will assume the normal range of characters can be stored wherever an xsd:string can be stored.  But since the value space is expanded, there is no guarantee that this will still be true.  These characters may cause validation errors in the database.&lt;br /&gt;&lt;br /&gt;By now, the observant reader may be accusing me of pulling a fast one.  "But Rob, none of the above is a problem if the application simply leaves ST_Xstring values encoded and does not try to decode or interpret the non-XML characters," you might say.&lt;br /&gt;&lt;br /&gt;OK.  Fair enough.  Let's follow that approach and see where it leads us.&lt;br /&gt;&lt;br /&gt;Let's look at interoperability with other XML-based standards.  Imagine you do a DOM parse of an OOXML document that contains “strings” of type ST_Xstring.  Either your parser/application is OOXML-aware, or it isn't.  In other words, either it is able to interpret the non-standard _xHHHH_ instructions, or it isn't.&lt;br /&gt;&lt;br /&gt;If it doesn't understand them, then any other code that operates on the DOM nodes with ST_Xstring data is at risk of returning the wrong answer.  For example, what is the length of the string “ABC”?  Three characters, of course.   
But what is the length of the string “_x0041_BC”?  According to OOXML, these two strings have the same value.  But an XML application might return 9 or 3, depending on whether it is OOXML-aware or not.  Since most (if not all) XML parsers are unaware of the non-standard escape mechanism proposed by OOXML, they will typically calculate things such as string lengths, string comparisons, string sorting, etc., incorrectly.&lt;br /&gt;&lt;br /&gt;But suppose the parser/application is OOXML-aware and correctly decodes these escape sequences into the intended Unicode values. Then what?  Assuming the host language doesn't crash from the existence of these control characters, we are then presented with problems at the interface with any other code that operates on the DOM.  Suppose we try to transform the DOM via XSLT to XHTML.  Will the XSLT engine properly handle the existence of these forbidden character values?  The XSLT engine may just crash.  But suppose it doesn't.  How does it write out these control characters into XHTML?  It can't.  These values are not permitted in XHTML.  Dead end.  What about DocBook?  DITA?  OpenDocument Format?  Not possible.  Since these characters are not permitted in XML 1.0 at all, they will be forbidden in all other markup languages that are based on XML 1.0, or even XML 1.1 for that matter (XML 1.1 allows some but not all of these characters; in particular, the NULL character is still excluded).&lt;br /&gt;&lt;br /&gt;Note further that with XML pipelining and with mashups, the application that writes XML output typically does not have direct knowledge of the application that originally produced the XML values.  This decoupling of producers and consumers is an essential aspect of modern systems integration, including Web Services.  By corrupting XML string values in the way that it does, DIS 29500 breaks the ability to have loosely coupled systems.  
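&lt;br /&gt;&lt;br /&gt;Both failure modes, the length discrepancy and the dead end at serialization, can be reproduced in a short Python sketch. It assumes only the _xHHHH_ form given in the ST_Xstring description quoted above; decode_st_xstring, xml_safe and round_trip are hypothetical helpers rather than part of any OOXML toolkit, and Python's standard ElementTree module stands in for a generic, OOXML-unaware XML toolchain.&lt;pre&gt;
```python
import re
import xml.etree.ElementTree as ET

# The _xHHHH_ escape described for ST_Xstring: underscore, "x", 4 hex digits, underscore.
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_st_xstring(raw):
    """What an OOXML-aware consumer does: replace each _xHHHH_ with its character."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


def xml_safe(value):
    """True if every character matches the XML 1.0 Char production."""
    for ch in value:
        cp = ord(ch)
        if not (cp in (0x9, 0xA, 0xD)
                or cp in range(0x20, 0xD800)
                or cp in range(0xE000, 0xFFFE)
                or cp in range(0x10000, 0x110000)):
            return False
    return True


def round_trip(text):
    """Write text into an XML element, then parse the bytes back.
    Returns None if the XML 1.0 parser rejects the serialized output."""
    elem = ET.Element("text")
    elem.text = text
    try:
        return ET.fromstring(ET.tostring(elem)).text
    except ET.ParseError:
        return None


raw = "_x0041_BC"
print(len(raw), len(decode_st_xstring(raw)))  # OOXML-unaware vs. OOXML-aware length

decoded = decode_st_xstring("A_x0008_BC")     # contains U+0008 BACKSPACE
print(xml_safe(decoded), round_trip(decoded))
```
&lt;/pre&gt;An unaware consumer measures the first string as 9 characters and an aware one as 3. Once the second string is decoded, ElementTree still writes the raw U+0008 into its output, but the resulting bytes are not well-formed XML 1.0, so parsing them back fails and round_trip returns None.&lt;br /&gt;&lt;br /&gt;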
Once the value space is polluted by these aberrant control characters, every application, every process that touches this data must be aware of their non-standard idiosyncrasies lest they crash or return incorrect answers.  In this way, one standard perverts the entire XML universe, forcing every XML consumer to contend with the poor hygiene of a single vendor.&lt;br /&gt;&lt;br /&gt;The reader might think that I exaggerate the importance of this, that surely ST_Xstring is only used in OOXML in edge cases, in rare compatibility modes.  We wish that this were true.  However, a look at DIS 29500 shows that ST_Xstring is pervasive, and in fact is the predominant data type in SpreadsheetML, used to express the vast majority of spreadsheet content, including cell contents, headers, footers, display strings, error strings, tooltip help, range names, etc.  Any application that operates on an OOXML spreadsheet will need to deal with this mess.&lt;br /&gt;&lt;br /&gt;For example, here are some uses of ST_Xstring in DIS 29500, Part 4:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Clause 3.2.3 for the name of a custom view in a spreadsheet&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.2.5 for the name of a spreadsheet named range, for the descriptive comment, for the name description, for the help topic, the keyboard shortcut, the status bar text and for the menu item text&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.2.14 for the name of a spreadsheet function group&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.2.19 for the name of a sheet in a workbook&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.2.22 for the name of a smart tag as well as for the URL of a smart tag.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.2.25 for the destination file name and title when publishing a spreadsheet to the web.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.10 for the value of a conditional formatting object, e.g., a gradient&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.20 for the 
name of a custom property&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.28 for sheet and range names&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.30 for the error message string, error message title, prompt string and prompt title in a spreadsheet data validation definition.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.35 for the value of a footer for even-numbered pages.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.36 for the value of a header for even-numbered pages.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.38 for the content of the first page footer&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.39 for the content of the first page header&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.44 for the display string for a hyperlink, the tooltip help for the link, and also the anchor target if the hyperlink is to an HTML page&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.49 for values of input cells in a scenario&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.50 for cell inline text values&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.55 for the value of a footer for odd-numbered pages.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.56 for the value of a header for odd-numbered pages.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.73, in scenarios, for the comment text, the scenario name and the name of the person who last changed the scenario.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.88, when defining a sort condition, for the values of the custom sort list&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.93 for the value contained within a cell&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.1.94 for information associated with items published to the web, including the destination file and the title of the output HTML file&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.2.2 for expressing the criteria values in a filter&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.3.15 for the key/values for 
smart tag properties&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.4.4 for expressing the contents of a rich text run&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.4.5 for expressing the name of a font&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.4.6 for expressing the text of a phonetic hint for East Asian text&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.4.8 for expressing a text item in the shared string table&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.4.12 for the text content shown as part of a string&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.5.1.2 for a table, expressing a textual comment, a display name as well as style names.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.5.1.3 for a table column, expressing cell and row style names and the column name&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.5.1.7 for column properties created from an XML mapping, for expressing the associated XPath.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.5.2.4 for the XPath associated with column properties for XML tables&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clauses 3.7.1-3.7.6 for specifying the content of tracked comments, including the text of the comments as well as the authors of the comments&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Clause 3.8.29 for expressing the name of a font&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;There are hundreds of additional uses.  A search of DIS 29500 Part 4 for “ST_Xstring” returns 467 hits.  OOXML also defines two additional types, “lptsr” (7.4.2.8) and “bstr” (7.4.2.4), that have the same flaw as ST_Xstring.&lt;br /&gt;&lt;br /&gt;The reader might further argue that, although the type allows characters that are forbidden by XML, the actual occurrence of these values in real legacy documents is likely to be rare.  This might be true, but that is cause for even greater concern.  
If every document contained these control characters, then we would immediately be aware of any interoperability problems when integrating OOXML data with other systems.  But if these characters are permitted yet occur rarely and randomly, then the integration errors will also occur rarely and randomly, allowing data corruption and other problems to occur and propagate further before detection.&lt;br /&gt;&lt;br /&gt;In summary, we are concerned that the ST_Xstring type in OOXML opens us up to problems such as:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Introducing accessibility problems&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Breaking unaware C/C++ XML parsers&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Breaking XML databases&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Breaking interoperability with other XML languages&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Breaking application logic related to string searching, sorting, comparisons, etc.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Introducing errors that will be hard to detect and resolve&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Possible remedies include:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Use xsd:string uniformly instead of ST_Xstring, with no use of forbidden XML characters. This would require that applications that read legacy binary documents containing such characters eliminate them at that point, perhaps replacing them with licit characters or with whitespace.  No application will be better able to determine the original meaning and intent of these characters than the original vendor, so that vendor should be responsible for cleaning up these strings to make them XML-ready.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Use a non-string type such as the binary xsd:hexBinary or xsd:base64Binary to represent these data items.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Use a mixed content encoding, where the licit characters are represented by xsd:string data, and the forbidden characters are denoted by specially-defined elements.   
So “A_x0008_BC” would become: &amp;lt;text&amp;gt;A&amp;lt;backspace/&amp;gt;BC&amp;lt;/text&amp;gt;.  In this case the semantics of the &amp;lt;backspace&amp;gt; element would need to be documented in the DIS 29500 specification, including its effect on searching, sorting, length calculations, etc.&lt;/li&gt;&lt;/ol&gt;</description><link>http://www.robweir.com/blog/2008/03/ooxmls-out-of-control-characters.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-8592573851252453622</guid><pubDate>Mon, 24 Mar 2008 19:00:00 +0000</pubDate><atom:updated>2008-03-24T15:24:07.094-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><title>Five (Bad) Reasons to Approve OOXML</title><description>&lt;ol&gt;&lt;li&gt;If you don't approve OOXML, Microsoft will walk away, and you'll never hear from them again.  Forget the fact that OOXML is already an Ecma standard (Ecma-376), and cannot be taken away.  Forget the fact that Microsoft has other formats lined up for ISO approval in the near future, like XPS or HD Photo.  Microsoft wants you to think that if you don't give them exactly what they want, now, they will walk away from ISO and you will be the worse for it.  We need to encourage Microsoft for their abuse of the standardization process, in hopes that their participation will evolve in line with our hopes, and not our fears, that they will improve on the standardization side, while curbing the abuse side.  Of course, the encouragement could be misinterpreted to mean the opposite, and we could get more abuse, and even lower-quality standards.  I guess that's the risk we'll just need to take.  By similar abuses of logic, small children hold their breath until their faces turn blue, thinking they can scare adults into giving them what they want.  
It doesn't work there either.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;If you approve OOXML, you can have the privilege of spending the next 5 years in the glorious work of fixing thousands of defects in the text.  You can get a seat at the table, fixing bugs that should have been fixed in Ecma before OOXML was even submitted to JTC1.  Forget the fact that maintenance in JTC1 is a ponderous, time-consuming activity, where individual defects are enumerated, changes proposed, discussed, voted on, etc.  Forget the fact that the recent BRM showed that you can't really get through more than 60 defects in a week-long meeting.  Forget the fact that fixing defects in Ecma, not JTC1, would be far faster and easier due to the lighter-weight process Ecma imposes on its TCs.  Forget that Fast Track is intended for mature, adopted standards, not for ones that will require a "Perpetual BRM".  Forget all that.  You want a seat at the bug-fixing table?  You got it.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Billions and Billions of legacy documents.  Well, actually these legacy documents are not in OOXML format; they are in the legacy binary format.  And no mapping has been provided from the legacy formats to OOXML.  But there are billions and billions of these legacy documents.  That must be important.  So vote Yes for OOXML because there are billions and billions of documents in some other format that is nebulously related to it.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;More standards are better.  More standards means more choice, means more decisions, means more consultants, means more money paid to XML experts.  You'll sooner find the American Dairy Council recommending less milk consumption than a standards professional calling for fewer standards.  So ignore quality, maturity and need.  More standards are a good thing.  Like Blu-ray and HD DVD.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;ODF will be better if OOXML is approved.  
In OASIS we're too stupid to look up legacy features or Excel spreadsheet formulas in Ecma-376.  We would never have thought of that.  We believe the only way to make ODF better is to make it more like OOXML.  That is why we would like to encourage nice little JTC1 countries like Kazakhstan to vote YES for OOXML.  As soon as OOXML is approved, then magically, it becomes useful to us.  But the exact same text, not approved by Kazakhstan and JTC1, is not useful to us at all.  It is all or nothing.  There is nothing in the middle.  Rather than taking a useful, high-quality text, and approving it on its merits, we are asked to approve a specification with thousands of defects, and by our approval we transform it into something useful to ODF.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;</description><link>http://www.robweir.com/blog/2008/03/five-bad-reasons-to-approve-ooxml.html</link><author>noreply@blogger.com (Rob)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-11236681.post-4871700534128679752</guid><pubDate>Tue, 18 Mar 2008 22:00:00 +0000</pubDate><atom:updated>2008-03-20T11:37:13.384-04:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>OOXML</category><title>How many defects remain in OOXML?</title><description>DIS 29500, Office Open XML, was submitted for Fast Track review by Ecma as a 6,045-page specification.  (After the BRM, it is now longer, maybe 7,500 pages or so.  We don't know for sure, since the post-BRM text is not yet available for inspection.)  Based on the original 6,045-page length, a 5-month review by JTC1 NBs led to 48 defect reports from NBs, reporting a total of 3,522 defects.  Ecma responded to these defect reports with 1,027 proposals, which the recent BRM, mainly through the actions of one big overnight ballot, approved.&lt;br /&gt;&lt;br /&gt;So what was the initial quality of OOXML, coming into JTC1?   
One measure is the defect density, which we can say is at least one defect for every 6045/1027 ≈ 5.9 pages.  I say "at least" because this is a lower bound.  If we believed that the 5-month review represented a complete review of the text of DIS 29500, by those with relevant subject matter expertise, then we would have some confidence that all, or at least most, defects were detected, reported and repaired.  But I don't know anyone who really thinks the 5-month review was sufficient for a technical review of 6,045 pages.  Further, we know that Microsoft worked actively to suppress the reporting of defects by NBs.  So the actual defect density is potentially quite a bit higher than the reported defect density.&lt;br /&gt;&lt;br /&gt;But how much higher?  This is the important question.  It doesn't matter how many defects were fixed.  What matters is how many remain.&lt;br /&gt;&lt;br /&gt;There are several approaches to answering this question.  One approach is to look at defect "find rates", the number of defects found per unit of time spent reviewing, fit that to a model, typically an S-curve (sigmoid), and use that model to predict the number of defects remaining.  However, we have no time/effort data for the DIS 29500 review, so we don't have enough data to create that model.  Another approach is to randomly sample the post-BRM text and statistically estimate the defect density from this sample.&lt;br /&gt;&lt;br /&gt;Are there any other good approaches?&lt;br /&gt;&lt;br /&gt;Here is the plan.  I will use the second approach.  Since I do not actually have the post-BRM text, I need to make some adjustments.  I'll start with the original text, in particular Part 4, the XML reference section, at 5,220 pages, where the meat of the standard is.  I'll then create a spreadsheet and generate 200 random page numbers between 1 and 5,220.  For each random page I will review the clause associated with that page and note the technical and editorial errors I find.  
I will then check these errors to see if any of them were addressed by BRM resolutions.&lt;br /&gt;&lt;br /&gt;Based on the above, I will be able to estimate two numbers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The defect density of the text, both pre- and post-BRM&lt;/li&gt;&lt;li&gt;The fraction of defects which were detected by the Fast Track review.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So if I find N defects, and 0.9N of those issues were already found during the Fast Track review and were addressed by the BRM, then we can say that the Fast Track procedure was 90% effective in finding and removing errors.  Some practitioners would call that the defect removal "yield" of the process.  But if we find that only 0.1N of the errors were reported and addressed by the BRM, then we'll have a different opinion on the sufficiency of the Fast Track review.&lt;br /&gt;&lt;br /&gt;Clear enough? Microsoft is claiming something like 99% of all issues were resolved at the BRM.  So let's see if we get anything close.&lt;br /&gt;&lt;br /&gt;I'm not done with this study yet.  I'm finding so many defects that recording them is taking more time than finding them.  But since this is topical, I will report what I have found so far, based on the first 25 random pages, or one-eighth of my target of 200.  I've found 64 technical flaws.  None of the 64 flaws were addressed by the BRM.  Among the defects are some rather serious ones such as:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Storage of plain-text passwords in database connection strings&lt;/li&gt;&lt;li&gt;Undefined mappings between CSS and DrawingML&lt;/li&gt;&lt;li&gt;Errors in XML Schema definitions&lt;/li&gt;&lt;li&gt;Dependencies on proprietary Microsoft Internet Explorer features&lt;/li&gt;&lt;li&gt;Spreadsheet functions that break with non-Latin characters&lt;/li&gt;&lt;li&gt;Dependencies on Microsoft OLE method calls&lt;/li&gt;&lt;li&gt;Numerous undefined terms and features&lt;/li&gt;&lt;/ul&gt;As I said, this study is still underway.  
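&lt;br /&gt;&lt;br /&gt;The sampling plan and the two estimates can be sketched in a few lines of Python, used here as a stand-in for the spreadsheet the plan mentions. The page count (5,220) and sample size (200) come from the plan above; sample_pages and estimate are hypothetical helpers, and the seed is arbitrary. Plugging in the interim figures (64 flaws on 25 pages, none addressed by the BRM) gives 2.56 defects per page both before and after the BRM, and a yield of 0%.&lt;pre&gt;
```python
import random

PART4_PAGES = 5220  # pages in DIS 29500 Part 4, the XML reference section
SAMPLE_SIZE = 200   # target number of randomly chosen pages


def sample_pages(seed=None):
    """Draw SAMPLE_SIZE distinct page numbers from 1..PART4_PAGES, sorted."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, PART4_PAGES + 1), SAMPLE_SIZE))


def estimate(pages_reviewed, defects_found, defects_fixed_by_brm):
    """Return (pre-BRM density, post-BRM density, BRM yield).

    Densities are defects per sampled page.  The yield is the fraction of
    sampled defects that Fast Track found and the BRM addressed, so 0.9
    would mean the process was 90% effective."""
    pre = defects_found / pages_reviewed
    post = (defects_found - defects_fixed_by_brm) / pages_reviewed
    brm_yield = defects_fixed_by_brm / defects_found if defects_found else None
    return pre, post, brm_yield


pages = sample_pages(seed=2008)
print(pages[:5])

pre, post, brm_yield = estimate(pages_reviewed=25, defects_found=64,
                                defects_fixed_by_brm=0)
print(f"{pre:.2f} defects/page before the BRM, {post:.2f} after, "
      f"yield {brm_yield:.0%}")
```
&lt;/pre&gt;Using random.sample guarantees 200 distinct pages, so no page is reviewed twice. The densities are only as good as the sample, and with one-eighth of it reviewed they should be read as provisional.&lt;br /&gt;&lt;br /&gt;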
I'll list the defects I've found so far, and add to the list as I complete the task over the next few days.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Page 692, Section 2.7.3.13 — no errors found&lt;/li&gt;&lt;li&gt;Page 1457, Section 2.15.3.45 — This is a compatibility setting which creates needless complexity for implementers, who now must deal with two different ways of handling a page break, one in which a page break ends the current paragraph, and another where it does not.  This is not a general need and expresses only a single vendor’s legacy setting.&lt;/li&gt;&lt;li&gt;Page 490, Section 2.4.72 — This defines the ST_TblWidth type, used to express the width of a table column, cell spacing, margins, etc.  The allowed values of this type express the measurement units to be used: Auto, Twentieths of a point, Nil (no width), Fiftieths of a percent.  I find these choices to be capricious and not based on any sound engineering principle.  It also mixes units with width values (Nil) and modes (auto).  This should be changed to allow measurements in natural units, such as those allowed in XSL-FO or CSS2: mm, inches, points, picas.  Also, do not mix units, values and modes in the same attribute.  Nil is best represented by the value 0 and Auto should be its own Boolean attribute.&lt;/li&gt;&lt;li&gt;Page 328, Section 2.4.17 — The frame attribute description says it “Specifies whether the specified border should be modified to create a frame effect by reversing the border's appearance from the edge nearest the text to the edge furthest from the text.”  This is not clear.  What does it mean to reverse a border’s appearance?  Are we doing color inversions?  Flipping along the Y-axis?  What exactly?   
Also a typographical error: “For the right and top borders, this is accomplished by moving the order down and to the right of its original location.”  Should be “moving the border down…”  Also, it is not stated how far the border should be moved.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 1073, Section 2.14.8 — This feature is described as: “This element specifies the connection string used to reconnect to an external data source. The string within this element's val attribute shall contain the connection string that the hosting application shall pass to a external data source access application to enable the WordprocessingML document to be reconnected to the specified external data source.”  Since connection to external data typically requires a user ID and a password, the lack of any security mechanism on this feature is alarming.  The example given in the text itself hardcodes a plain-text password in the connection string.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 4387, Section 6.1.2.3 — For the “class” attribute it says “Specifies a reference to the definition of a CSS style.”  The example implies that some sort of mapping will occur between CSS attributes and DrawingML.  But no such mapping is defined in OOXML.  The "doubleclicknotify" attribute implies some sort of event model that is undefined in OOXML.  How do you send a message for doubleclicknotify?  Why do we describe organization chart layouts here when they are not applicable to a Bézier curve?  What happens if this shape is declared to be a horizontal rule or bullet or OLE object? The text allows you to label it as one of these, but assigns no meaning or behavior to this.  Why do we have an spid as well as an id attribute? The "target" attribute refers to Microsoft-specific I.E. features such as "_media".   
Although the text says that control points have default values, the schema fragment does not show this.&lt;/li&gt;&lt;li&gt;Page 3164, Section 4.6.88 — This and the following two elements are all called "To" but this seems to be a naming error.  4.6.89 is essentially undefined.  What does "The element specifies the certain attribute of a time node after an animation effect" mean?  It doesn't seem to really signify anything.  Ditto for 4.6.90.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 5098, Section 7.1.2.124 — The example does not illustrate what the text claims it does.  The example doesn't even use the element defined by this clause.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 4492, Section 6.1.2.11 — The "althref" attribute is described as "Defines an alternate reference for an image in Macintosh PICT format".  Why is this necessary for only Mac PICT files?  Why would "bilevel" necessarily lead to 8 colors? We're well beyond 8-bit color these days.  The "blacklevel" attribute is defined as "Specifies the image brightness. Default is 0."  What is the scale here?  This needs to be defined.  Is it 0-1.0, 0-255 or what?  And what is "image brightness" in terms of the art?  Is this luminosity?  Opacity?  Is this setting the level of the black point?  For "cropleft", etc., what units are allowed? (The text implies percentages.)  How does "detectmouseclick" work when no event model is defined?  "emboss effect" is not defined.  The "gain" attribute has the same problem as "blacklevel": no scale is defined.  This element has two different id attributes in two different namespaces, with two different types.  The "movie" attribute is described as "Specifies a pointer to a movie image. This is a data block that contains a pointer to a pointer to movie data".  Excuse me?  "A pointer to a pointer to movie data"?  This is useless.  The "recolortarget" example appears to contradict the description.  It shows blue recolored to red, not black.  
The "src" attribute is said to be a URL, yet is typed to xsd:string.  This should be xsd:anyURI.&lt;/li&gt;&lt;li&gt;Page 1431, Section 2.15.3.30 — no errors noted&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 3405, Section 5.1.5.2.7 — The conflict resolution algorithm should be normative, not merely in a note.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 875, Section 2.11.21 — Instead of saying that the footnote "pos" element should be ignored if present at the section level, the schema should be defined so as not to allow it at the section level.  In other words, this should be expressed as a syntax constraint.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 1955, Section 3.3.1.20 — This facility for adding "arbitrary" binary data to spreadsheets is said to be for "legacy third-party document components".  No documentation or mapping for such legacy components has been provided, so interoperability with this legacy data cannot be achieved.  Why isn't this expressed using the extension mechanisms of Part 5 of the DIS?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Page 4526, Section 6.1.2.13 — The "allowoverlap" attribute is not sufficiently defined.  In particular, what determines whether the object shifts to the right or left? ST_BWMode is not adequately defined.  For example, one option is "Use light shades of gray only".  How light?  And what is the difference between "hide" and "undrawn"?  Also, the concept of "wrapping polygon" is not sufficiently defined.  For example, what is the wrapping polygon for an oval?  The purpose of "dgmlayoutmru" is obscure.  Wouldn't the most-recently-used layout option be the one which is actually in use, "dgmlayout"?  The "dgmnodekind" attribute is undefined, said to be "application-specific".  Is interoperability not allowed?  The text seems to imply that applications must use application-specific values.  The "href" attribute is given a string schema type. Shouldn't this be xsd:anyURI?  The "id" attribute is said to be a "unique identifier".  
Unique in what domain?  Among shapes of this type?  Among all shapes?  All shapes on this page? Among all IDs in the document? The "preferrelative" attribute is not sufficiently defined.  Where is the original size stored?  After 