{"id":104,"date":"2007-06-19T18:40:00","date_gmt":"2007-06-19T23:40:00","guid":{"rendered":"http:\/\/2d823b65bb.nxcli.io\/2007\/06\/no-representation-without-specification.html"},"modified":"2007-06-19T18:40:00","modified_gmt":"2007-06-19T23:40:00","slug":"no-representation-without-specification","status":"publish","type":"post","link":"https:\/\/www.robweir.com\/blog\/2007\/06\/no-representation-without-specification.html","title":{"rendered":"No Representation Without Specification"},"content":{"rendered":"<p>Maybe I just have an ear for this, but whenever I hear a number of people saying the same odd thing, using the same strained phrase, it catches my attention and makes me take a closer look.  Individuals naturally have a great diversity of expression and phrasing, so where this is lacking, and the Borg starts speaking as one, it is good to pay it some heed.<\/p>\n<p>The word for today is &#8220;represents&#8221;.    A few exemplary quotations to demonstrate a particular pattern of use that attracted my attention:<\/p>\n<p>From Microsoft&#8217;s <a href=\"http:\/\/www.openxmlcommunity.org\/\">Open XML Community<\/a>:<\/p>\n<blockquote><p>Open XML was designed to provide users the benefits of: faithfully <span style=\"font-weight: bold;\">representing<\/span> in an open format existing office documents, interoperability, support across platforms and applications, integration with business data, internationalization, support for accessibility and assistive technologies, and long-term document preservation.<\/p><\/blockquote>\n<p>Microsoft&#8217;s Jean Paoli as quoted by <a href=\"http:\/\/www.itwriting.com\/blog\/?page_id=187\">Tim Anderson<\/a>:<\/p>\n<blockquote><p>As a design goal, we said that those formats have to <span style=\"font-weight: bold;\">represent<\/span> all the information that enables high-fidelity migration from the binary formats.<\/p><\/blockquote>\n<p>And Paoli again in a Microsoft <a 
href=\"http:\/\/www.microsoft.com\/presspass\/features\/2005\/nov05\/11-21Ecma.mspx\">press release<\/a>:<\/p>\n<blockquote><p>So the Office Open XML file formats <span style=\"font-weight: bold;\">represent<\/span> all the characteristics of the Office binary file formats, while making it easier for people to connect to the different islands of data in the enterprise.<\/p><\/blockquote>\n<p>Microsoft&#8217;s Brian Jones in a comment response on his <a href=\"http:\/\/blogs.msdn.com\/brian_jones\/archive\/2006\/11\/03\/novell-and-microsoft-teaming-up-on-document-interoperability.aspx\">blog<\/a>:<\/p>\n<blockquote><p>We had to leave some legacy behaviors in place because the goal of our work was to create an XML format that could <span style=\"font-weight: bold;\">represent<\/span> our existing base of Office documents.<\/p><\/blockquote>\n<p>From the OOXML Overview <a href=\"http:\/\/www.ecma-international.org\/news\/TC45_current_work\/OpenXML%20White%20Paper.pdf\">whitepaper<\/a> [pdf] presented to JTC1:<\/p>\n<blockquote><p>OpenXML was designed from the start to be capable of faithfully <span style=\"font-weight: bold;\">representing<\/span> the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation.<\/p><\/blockquote>\n<p>From Ecma&#8217;s <a href=\"http:\/\/www.computerworld.com\/pdfs\/Ecma.pdf\">response<\/a> to the JTC1 NB contradiction objections:<\/p>\n<blockquote><p>OpenXML has been designed to be capable of faithfully <span style=\"font-weight: bold;\">representing<\/span> the majority of existing office documents in form and functionality.<\/p><\/blockquote>\n<p>Microsoft&#8217;s Stephen <a href=\"http:\/\/notes2self.net\/archive\/2006\/07\/28\/Gary-Edwards-on-ODF-Interop-Extensions-_2800_aka-ODF1.2-iX_2900_.aspx\">McGibbon<\/a>:<\/p>\n<blockquote><p>I represent Microsoft at all kinds of meetings and my firm understanding is that one of the things that 
differentiates OpenXML and ODF is OpenXML&#8217;s ability to faithfully <span style=\"font-weight: bold;\">represent<\/span> all of the previously created Microsoft Office binary format documents.<\/p><\/blockquote>\n<p>So what are we to make of this?  They are being very specific about their choice of words, aren&#8217;t they?  I wonder why&#8230;<\/p>\n<p>A file format represents data.  It stores data.  It encodes data.  These are all synonymous.  But merely representing data is trivial.  For example, here is a markup language that can also <span style=\"font-weight: bold;\">represent<\/span> all legacy Microsoft Office documents:<\/p>\n<p>&lt;office-document&gt;<br \/>&lt;one\/&gt;<br \/>&lt;one\/&gt;<br \/>&lt;zero\/&gt;<br \/>&lt;one\/&gt;<br \/>&lt;zero\/&gt;<br \/>&lt;zero\/&gt;<br \/>&lt;\/office-document&gt;<\/p>\n<p>Since the above markup directly maps to binary, it can faithfully <span style=\"font-weight: bold;\">represent<\/span> 100% of existing Office documents with 100% backwards compatibility.  It can also <span style=\"font-weight: bold;\">represent<\/span> perfectly the documents of every other vendor, past, present and future.<\/p>\n<p>But before Ecma gets all excited that they may soon have another standard to Fast Track, I must admit the obvious.  This markup is not all that useful as an interoperable document format.  Why?  Because although it can <span style=\"font-weight: bold;\">represent<\/span> 100% of legacy documents, it does not <span style=\"font-weight: bold;\">specify<\/span> how to do anything with them.  Except at the level of a bit, the format does not express any structure or semantics.  Although you can express anything you want with 1&#8217;s and 0&#8217;s, there is no common, interoperable use above the level of 1&#8217;s and 0&#8217;s provided for.  
My binary document means something only to me, and unless I go outside of the standard and share additional information with you, you will not be able to understand my binary document.<\/p>\n<p>Interoperability comes not from <span style=\"font-weight: bold;\">representation<\/span>, but from <span style=\"font-weight: bold;\">specification<\/span>.<\/p>\n<p>(An aside: there is, however, speculation that it is possible to transmit information via a binary code in a way that presupposes no prior agreement or knowledge other than universals like mathematical and physical laws.  It would require a bootstrapping approach where very basic elements of notation and mathematical logic are transmitted, followed by increasingly complex concepts.  By this theory it would be possible to communicate with alien intelligences without any prior conventions.  See, for example, Carl Sagan&#8217;s novel, <cite>Contact<\/cite>.  But this is probably overkill for an office document format, unless your workplace is a lot stranger than mine.)<\/p>\n<p>So what is the difference between representing and specifying?  When you represent, it means that you can map from the features of the legacy format to the new format.  When you specify, it means that you provide the map, and enough detail so that others can read and write that same representation.  That is a big difference.<\/p>\n<p>Of course, OOXML is more than 1&#8217;s and 0&#8217;s.  But when you see attributes with names like &#8220;useWord97LineBreakRules,&#8221; with <a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/2007\/01\/how-to-hire-guillaume-portes.html\">no additional specification<\/a>, then you know that the fix is in.  
My guess is that MS Word has code someplace that looks like this:<\/p>\n<pre><br \/>if (useWord97LineBreakRules)<br \/>doCrappyOldWayOfLineBreaking(); \/\/ reuse legacy code from Word 97<br \/>else<br \/>doNewWayOfLineBreaking(); \/\/ Use new rules<br \/><\/pre>\n<p>If this is true, then MS Word can implement this feature trivially. But no one else can make sense of it, because we lack a specification of its behavior.  They might as well have called the attribute &#8220;Fred.&#8221;  It is just as useful.<\/p>\n<p>Another example is how OOXML deals with PowerPoint slide transitions, the things that people use in an attempt to make a boring presentation seem more interesting.  Microsoft has ensured that they can <span style=\"font-weight: bold;\">represent<\/span> all of the transitions.  They are all listed there, in Section 4.4.1.46: blinds, checker, circle, comb, cover, cut, etc.  But when you drill down into the definitions, this is what you find:<\/p>\n<blockquote><p>wheel (Wheel Slide Transition)<\/p>\n<p>This element describes a wheel slide transition effect.<\/p>\n<p>[Example: Consider we have a slide with a wheel slide transition. The &lt;wheel&gt; element should be used as follows:<\/p>\n<p>&lt;p:transition&gt;<br \/>&lt;p:wheel\/&gt;<br \/>&lt;\/p:transition&gt;<br \/>End example]<\/p><\/blockquote>\n<p>That&#8217;s it.  Ditto for all of the other slide transitions.  Not exactly fully specified, is it?  Although the text claims that it &#8220;describes a wheel slide transition effect,&#8221; in truth it merely labels it.  There is no <span style=\"font-weight: bold;\">specification<\/span>, only <span style=\"font-weight: bold;\">representation<\/span>.  And that curious little example: is this some sort of joke?  Did someone really think that elements with no definition are improved by trivial examples? 
It reminds me of the old spelling bee joke:<\/p>\n<blockquote><p>Judge: The word is &#8220;synecdoche.&#8221;<br \/>Student: Could you use that in a sentence?<br \/>Judge: Certainly.  &#8220;Synecdoche&#8221; is a very hard word to spell.<\/p><\/blockquote>\n<p>100% correct, but also 100% useless.  As I read through the OOXML specification I am finding hundreds of places like this where things are labeled, but no definition is given.<\/p>\n<p>So I think we need to ask more questions when we hear the claims that OOXML was designed to faithfully <span style=\"font-weight: bold;\">represent<\/span> 100% of the legacy documents.  We need to respond that <span style=\"font-weight: bold;\">representation <\/span>is not enough for an open format.  Even an XML format of just &lt;one&gt;&#8217;s and &lt;zero&gt;&#8217;s can do that.  To be of use to anyone other than Microsoft we need more than just <span style=\"font-weight: bold;\">representation<\/span>.  We need <span style=\"font-weight: bold;\">specification<\/span>, and we need the map to the legacy formats.   To accept anything else is to embark on a voyage with a foreign dictionary missing the definitions.  
It can represent everything  that you want to say, but you&#8217;ll be unable to say any of it.<\/p>\n<p>ISO defines a standard as a:<\/p>\n<blockquote><p>&#8230;document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context<\/p><\/blockquote>\n<p>A key clause there is the requirement for providing, &#8220;common and repeated use.&#8221;  Providing explicit <span style=\"font-weight: bold;\">representation<\/span> for a single vendor&#8217;s legacy formats while not providing for common use of that ability, this is not the purpose of an ISO standard and to my eyes appears to be an abuse of the standardization process.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Maybe I just have an ear for this, but whenever I hear a number of people saying the same odd thing, using the same strained phrase, it catches my attention and makes me take a closer look. 
Individuals naturally have a great diversity of expression and phrasing, so where this is lacking, and the Borg [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[6],"tags":[],"class_list":{"0":"post-104","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-ooxml","7":"entry"},"_links":{"self":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/104","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/comments?post=104"}],"version-history":[{"count":0,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/104\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/media?parent=104"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/categories?post=104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/tags?post=104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}