
An Antic Disposition


Suggesting ODF Enhancements

2008/04/16 By Rob 10 Comments

There is a good post by Mathias Bauer on Sun Hamburg’s GullFOSS blog. He deals with the practical importance of OASIS’s “Feedback License” that governs any public feedback OASIS receives from non-TC members.

The ODF TC receives ideas for new features from many places. Many of the ideas come from our TC members themselves, where we have representation from most of the major ODF vendors, from open source projects, interest groups, as well as from individual contributors.

Other ideas come from other vendors or open source projects, from organizations that the TC has a liaison relationship with (like ISO/IEC JTC1/SC34), or individual members of the public.

Contributions from OASIS TC members are already covered by the OASIS IPR Policy. A TC member who contributes a written proposal to the TC is obligated from the time of contribution, and other TC members are obligated if they have been TC members for at least 60 days and remain members 7 days after approval of any Committee Draft. You can see the participation status of TC members here.

For everyone else, those who are not members of the ODF TC, the rules require that proposals, feedback, comments, ideas, etc., come through our comment mailing list. But before you can post to the comment list you must first accept the terms of the Feedback License.

Is this extra step annoying? Yes, it is. But this pain is what is necessary to keep our IP pedigree clean and protect the rights of everyone to implement and use ODF. It is part of the price we pay for open standards. Free does not mean free from vigilance.

One of my responsibilities on the ODF TC is to monitor and process the public comments we receive. Regretfully this is a duty which I’ve neglected for too long. So I spent some time this week getting caught up on the comments, entering them all into a tracking spreadsheet. We have a total of 180 public comments since ODF 1.0 was approved by OASIS, covering everything from new feature proposals to reports of typographical errors.

The largest single source of comments is the Japanese JTC1/SC34 mirror committee, where they have been translating the ODF 1.0 standard into Japanese. As you know, you will get no closer reading of a text than when attempting a translation, so we’re glad to receive this scrutiny. I’ll look forward to adding the Japanese translation of ODF alongside the existing Russian and Chinese translations soon.

For comments that are in the nature of a defect report, i.e., reporting an editorial or technical error in the standard, we will include a fix in the ODF 1.0 errata document we are preparing. For comments that are in the nature of a new feature proposal, we will discuss them on a TC call and decide whether or not to include them in ODF 1.2.

Some of the feature proposals from the comment list are:

  • A request to support embedded fonts in ODF documents
  • A request to support multiple versions of the same document in the same file
  • A request to allow vertical text justification
  • A proposal for enhanced string processing spreadsheet functions
  • A proposal for spreadsheet values to allow units, which would help prevent calculation errors due to mixing units, e.g., adding mm to kg would be flagged as an error.
  • A proposal for allowing spreadsheet named ranges to have namespaces, with each sheet in a workbook having its own namespace.
  • A proposal to allow a document to have a “portable” flag to allow it to self-identify that it contains only portable ODF content with no proprietary extensions.
  • A proposal for adding FFT support to spreadsheets
  • A proposal for adding an overline text attribute

If you have any other ideas for ODF enhancements, or thoughts on the above proposals, please don’t post a response to this blog! Remember, you need to use the comment list for your feedback to be considered by the OASIS ODF TC.

Of course, general comments are always welcome on this blog.

Filed Under: ODF

New Paths in Standardization

2008/04/02 By Rob 20 Comments

The world should be pleased to note, that with the approval of ISO/IEC 29500, Microsoft’s Vector Markup Language (VML), after failing to be approved by the W3C in 1998 and after being neglected for the better part of a decade, is now also ISO-approved. Thus VML becomes the first and only standard that Microsoft Internet Explorer fully supports.

Congratulations are due to the Internet Explorer team for reaching this milestone!

Now that it has been demonstrated that pushing proprietary interfaces, protocols and formats through ISO is cheaper and faster than writing code to implement existing open standards, one assumes that the future is bright for more such boutique standards from Redmond. Open HTML, anyone?

Filed Under: Standards

Seeking Open Standards Activists

2008/03/25 By Rob 22 Comments

Some thoughts for Document Freedom Day 2008.

Back a few weeks ago in Geneva, OpenForum Europe hosted an evening of mini-talks and a discussion panel with various well-known personalities in our field: Vint Cerf, Bob Sutor, Andy Updegrove and Håkon Lie. I wasn’t able to comment on the event at the time, due to my self-imposed blog silence that week, but I’d like to take the opportunity today to carry forward one of the topics discussed then.

I’d like to take as my launching point the theme of Andy Updegrove’s talk, which was “Civil ICT Standards”. Andy treats this subject more fully on his blog, and also speaks to the topic in his taped interview with Groklaw’s Sean Daly.

Thus spake Updegrove:

But as the world becomes more interconnected, more virtual, and more dependent on ICT, public policy relating to ICT will become as important, if not more, than existing policies that relate to freedom of travel (often now being replaced by virtual experiences), freedom of speech (increasingly expressed on line), freedom of access (affordable broadband or otherwise), and freedom to create (open versus closed systems, the ability to create mashups under Creative Commons licenses, and so on).

This is where standards enter the picture, because standards are where policy and technology touch at the most intimate level.

Much as a constitution establishes and balances the basic rights of an individual in civil society, standards codify the points where proprietary technologies touch each other, and where the passage of information is negotiated.

In this way, standards can protect – or not – the rights of the individual to fully participate in the highly technical environment into which the world is now evolving. Among other rights, standards can guarantee:

  • That any citizen can use any product or service, proprietary or open, that she desires when interacting with her government.
  • That any citizen can use any product or service when interacting with any other citizen, and to exercise every civil right.
  • That any entrepreneur can have equal access to marketplace opportunities at the technical level, independent of the market power of existing incumbents.
  • That any person, advantaged or disadvantaged, and anywhere in the world, can have equal access to the Internet and the Web in the most available and inexpensive method possible.
  • That any owner of data can have the freedom to create, store, and move that data anywhere, any time, throughout her lifetime, without risk of capture, abandonment or loss due to dependence upon a single vendor.

Let us call these “Civil ICT Rights,” and pause a moment to ask: what will life be like in the future if Civil ICT Rights are not recognized and protected, as paper and other fixed media disappear, as information becomes available exclusively on line, and as history itself becomes hostage to technology?

This rings true to me. Technology, computer technology in particular, now permeates our lives. We interact with it daily, from the moment the internet-radio alarm clock goes off, until day’s end, when we check our email “one last time” before going to bed.

Similarly, the standards that define the interfaces between these devices are also of increasing importance. There was once a time when standards dealt only with the “infrastructure”, the stuff in the walls and under the panel floor, or in that funny little locked door off the hallway, with all the cables and flashing lights, where strange men with clipboards would occasionally emerge, accompanied by a poof of cold air and the buzzing of machines.

But today, the technology and the standards that mediate the technology are now directly in front of your face. Think MP3 players. Think DVD’s. Think DRM. Think cellular phones. Think web pages. Think encryption. Think privacy. Think documents. Think documents-privacy-security-DRM, your data and what you are allowed to do with it, and what others are allowed to do with it, and whether you control any bit of this in this mad world of ours.

Between you and the tasks you want to do today stand technology and the standards that mediate that technology. Standards are damn important.

Now, although the reach of technology and ICT standards has progressed over the years, the organizations and the processes that create these standards have not always kept up. In many cases standardization remains the creature of big industry, with little or no consumer input. It is a world of back-room discussions, where companies connive to see how many patents from their own portfolios they can encumber the standard with. A successful standard is one where no major company is left hungry. Consensus means everyone at the table has been fed. That is the traditional world of technology standards. It brings to mind the famous line from Adam Smith:

People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices — The Wealth of Nations (I.x.c.27)

Luckily, there is some hope. The proponents of “open standards” seek standards based on principles of open participation, consensus decision making, non-profit stewardship, royalty-free IP, and free access to standards. The web itself, with the underlying network protocol stack and the HTML family of formats with DOM and scripting API’s, is a shining example of what open standards can accomplish. Tim Berners-Lee says it best, in his FAQ’s:

Q: Do you have mixed emotions about “cashing in” on the Web?

A: Not really. It was simply that had the technology been proprietary, and in my total control, it would probably not have taken off. The decision to make the Web an open system was necessary for it to be universal. You can’t propose that something be a universal space and at the same time keep control of it.

But it is important to realize that “control” mechanisms in standards go well beyond IP and organization issues. There are other important factors at play, and we need to address these as well. Knut Blind discusses some of these issues in a section called “Anti-Competitive Effects of Standards” in his book The Economics of Standards (2004):

The negative impact of standards for competition are mostly caused by a biased endowment with resources available for the standardization process itself. Therefore, even when the consensus rule is applied, dominant large companies are able to manipulate the outcomes of the process, the specification of the standard, into a direction which leads to skewed distribution of benefits or costs in favor of their interests.

In other words, participation in standardization activities is time consuming and expensive, and large companies are much more able to make this kind of commitment than small companies, organizations or individuals. So, large companies rule the world.

This is especially true with standardization at the international level, where decisions are often made at meetings in very expensive international locations. JTC1 is still discussing what technologies would be required to allow participation in meetings without travel. (Hint: it’s called a “telephone”.) To put this in perspective, my week in Geneva cost $3,687.52. I flew coach, ate most of my meals on the cheap, often just grabbing hors d’oeuvres at receptions, and I received negotiated IBM corporate rates for air and hotel. This is one JTC1 meeting. What if I wanted to be really active? Add in two SC34 Plenary meetings (Norway/Kyoto). Add in JTC1 Plenary meetings. Add in US NB meetings. Add in US NB membership fees, consortium fees, conferences, etc. It adds up to around $40,000/year to participate actively in tech standards, and this doesn’t include the cost of my time.

How many small companies are going to pay this amount? How many non-profit organizations? How many individuals? Not many.

But in spite of the expense, in spite of the large company bias of the international standardization system, I saw reason for hope at the Geneva BRM. I saw younger participants, with fire in their bellies. I saw FOSS supporters from developing countries. I saw Linux on laptops. I saw participants from FOSSFA, SIUG, EFFI, ODF Alliance Brazil, COSS, etc. They joined their NB’s, participated in their NB debates and were appointed to represent their countries in the BRM.

Sure, it is only a foot in the door. One in five BRM participants were Microsoft employees. But it was a hopeful sign. We’ve planted the seed. We must plant more. And we must see that they grow.

Strength in standards participation comes with time, with participation, with networking, with learning the rules (written and unwritten), with learning from others. Just as we have FOSS experts in software engineering, in law, in business, in training/education, we also need experts in standardization. Certainly the bread-and-butter participation will be from individual engineers, participating for the duration of a particular proposal or group of proposals. But we also need the institutional linchpin participants, those who have taken on leadership positions within standards organizations, and whose influence is broad and deep.

FOSS also needs a standards agenda. In a world of patent encumbered standards controlling the central networks, open source software dies, and dies quickly. We must protect and grow the open standards, for without them we cease to exist.

What standards are important? Which demand FOSS representation? Remember just a few weeks ago, when there was a lot of concern about how the DIS 29500 BRM added explicit mention of the patent-encumbered MP3 standard, but failed to mention Ogg Vorbis at all? Although I sympathize with this concern, the fact is the BRM could not have added Ogg Vorbis, because it is not an ISO standard. Are we willing to do more than lament about this? I tell you that if Ogg Vorbis had been an ISO standard it would have been explicitly added to OOXML at the BRM. Are we willing to do something about it?

What are the standards critical to FOSS, and what are we doing about it? What standards, existing or potential, should we be focusing on? I suggest the following for a start:

  1. Ogg Vorbis
  2. Ogg Theora
  3. PNG, ISO/IEC 15948
  4. ODF, ISO/IEC 26300
  5. PDF, ISO 32000
  6. Linux Standard Base (LSB), ISO/IEC 23360
  7. Most of the W3C Recommendations
  8. Most of the IETF RFC’s

I’m sure you can suggest many others.

Let’s put it all together. Some ICT standards directly impact what we can do with our data and our digital lives. These are the Civil ICT Standards. We need to ensure that these standards remain open standards, so anyone can implement them freely. However, the standardization system, at both the national and international levels, is biased in favor of those large corporations best able to afford dedicated staff to work within those organizations and develop personal effectiveness and influence in the process. Showing up once a year is not going to work. If FOSS is going to maintain any level of influence in the formal standardization world, especially at the high-stakes international level, it needs to find a way to identify, nurture and support the participation of “Open Standards Activists”. The GNOME Foundation’s joining of Ecma, or KDE’s membership in OASIS, are examples of how this could work. Umbrella organizations like Digistan are also critical and can be a nucleus for standards activists. But what about taking this to the next level, to NB membership? Another example is the Linux Foundation’s Travel Fund, designed to sponsor attendance of FOSS developers at technical conferences. Imagine what could be done with a similar fund for attendance at standards meetings.

So that is my challenge to you on this first Document Freedom Day. We’re near the end of what promises to be one of many battles. The virtual networks of the future are just as lucrative as the railroad and telephone networks of the last century were. These include the network of compatible audio formats, or the network of IM users using a compatible protocol, or the network of users using a single open document format. If FOSS projects and organizations want to secure the value for their users that comes from being part of these networks, then FOSS projects must encourage the use of open standards, and must also encourage and nurture new talent for the next generation of open standards activists.

I’m looking forward to the day, soon, when I can search Google for “open standards activist” and not find a paid Microsoft shill among the listings on the first page.

Filed Under: Standards

OOXML’s (Out of) Control Characters

2008/03/24 By Rob 14 Comments

Let’s start with the concepts of “lexical” and “value” spaces in XML, as well as the mechanism of “derivation by restriction” in XML Schema. Any engineer can understand the basics here, even if you don’t eat and drink XML for breakfast.

The value space for an XML data item comprises the set of all allowed values. So the value space for the “float” data type would be all floating point numbers, such as 12.34 or 43.21. The lexical space comprises all ways of expressing these values in the character stream of an XML document. So lexical representations of the value 12.34 include “12.34”, “12.340” and “1.234E1”. For ease of illustration I will indicate value space items in bold, and lexical space items in quotes. In general there are multiple lexical representations that may represent the same value.

Character data in XML also permits more than one lexical representation of the same value. For example, “A” and “&#65;” both represent the value A. The “numerical character reference” approach allows an XML author to easily encode the occasional Unicode character which is not part of the author’s native editing environment, e.g., adding the copyright character or occasional foreign character. The value space allowed by XML includes most of Unicode, including all of the major writing systems of the world, current and historical.
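The equivalence of the two lexical forms can be checked directly with any XML parser; here is a minimal Python sketch using the standard library:

```python
import xml.etree.ElementTree as ET

# A literal "A" and the numeric character reference "&#65;" are two
# lexical forms of the same value-space character.
literal = ET.fromstring("<t>A</t>").text
reference = ET.fromstring("<t>&#65;</t>").text
print(literal, reference, literal == reference)  # A A True
```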

The concern I have with DIS 29500 is Ecma’s introduction of an ST_Xstring (Escaped String) datatype. This new type is defined via the following XML Schema definition:

<simpleType name="ST_Xstring">
  <restriction base="xsd:string"/>
</simpleType>

This uses the “derivation by restriction” facility of XML Schema to define a new type, derived from the standard xsd:string schema type. The xsd:string type is defined to allow only character values that are also allowed in the XML standard.

The use of derivation by restriction implies a clear relationship between the ST_Xstring type and the base type xsd:string. This is stated in XML Schema Part 1, clause 2.2.1.1:

A type definition whose declarations or facets are in a one-to-one relation with those of another specified type definition, with each in turn restricting the possibilities of the one it corresponds to, is said to be a restriction.

The specific restrictions might include narrowed ranges or reduced alternatives. Members of a type, A, whose definition is a restriction of the definition of another type, B, are always members of type B as well.

The last sentence can be taken as a restatement of the Liskov Substitution Principle, a fundamental principle of interface design, that a subtype should be usable (substitutable) wherever a base type is usable. It is this principle that ensures interoperability. A type derived by restriction limits, restricts, constrains, reduces the permitted value space of its base type, but it cannot increase the value space beyond that permitted by its base type.
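The substitutability guarantee can be sketched with hypothetical validator functions standing in for xsd:string and a legitimate restriction of it (the maxLength facet here is my illustrative example, not anything OOXML defines):

```python
# Hypothetical validators: is_valid_base stands in for xsd:string,
# is_valid_restricted for a derivation with maxLength="5".
def is_valid_base(s):
    return isinstance(s, str)

def is_valid_restricted(s):
    return is_valid_base(s) and len(s) <= 5

# Substitutability: every value the restricted type accepts must also
# be acceptable to the base type; a restriction never widens the base.
for v in ["abc", "abcdefgh"]:
    if is_valid_restricted(v):
        assert is_valid_base(v)
print(is_valid_restricted("abc"), is_valid_restricted("abcdefgh"))  # True False
```

ST_Xstring violates exactly this property: it declares itself a restriction while accepting (escaped) values that the base type cannot hold.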

So, with that background, let’s now look at how OOXML defines the semantics of its ST_Xstring type:

ST_Xstring (Escaped String)

String of characters with support for escaped invalid-XML characters.

For all characters which cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character’s value. [Example: The Unicode character 8 is invalid in an XML 1.0 document, so it shall be escaped as _x0008_. end example]

This simple type’s contents are a restriction of the XML Schema string datatype.

In other words, although ST_Xstring is declared to be a restriction of xsd:string it is, via a proprietary escape notation, in fact expanding the semantics of xsd:string to create a value space that includes additional characters, including characters that are invalid in XML.
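To make the escape mechanism concrete, here is an illustrative Python decoder for the _xHHHH_ notation. This is a sketch only: a complete implementation must also cope with strings that legitimately contain text matching the escape pattern.

```python
import re

def decode_xstring(s):
    # Decode OOXML's _xHHHH_ escapes into raw characters.
    # Illustrative only; real input may contain literal text that
    # happens to match the pattern and must itself be escaped.
    return re.sub(r"_x([0-9A-Fa-f]{4})_",
                  lambda m: chr(int(m.group(1), 16)), s)

print(repr(decode_xstring("_x0008_")))  # '\x08', a raw BACKSPACE
```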

Let’s review some of the problems it introduces.

First, the semantics of XML strings that contain invalid XML characters are undefined by this or any other standard. For example, OOXML uses ST_Xstring in Part 4, Clause 3.3.1.30 to store the error message which should be displayed when a data validation formula fails. But what should an OOXML-supporting application do when given a display string which contains control characters from the C0 control range, characters forbidden in XML 1.0?

  • U+0004 END OF TRANSMISSION
  • U+0006 ACKNOWLEDGE
  • U+0007 BELL
  • U+0008 BACKSPACE
  • U+0016 SYNCHRONOUS IDLE

How should these characters be displayed?

There is a reason XML excludes these dumb terminal control codes. They are neither desired nor necessary in XML.
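The prohibition is easy to verify with any conforming parser; here is a small check using Python's ElementTree (which wraps expat):

```python
import xml.etree.ElementTree as ET

def parses(doc):
    # True if a conforming XML 1.0 parser accepts the document.
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<t>\x08</t>"))   # False: a literal BACKSPACE is not well-formed
print(parses("<t>&#8;</t>"))   # False: forbidden even as a character reference
print(parses("<t>&#65;</t>"))  # True: "A" is a legal XML character
```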

Elliotte Rusty Harold explains the rationale for this prohibition in his book Effective XML:

The first 32 Unicode characters with code points 0 to 31 are known as the C0 controls. They were originally defined in ASCII to control teletypes and other monospace dumb terminals. Aside from the tab, carriage return, and line feed they have no obvious meaning in text. Since XML is text, it does not include binary characters such as NULL (#x00), BEL (#x07), DC1 (#x11) through DC4 (#x14), and so forth. These noncharacters are historic relics. XML 1.0 does not allow them.

This is a good thing. Although dumb terminals and binary-hostile gateways are far less common today than they were twenty years ago, they are still used, and passing these characters through equipment that expects to see plain text can have nasty consequences, including disabling the screen.

Further, since these characters are undefined in XML, they are unlikely to work well with existing accessibility interfaces and devices. At best these characters will be ignored and introduce subtle errors. For example, what does “$10,[BS]000” become if one system processes the backspace and another does not? Worst case, the accessibility interface expecting a certain range of characters as defined by the xsd:string type will crash when presented with values beyond the expected range.
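The “$10,[BS]000” ambiguity is easy to demonstrate. In this sketch, apply_backspaces is a hypothetical consumer that honors backspace semantics, while the second consumer silently drops the control character:

```python
def apply_backspaces(s):
    # Interpret U+0008 as "delete the previous character".
    out = []
    for ch in s:
        if ch == "\x08":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

raw = "$10," + "\x08" + "000"   # "$10,[BS]000" from the example above
print(apply_backspaces(raw))    # $10000  (backspace processed)
print(raw.replace("\x08", ""))  # $10,000 (backspace dropped)
```

Two conforming-looking consumers, two different amounts of money.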

Interfaces with existing programming languages are also harmed by ST_Xstring. How does a C or C++ XML parser deal with XML that can now contain a U+0000 (NULL) character in the middle of a string, a character which those languages treat as a string terminator?

What about XML database interfaces that take XML data and store it in relational tables? If they are schema-aware and see that ST_Xstring is merely a restriction of xsd:string, they will assume the normal range of characters can be stored wherever an xsd:string can be stored. But since the value space is expanded, there is no guarantee that this will still be true. These characters may cause validation errors in the database.

By now, the observant reader may be accusing me of pulling a fast one. “But Rob, none of the above is a problem if the application simply leaves the ST_Xstring encoded and does not try to decode or interpret the non-XML character,” you might say.

OK. Fair enough. Let’s follow that approach and see where it leads us.

Let’s look at interoperability with other XML-based standards. Imagine you do a DOM parse of an OOXML document that contains “strings” of type ST_Xstring. Either your parser/application is OOXML-aware, or it isn’t. In other words, either it is able to interpret the non-standard _xHHHH_ instructions, or it isn’t.

If it doesn’t understand them, then any other code that operates on the DOM nodes with ST_Xstring data is at risk of returning the wrong answer. For example, what is the length of the string “ABC”? Three characters, of course. But what is the length of the string “_x0041_BC”? These two strings both have the same value according to OOXML. But an XML application might return 9 or return 3, depending on whether it is OOXML-aware or not. Since most (all) XML parsers are unaware of the non-standard escape mechanism proposed by OOXML, they will typically calculate things such as string lengths, string comparisons, string sorting, etc., incorrectly.
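A quick sketch of the discrepancy:

```python
import re

escaped = "_x0041_BC"  # OOXML's escaped form of the value "ABC"

# An OOXML-unaware XML application sees nine characters:
print(len(escaped))  # 9

# An OOXML-aware one decodes the escape first and sees three:
decoded = re.sub(r"_x([0-9A-Fa-f]{4})_",
                 lambda m: chr(int(m.group(1), 16)), escaped)
print(decoded, len(decoded))  # ABC 3
```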

But suppose the parser/application is OOXML-aware and correctly decodes these character references into the correct Unicode values, then what? Assuming the host language doesn’t crash from the presence of these control characters, we are then faced with problems at the interface with any other code that operates on the DOM. Suppose we try to transform the DOM via XSLT to XHTML. Will the XSLT engine properly handle the existence of these forbidden character values? The XSLT engine may just crash. But suppose it doesn’t. How does it write out these control characters into XHTML? It can’t. These values are not permitted in XHTML. Dead end. What about DocBook? DITA? OpenDocument Format? Not possible. Since these characters are not permitted in XML 1.0 at all, they will be forbidden in all other markup languages based on XML 1.0, or even XML 1.1 for that matter (XML 1.1 allows some, but not all, of these characters; in particular, the NULL character is still excluded).

Note further that with XML pipelining and with mashups, the application that writes XML output typically does not have direct knowledge of the application that originally produced the XML values. This decoupling of producers and consumers is an essential aspect of modern systems integration, including Web Services. By corrupting XML string values in the way that it does, DIS 29500 breaks the ability to have loosely coupled systems. Once the value space is polluted by these aberrant control characters, every application, every process that touches this data must be aware of their non-standard idiosyncrasies lest they crash or return incorrect answers. In this way, one standard perverts the entire XML universe, forcing them all to contend with the poor hygiene of a single vendor.

The reader might think that I exaggerate the importance of this, that surely ST_Xstring is only used in OOXML in edge cases, in rare, compatibility modes. We wish that this were true. However, a look at the DIS 29500 shows that ST_Xstring is pervasive, and in fact is the predominant data type in SpreadsheetML, used to express the vast majority of spreadsheet content, including cell contents, headers, footers, display strings, error strings, tooltip help, range names, etc. Any application that operates on an OOXML spreadsheet will need to deal with this mess.

For example, here are some uses of ST_Xstring in DIS 29500, Part 4:

  • Clause 3.2.3 for the name of a custom view in a spreadsheet
  • Clause 3.2.5 for the name of a spreadsheet named range, for the descriptive comment, for the name description, for the help topic, the keyboard shortcut, the status bar text and for the menu item text
  • Clause 3.2.14 for the name of a spreadsheet function group
  • Clause 3.2.19 for the name of a sheet in a workbook
  • Clause 3.2.22 for the name of a smart tag as well as for the URL of a smart tag.
  • Clause 3.2.25 for the destination file name and title when publishing spreadsheet to the web.
  • Clause 3.3.1.10 for the value of a conditional formatting object, e.g., a gradient
  • Clause 3.3.1.20 for the name of a custom property
  • Clause 3.3.1.28 for sheet and range names
  • Clause 3.3.1.30 for error message string, error message title, prompt string and prompt title in a spreadsheet data validation definition.
  • Clause 3.3.1.35 for the value of a footer for even numbered pages.
  • Clause 3.3.1.36 for the value of a header for even numbered pages.
  • Clause 3.3.1.38 for the content of the first page footer
  • Clause 3.3.1.39 for the content of the first page header
  • Clause 3.3.1.44 for the display string for a hyperlink, the tooltip help for the link, also the anchor target if the hyperlink is to an HTML page
  • Clause 3.3.1.49 for values of input cells in a scenario
  • Clause 3.3.1.50 for cell inline text values
  • Clause 3.3.1.55 for the value of a footer for odd numbered pages.
  • Clause 3.3.1.56 for the value of a header for odd numbered pages.
  • Clause 3.3.1.73, in scenarios for the comment text, the scenario name and the name of the person who last changed the scenario.
  • Clause 3.3.1.88 when defining a sort condition, for the values of the custom sort list
  • Clause 3.3.1.93 for the value contained within a cell
  • Clause 3.3.1.94 for information associated with items published to the web, including the destination file and the title of the output HTML file
  • Clause 3.3.2.2 for expressing the criteria values in a filter
  • Clause 3.3.15 for the key/values for smart tag properties
  • Clause 3.4.4 for expressing the contents of a rich text run
  • Clause 3.4.5 for expressing the name of a font
  • Clause 3.4.6 for expressing the text of a phonetic hint for East Asian text
  • Clause 3.4.8 for expressing a text item in the shared string table
  • Clause 3.4.12 for the text content shown as part of a string
  • Clause 3.5.1.2 for a table, expressing a textual comment, a display name as well as style names.
  • Clause 3.5.1.3 for a table column, expressing cell and row style names, column name
  • Clause 3.5.1.7 for column properties created from an XML mapping, for expressing the associated XPath.
  • Clause 3.5.2.4 for the XPath associated with column properties for XML tables
  • Clause 3.7.1-3.7.6 for specifying content of tracked comments, including the text of the comments as well as the authors of the comments
  • Clause 3.8.29 expressing the name of a font

There are hundreds of additional uses. A search of DIS 29500 Part 4 for “ST_Xstring” returns 467 hits. OOXML also defines two additional types, “lpstr” (7.4.2.8) and “bstr” (7.4.2.4), that have the same flaw as ST_Xstring.

The reader might further argue that, although the type allows characters that are forbidden by XML, the actual occurrence of these values in real legacy documents is likely to be rare. This might be true, but this is cause for even greater concern. If every document contained these control characters, then we would immediately be aware of any interoperability problems when integrating OOXML data with other systems. But if these characters are permitted, but occur rarely and randomly, then the integration errors will also occur rarely and randomly, allowing data corruption and other problems to occur and propagate further before detection.

In summary, we are concerned that the ST_Xstring type in OOXML opens us up to problems such as:

  1. Introducing accessibility problems
  2. Breaking unaware C/C++ XML parsers
  3. Breaking XML databases
  4. Breaking interoperability with other XML languages
  5. Breaking application logic related to string searching, sorting, comparisons, etc.
  6. Introducing errors that will be hard to detect and resolve

Possible remedies include:

  1. Use xsd:string uniformly instead of ST_Xstring, with no use of forbidden XML characters. This would require applications that read legacy binary documents containing such characters to eliminate them at that point, perhaps replacing them with licit characters or with whitespace. No application will be better able to divine the original meaning and intent of these characters than the original vendor, so the vendor should be responsible for cleaning up these strings to make them XML-ready.
  2. Use a non-string type such as the binary xsd:hexBinary or xsd:base64Binary to represent these data items.
  3. Use a mixed content encoding, where the licit characters are represented by xsd:string data and the forbidden characters are denoted by specially defined elements. So “A_x0008_BC” would become: <text>A<backspace/>BC</text>. In this case the semantics of the <backspace/> element would need to be documented in the DIS 29500 specification, including its effect on searching, sorting, length calculations, etc.
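For context on how these forbidden characters travel through OOXML today: ST_Xstring escapes them as `_xHHHH_` sequences, which pass XML validation but decode back into raw control characters at the application layer. A minimal sketch (the function name and the simplified pattern are mine; the actual escaping rules also have to handle literal `_x` runs in the original text):

```python
import re

def decode_xstring(s: str) -> str:
    """Decode ST_Xstring-style _xHHHH_ escapes into raw characters.
    Simplified sketch: does not handle escaping of literal '_x'."""
    return re.sub(r"_x([0-9A-Fa-f]{4})_",
                  lambda m: chr(int(m.group(1), 16)), s)

s = decode_xstring("A_x0008_BC")
# The backspace (U+0008) that XML itself forbids is now back in the
# string, ready to reach downstream searching, sorting, and storage.
print([hex(ord(c)) for c in s])
```

This round trip is what makes the problem invisible to validators: the file is well-formed XML, but the string the application ultimately holds is not one that can be re-serialized into any other XML vocabulary.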

Filed Under: OOXML

Five (Bad) Reasons to Approve OOXML

2008/03/24 By Rob 7 Comments

  1. If you don’t approve OOXML, Microsoft will walk away, and you’ll never hear from them again. Forget the fact that OOXML is already an Ecma standard (Ecma-376) and cannot be taken away. Forget the fact that Microsoft has other formats lined up for ISO approval in the near future, like XPS or HD Photo. Microsoft wants you to think that if you don’t give them exactly what they want, now, they will walk away from ISO and you will be the worse for it. We need to reward Microsoft for their abuse of the standardization process, in the hope that their participation will evolve in line with our hopes and not our fears: that they will improve on the standardization side while curbing the abuse. Of course, the reward could be misinterpreted to mean the opposite, and we could get more abuse and even lower-quality standards. I guess that’s a risk we’ll just have to take. By a similar abuse of logic, small children hold their breath until their faces turn blue, thinking they can scare adults into giving them what they want. It doesn’t work there either.
  2. If you approve OOXML, you can have the privilege of spending the next 5 years in the glorious work of fixing thousands of defects in the text. You can get a seat at the table, fixing bugs that should have been fixed in Ecma before OOXML was even submitted to JTC1. Forget the fact that maintenance in JTC1 is a ponderous, time-consuming activity, where individual defects are enumerated, changes proposed, discussed, voted on, etc. Forget the fact that the recent BRM showed that you can’t really get through more than 60 defects in a week-long meeting. Forget the fact that fixing defects in Ecma, not JTC1, would be far faster and easier due to the lighter-weight process Ecma imposes on its TCs. Forget that Fast Track is intended for mature, adopted standards, not for ones that will require a “Perpetual BRM”. Forget all that. You want a seat at the bug-fixing table? You got it.
  3. Billions and Billions of legacy documents. Well, actually these legacy documents are not in OOXML format; they are in the legacy binary format. And no mapping has been provided from the legacy formats to OOXML. But there are billions and billions of these legacy documents. That must be important. So vote Yes for OOXML because there are billions and billions of documents in some other format that is nebulously related to it.
  4. More standards are better. More standards means more choice, means more decisions, means more consultants, means more money paid to XML experts. You’ll sooner find the American Dairy Council recommending less milk consumption than a standards professional calling for fewer standards. So ignore quality, maturity and need. More standards are a good thing. Like Blu-ray and HD DVD.
  5. ODF will be better if OOXML is approved. In OASIS we’re too stupid to look up legacy features or Excel spreadsheet formulas in Ecma-376. We would have never thought of that. We believe the only way to make ODF better is to make it more like OOXML. That is why we would like to encourage nice little JTC1 countries like Kazakhstan to vote YES for OOXML. As soon as OOXML is approved, then magically, it becomes useful to us. But the exact same text, not approved by Kazakhstan and JTC1, is not useful to us at all. It is all or nothing. There is nothing in the middle. Rather than taking a useful, high-quality text and approving it on its merits, we are asked to approve a specification with thousands of defects, and by our approval we transform it into something useful to ODF.

Filed Under: OOXML

