“There is it some game in this wood?”
Pedro Carolino wanted to write and publish a Portuguese/English phrase book.
“Another time there was plenty some black beasts and thin game, but the poachers have killed almost all.”
But there was one small problem: Carolino did not know English.
“Look a hare who run! let do him to pursue for the hounds! it go one’s self in the plonghed land.”
Undeterred, Carolino hatched a clever plan.
“Here that it rouse. let aim it! let make fire him!”
He had a copy of a Portuguese/French phrase book, O Novo guia da conversação em francês e português by José da Fonseca. And he had a French/English dictionary.
“I have put down killed.”
With these two resources, writing his phrase book would be easy. Or so he thought.
“Me, i have failed it; my gun have miss fixe.”
Starting from the French half of the text in da Fonseca’s book, Carolino dutifully used his dictionary to translate, word-for-word, the French into English.
The result, O Novo Guia da Conversação, em Português e Inglês, em Duas Partes, was published in Paris in 1855 and is now considered a classic of unintentional humor.
“Here certainly a very good hunting.”
A similar problem occurs in DIS 29500 “Office Open XML”. The scope of OOXML, as amended by the BRM, is stated as:
This International Standard defines a set of XML vocabularies for representing word-processing documents, spreadsheets and presentations. The goal of this standard is, on the one hand, to represent faithfully the existing corpus of word-processing documents, spreadsheets and presentations that have been produced by Microsoft Office applications (from Microsoft Office 97 to Microsoft Office 2008 inclusive). It also specifies requirements for Office Open XML consumers and producers , and on the other hand, to facilitate extensibility and interoperability by enabling implementations by multiple vendors and on multiple platforms.
Faithful representation of Microsoft Office 97-2008. I’ve learned it is rarely polite to ask a man what he means by “faithful”, but let me make an exception here. We now have the binary Office format specifications, which are not part of the standard but have been posted by Microsoft. And we have the OOXML specification. In what way does OOXML “represent faithfully” the “existing corpus” of legacy documents?
Does OOXML tell you how to translate a binary document into OOXML? No. Does it tell you how to map the features of legacy documents into OOXML? No. Does it give an implementor any guidance whatsoever on how to “represent faithfully” legacy documents? No. So it is both odd and unsatisfactory that the primary goal of the OOXML standard is so tenuously supported by its text.
Now, certainly, someone using the binary format specifications and the OOXML specification could string them together and attempt a translation, but the results would be neither consistent nor satisfactory. It is the Carolino Effect. Knowing the two endpoints is not the same as knowing how to correctly map between them. A faithful mapping requires knowledge not only of the two vocabularies, but also of their interactions.
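To make the Carolino Effect concrete, here is a minimal sketch in Python. The two lookup tables are toy data invented for illustration; they are not taken from Carolino's dictionary or from any format specification. The word-level table stands in for "knowing the two vocabularies", and the phrase-level table stands in for a mapping that also knows how the words interact.

```python
# Toy French -> English word dictionary (illustrative entries only,
# an assumption for this sketch, not Carolino's actual dictionary).
WORD_DICT = {
    "pomme": "apple",
    "de": "of",
    "terre": "earth",
    "il": "he",
    "fait": "makes",
    "froid": "cold",
}

# Toy phrase-level mapping (also an assumption): it records how the
# words interact, which the word dictionary alone cannot express.
PHRASE_MAP = {
    "pomme de terre": "potato",
    "il fait froid": "it is cold",
}


def translate_word_for_word(text):
    """Translate each word independently, leaving unknown words as-is."""
    return " ".join(WORD_DICT.get(word, word) for word in text.lower().split())


def translate_with_mapping(text):
    """Prefer the phrase-level mapping, falling back to word-for-word."""
    key = text.lower()
    return PHRASE_MAP.get(key, translate_word_for_word(key))


print(translate_word_for_word("pomme de terre"))  # apple of earth
print(translate_with_mapping("pomme de terre"))   # potato
```

Both functions have access to the same two vocabularies. Only the one that also carries an explicit mapping gives the result a French speaker would expect.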
Also, having the two specifications does not help with the 77 features in OOXML that are declared to be “implementation-defined” or “application-defined”. How are these translated from the binary formats?
Note that DIS 29500 bears the obvious marks of its legacy roots, from the use of VML and non-hierarchical run structures in WordProcessingML, to bit fields and idiosyncratic leap year calculations in SpreadsheetML. This suggests the likelihood that the authors of this standard did not just sit down and design the standard from scratch, but that they in fact had access to the binary format specification and mapped it into XML as a preparatory step. It is difficult to explain the presence of elements such as “lineWrapLikeWord6” without positing the presence of such a mapping.
Microsoft should simply publish this mapping. Without such a mapping, conversions will be inconsistent, interoperability will suffer and a primary goal of the standard will not be met. Given the same binary document, Microsoft Office, Apple iWork, OpenOffice.org, etc., will all produce different OOXML documents. How is this “faithfully representing” existing documents? What is needed is a canonical mapping.
Note that the initiation of an open source project to develop a converter between the binary formats and OOXML is insufficient. What is required is a canonical mapping. Otherwise we are faced with the reality that the true goal of OOXML is more accurately stated as:
To allow Microsoft the ability to represent their legacy documents in XML and pretend that it is a capability that other vendors can practice as well.
Though this issue was of great interest to several NBs, it could not be raised at the BRM for lack of time.