Tuesday, April 10, 2007
The Case for a Single Document Format: Part III
This is Part III of a four-part post.
In Part I we surveyed a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.
In Part II, we described the forces that tend to unify or divide standards and showed in particular how network effects can drive the adoption of a single standard.
In this Part III we'll look at the document formats in particular, how we got to the present point, and how and why historically there has been but a single universally-accepted document format.
In Part IV, we'll tie it all together and show why there should be, and will be, only a single open digital document format.
The Meeting
It is 9:55 on an average Tuesday morning. I'm late (as usual) preparing for a meeting. With 5 minutes to go, I send out an updated meeting invite, with an updated agenda and a URL for the web conference. I also send out another email with an updated presentation attachment. It is the standard last-minute, pre-meeting shuffle that we all do. I expect that an examination of traffic statistics on IBM's email servers shows a spike 5 minutes before every hour, as we all send out last-minute meeting updates. I log in to my web conference and dial into the call. I'll be meeting with my teammates, some in Westford, some in Raleigh, some in Portsmouth, some in Lexington, some in Dublin and some in Shanghai, a far-flung group. I've worked with some of these guys for years but still have never met most of them face-to-face. This is the nature of collaboration in a modern, global company. The call starts and I take a deep breath, push off my slippers and stretch my toes. Yes, I'm leading this meeting from home today.
"Don't be impatient, Comrade Engineer; We've come very far, very fast", in the words of Yevgraf Zhivago, Alec Guinness's character in Doctor Zhivago. Let's flash back 10 years and remind ourselves how we worked then...
It is 9:55 on an average Tuesday morning. I'm late (as usual) preparing for a meeting. With 5 minutes to go, I print out the agenda and handouts to the laser printer down the hall. It has printed by the time I arrive, and I sort through the three or four other print jobs to find the one that is mine. I need twelve copies for the meeting, so I join the queue at the photocopier, with everyone else who also waited until the last minute to print out the materials for their meetings. It is the standard last-minute, pre-meeting shuffle that we all do. I expect that an examination of statistics on IBM's photocopiers shows a spike 5 minutes before every hour. I head over to the conference room and start the meeting. At the end of the meeting, 80% of the printed materials will be discarded, hopefully into the recycling bin. This was the nature of collaboration in a modern, global company, circa 1995.
What has changed? Why did it change? What does this mean for document formats?
My family in documents
Let me take you on a detour, back in time, to tell a 200-year family story, illustrated with official documents of the period.
I'll start with the following excerpt from the 1930 Federal Census returns for Abington, Massachusetts, showing my grandmother, Florence Mae Cushing, then age 18, and her parents William and Mary, and household. The columns indicate the following:
- Name
- Relationship to the head of household
- Whether they own or rent their dwelling
- Value of their dwelling
- Whether they own a radio
- Whether they own a farm
- Sex
- Race
- Age
- Marital condition
- Age at first marriage
- Whether they are in school

The thing that caught my eye about this record is that it lists a "Damon, Mary K" as William's mother-in-law, widowed, age 73, living with them. Let's see what we can find out about this woman. The first step is to find her maiden name. A search for her marriage record in Abington failed, so we tried for Mary E. Damon's birth record, which we did find in Abington's birth register for 1887, revealing her mother's maiden name as "Chessman":

This then allows us to find Mary K. Chessman's birth record, also in Abington, from 1856 listing her parents as Edward and Emily:

And then from here we can go back and find the family in the 1860 Federal Census:

We see the family as owning $500 in real estate and $100 in personal property, having 5 children, the oldest 8 years old. Mary K. is only 3.
But when I skip ahead to the 1870 Census, something is clearly wrong:

As you can see above, Emily is listed as head of household, and there is no Edward. And where is our Mary K? At age 13, she has moved out and is working as a "domestic servant" with a family of factory workers. Her sister Harriet, age 15, is also living there and working in an "eyelet factory":

So what happened? Resolving this mystery required a bit more sleuthing, but I eventually found the answer in a response to a records request to the National Archives and Records Administration (NARA):

From this I learned that Edward Blanchard Chessman, Mary K's father, had served in the Civil War with the Massachusetts 32nd Volunteers and had died of disease in 1863 at a military hospital in Alexandria, Virginia. This, along with a dozen pages of additional documents from NARA, detailed the pension application of his widow, the depositions of witnesses who vouched for their marriage and his service, and the periodic requests for pension increases, all the way to 1903 when Emily died and her pension file was closed, marked "DEAD" with a big, bold stamp.
Since I was now tipped off to the value of pension records, I next searched for Edward's grandfather, Ziba Chessman, who I knew had served in the Revolutionary War. I was able to locate his widow's pension application as well:

The hand of this writer is not so easy to read, but I'd transcribe the start of it as:
Commonwealth of Massachusetts. Norfolk County. On this twenty second day of August 1838 personally appeared before Herman **** The *** of Probate in **** County, Mehitable Chessman a resident in the Town of Braintree in the County of Norfolk and state of Massachusetts aged seventy three years, who being first duly sworn according to law doth on her oath make the following declaration in order to obtain the benefit of the provision made by the Act of Congress passed July 7th 1838 entitled "An Act Granting Half Pay and Pensions to Certain Widows", that she is the widow of Ziba Chessman late of Braintree in the County of Norfolk and state aforementioned deceased, who was a Soldier in the War of the Revolution; that her said husband Ziba Chessman enlisted into Captain Isaac Thayers or Captain Nathaniel Belchers Company in the year 1775 and served a short period of time as a private with the Massachusetts Militia, around the shores of Boston, according to the best of her knowledge....
I am in awe that these records have been maintained and preserved for so long, and made available to people like me who are researching their family tree. There is a continuity of records in New England that goes back almost 400 years. Birth, education records, draft registration, military service, marriage, court appearances and eventually death and burial. Whenever your personal life crossed paths with the government, it generated a record and this record may last forever, and more importantly, once the physical preservation aspects are taken care of, these records can be read forever.
A brief history of document technology
It is somewhat odd that we've been debating document formats for so long and have not really said what they are. I'll recommend the following for our discussion:
A document format consists of the conventions that allow a document to be fixed in a persistent state and then exchanged with other parties who are able to use these same conventions to read and further edit that document. If you and I understand the same document format, then you and I can exchange documents in that format and we can collaborate using that format.
Since around 1450, with Gutenberg's first notable success of combining document production and automation, and even before (and since) with manual document production, there has been a single globally relevant interoperable document format — ink on paper. Everyone could create it, everyone could read it, everyone could exchange it. It worked then and it works now.
Some noticeable advances in documents since 1450 include the invention of pre-printed forms, around 1850. These seem obvious now, but for many years we had what were called "formulary documents" which had boilerplate text which the clerk wrote out in full for each document, in addition to the customized language for each specific instance. You can get a sense of this from Ziba Chessman's pension application quoted earlier. From an engineering perspective you can think of this as reuse of design, but not implementation.
Having a pre-printed form was a step forward in productivity, allowing a greater degree of reuse. The Surgeon General's form shown above is an early example. Such forms were quickly associated with bureaucracy. In fact, the first written use of the word "form" in the English language (according to the Oxford English Dictionary) was this critical view of a 19th century government office:
The waiting-rooms of that Department soon began to be familiar with his presence, and he was generally ushered into them by its janitors much as a pickpocket might be shown into a police-office; the principal difference being that the object of the latter class of public business is to keep the pickpocket, while the Circumlocution object was to get rid of Clennam. However, he was resolved to stick to the Great Department; and so the work of form-filling, corresponding, minuting, memorandum-making, signing, counter-signing, counter-counter-signing, referring backwards and forwards, and referring sideways, crosswise, and zig-zag, recommenced — Dickens, Little Dorrit (1855)
The telegraph (1837) and teletype (1910) gave new, faster ways of moving documents around. Was Morse Code a new document format? Although the telegraph operators may have worked in Morse Code, the author of the document, and the person who ultimately received and read the document still worked with ink on paper.
The typewriter (1872) increased the speed and uniformity of personal document production. This also led to a new use for carbon paper, an invention of 1806 originally created as an aid for the blind.
In the late 1880's, Edison's "Autographic Printing" was commercialized as the Mimeograph, giving a cheaper method of small batch document production.
Melvil Dewey (of Dewey Decimal fame) invented the hanging file folder (1893), leading to increased efficiency of document storage and retrieval.
The Harris Automatic Press Company is incorporated in 1895, ushering in the commercial use of offset printing and a 10-fold increase in document output rates.
The invention of the Soundex algorithm by Robert Russell of Pittsburgh in 1918 allowed more efficient searching of files and cards indexed by surnames, by grouping together names which were phonetically similar.
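To make the grouping concrete, here is a minimal sketch of Soundex as it is commonly described today (the "American Soundex" rules). The function name `soundex` and the treatment of h and w are assumptions drawn from that common description, not from Russell's original patent, so treat it as an illustration and not a definitive implementation.

```python
# Letter groups that share a Soundex digit (commonly described American Soundex).
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus up to three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate repeated codes
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]  # pad with zeros, then truncate to four characters


if __name__ == "__main__":
    for surname in ["Chessman", "Chesman", "Cheeseman", "Cushing", "Damon"]:
        print(surname, soundex(surname))
```

With this sketch, spelling variants such as Chessman, Chesman and Cheeseman all map to the same code, which is what lets a clerk file them together and find them with a single lookup.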
In 1924 radio facsimile allows pictures, as well as text, to be transmitted long distances.
In 1948 Xerography gave us document duplication without the use of wet, messy chemicals.
In 1969, IBM's Charles Goldfarb, Ed Mosher and Ray Lorie invented GML, the Generalized Markup Language, the ancestor of SGML, HTML and XML.
The 1970's saw the rise of the first computer-based word processors, including Wang's Office Information System.
In 1974 Xerox PARC engineers create Bravo, the first WYSIWYG word processor.
In 1975, with the rise of office automation systems and early word processors, Business Week boldly proclaimed the "Paperless Office".
At this point we reach an important fork in the road of history. What would the computer and office automation mean for the future of documents? Does the paperless office become a reality? Or do we remain with paper-based documents? As Xerox PARC engineers were developing the world's first WYSIWYG word processor, at the same time they were also developing a system for transporting documents electronically, from one computer to another. But this innovation was dropped because it went against Xerox's core business, the creation and duplication of paper documents. So the choice was made. Paper still ruled. Paper consumption went up, not down. The word processor made it easier to produce more paper, faster. The paperless office did not happen, at least not yet. More first-hand details on this fascinating topic can be read in Sellen & Harper's The Myth of the Paperless Office. In their words, "...paper became a surrogate for the network, enabling users with different machines to share documents...".
And so we continued for another 20 years of WYSIWYG word processors: WordStar, MacWrite, Writing Assistant, Manuscript, WordPerfect, Word, WordPro, etc. We all created documents and hid the files away on our hard drives in incompatible formats. When we needed to work with others we usually just printed out the document and exchanged the printout, using the 500-year-old format of ink on paper.
Let's pause here and make some observations.
First, note the areas of sustained and recurring innovation. These have been consistent throughout the past 500 years and reflect the ongoing nature and practical concerns of business communications:
- Document authoring
- Document duplication
- Document distribution
- Filling out of forms
- Submission of forms
- Processing of forms
- Storage and Retrieval of documents
- Authentication of documents (not mentioned in the history above, but the use of notaries public and corporate seals has facilitated this with ink and paper documents, in some forms back to ancient Rome.)
But of course, we don't work this way anymore. Something changed, very recently. I don't print out agendas any more. I send them via email. I don't print out reports and review them with a red pen in hand. I mark them up electronically. In fact, unless I need to sign it or staple a receipt to it, I don't print out anything. I think I can live out the remainder of my professional career on only 2 reams of paper.
What happened then to change this? Why is there less of an emphasis on printed output today? What does this mean for WYSIWYG? And what does this mean for document formats?
I'll take up these questions and others when I finish up this series in Part IV.
20 April 2007 — Another editing pass, tightening up the language, but still too long. Added link to "The Myth of the Paperless Office".
Labels: Standards
Comments:
Wow, that's some great footwork in this section of the essay. Imagine if Proust had written his grand novel in Word. The track changes features would not have been consistent enough over subsequent versions to keep up and retain his notes and changes, which would have greatly deterred translators throughout the 20th century.
So I guess the question is, when you want to save something/anything, what will you reach for? I'll take mine with ODF because I know how it is composed, who controls it, and that it will always have a freeware implementation of itself. At least that's a light-year ahead of that other spec.
It's more the move toward a paperless society that is making our documentation process change. Great to know that you still have your family's documents available. But what if we no longer keep records on hard copies? Will our emails be kept forever? We are hopeful but we're not sure. Do you have something we can look forward to in your next posting?
Where there was only a single format it was generally by choice of the people using or working with that format.
It wasn't just a question of who asked ISO to standardize its format first!
Also, at the moment you already have different formats like .doc and .pdf. How can this be explained in your single format theory?
Good point, and this is one of the reasons why PDF is not the complete answer. PDF is great for capturing the final fixed presentation form of a document, but you lose the revisions, the spreadsheet formulas, the items that show how the work or collaboration was done. From an historical perspective, those details may be the most important details.
I took a course on the history of physics from Gerald Holton years ago. One thing I learned there was the importance of getting access to the scientist's lab notebooks. The published papers are too clean, they make things look too predictable, too obvious. You get a much better sense of how the discovery was really made when you read the notebooks and interpret every number and every symbol.
This applies to literary works as well. When the typewritten manuscript draft of T.S. Eliot's "The Waste Land" was discovered in 1968 we finally saw the handwritten notes, corrections and suggestions from his friend Ezra Pound, and realized how important that collaboration was to the work.
The same thing applies to business collaboration. It would be impossible for us to accomplish our work in the OASIS ODF TC without having a single interoperable format to work with, one rich enough to handle the formatting of the ODF specification, as well as rich enough to handle the change tracking, revision tracking and associated features needed for document collaboration.
Will our electronic documents be kept forever? My focus is on ensuring that the formats in which the documents are stored and exchanged are capable of being read for long periods of time, i.e., that they are not tied to any one application, operating system or vendor. That was always the beauty of paper. You probably don't have personal access to a quill pen, a mimeograph machine, a radio facsimile receiver or carbon paper, but you can easily read any document produced with these technologies, because the document format, the conventions for how we read the documents, has remained the same.
It is interesting that in 500 years, until now, no commercial interest has ever dared to carve this vast interoperable document landscape into a private proprietary fiefdom. Sure, we had intentionally closed formats even in the days of ink on paper. We had our secret codes and Enigma machines and such. But this was the realm of espionage not of business collaboration.
But of course, in addition to the format issue, there are collection, physical media preservation, funding, privacy and other concerns about long-term digital document archiving. The National Archives of Australia, in particular, has written a lot on this subject. An open document format enables, but alone is not sufficient for, long-term availability of digital records. A lot of other pieces need to come together first.
To The Wraith's comment about choice, we should remember that in the paper world, ISO (and ANSI and others) standards played a large part in standardizing paper sizes, necessary for efficient filing and retrieval of documents. As a variety-reducing standard, the standard paper sizes lead to economies of scale around envelopes, filing folders, filing cabinets, printer paper trays, shredders, etc.
No one ever claimed that the exact dimensions of A4 paper were magically superior to paper that was 2% larger or 2% smaller. No one ever complained that standardizing on that paper size would eliminate user choice, and cause innovation to suffer. But as a variety-reducing standard it was good to adopt it and optimize the market in paper-related technologies around a single family of paper sizes.
Corporations have a desire to reduce the variety of technologies in their organisations to control the TCO. This is why they tend to select corporate standards for any technology of significance.
When multiple corporations select different standards on data they need to share, interoperability becomes a problem. This is especially true with documents because they are meant to be exchanged in the first place.
This article illustrates magnificently the significance of time. A standard that has trouble lasting as little as a decade before it gets superseded is a major problem. Microsoft Office formats are deprecated every few years. OOXML comes with an expectation that billions of existing documents are mass converted for compatibility. Will we see a repeat at every new version of Office because Microsoft changes some details in the proprietary aspects of Office? You can't manage historical records that way.
When I see Microsoft promoting choice, I expect this to fall on deaf ears. Diversity is the worst choice of all and everybody including Microsoft knows it. Otherwise we wouldn't use TCP/IP. We would all stick with SNA, DECnet, XNS, IPX, AppleTalk and the like. We had plenty of choice back then.
"No one ever claimed that the exact dimensions of A4 paper were magically superior to paper that was 2% larger or 2% smaller."
Actually, they were :-)
The A measures were standardized to fit the metric system.
After the meter was standardized over continental Europe, all machinery and tools were created in easy to measure dimensions.
Like the old buildings of the classical ages, which all had dimensions in whole local feet/thumbs etc. sizes.
The A0 basic size was determined by the standard square meter. Paper weight and costs are by the square meter too.
Therefore, having basic sheet sizes in integral numbers per square meter simplifies accounting tremendously.
There are 16 A4 in a square meter. So there are 32 square meters in 512 sheets. Standard packages are 500 sheets (I didn't count them, this could be rounded). So the weights and costs are easy to calculate.
Making them 2% larger/smaller makes accounting really difficult.
Rob
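The arithmetic in the comment above can be checked with a minimal sketch. It assumes only that A0 has an area of one square metre and that each step down the A series halves the sheet; the 80 g/m² grammage and the function names are illustrative assumptions, not figures from the comment.

```python
from fractions import Fraction

# A0 is defined to have an area of one square metre.
A0_AREA_M2 = Fraction(1)


def sheet_area_m2(n: int) -> Fraction:
    """Area of one A<n> sheet; each step down the series halves the area."""
    return A0_AREA_M2 / 2**n


def stack_weight_g(n: int, sheets: int, grammage_g_per_m2: int) -> Fraction:
    """Weight in grams of a stack of A<n> sheets at the given grammage."""
    return sheet_area_m2(n) * sheets * grammage_g_per_m2


# Number of A4 sheets that make up one square metre.
a4_per_m2 = 1 / sheet_area_m2(4)

# Total area of 512 A4 sheets, in square metres.
area_of_512_a4 = sheet_area_m2(4) * 512

# Weight of a 500-sheet package, assuming 80 g/m^2 paper (hypothetical value).
package_weight_g = stack_weight_g(4, 500, 80)

print(a4_per_m2, area_of_512_a4, package_weight_g)
```

Using exact fractions keeps the results as whole numbers, which is the point being made: with sheet sizes tied to the square metre, area, weight and cost all come out as simple multiples.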
"Where there was only a single format it was generally by choice of the people using / working with that format."
To go back to A measures. The metric system was forced upon continental Europe by Napoleon. We were all the better for it. Time zones were forced upon us by the railroads.
Doc was forced upon us by Microsoft. Even as an MS Office user, I hated the format because it has made me lose data and work. Still I had to use the application so there was no choice.
I want to be able to buy lightbulbs without having to worry over fitting them into my lamp-shades. Just as I have always hated proprietary vacuum cleaner bags, which were expensive and I kept getting the wrong ones (I finally bought a Dyson).
Winter
Just to be picky, but the meter was never standardised in Europe. The metre was.
This is yet another issue where one standard should be "choosen", because multiple versions of English help no one.
I'd like to see Microsoft explain why US English and UK English (along with all the other variants) are good for the consumer.
Last anonymous--by your logic, multiple languages help no one.
Maybe the US/UK divide is silly, but consider this: lots of us speak differently for a reason. American bureaucrats do not talk like British bureaucrats because they have to. The Justice Department is not the same as the Home Office. And attorneys/lawyers are not the same as barristers/solicitors.
Even if we were to have a world tongue, there would still be all sorts of parlances and jargons. They all have different uses.
Just like file formats.
British English versus American, Australian, Indian or whatever variety of English -- this is partly a product of different environments. The settlers in North America encountered different animals and plants and sometimes gave them new names, sometimes adopted the names from the Native Americans, and sometimes applied the existing name of the closest thing they were familiar with back home.
On top of that there was the factor of separate linguistic evolution enforced by geographic isolation.
The English language has picked up words from the many cultures it has been in contact with over the centuries: French, Norse, Spanish, scientific terms from Greek and Latin, Arabic, Indian, Celtic, etc. This has made English richer, but has not turned it into something un-English.
The average high school graduate can read Shakespeare with little difficulty, Chaucer with some help, and Beowulf with a semester of Anglo Saxon. This is pretty good stability, although I know there are examples of isolated languages which have been even more stable over time, such as Icelandic.
The basic problem with all the language responses is that language is not a standard. National governments, which themselves are an invention of the last few centuries, have tried to standardize and enforce languages, but that never worked.
The reason is that speech is not a designed artefact, but part of our biology. Language and speech are as much part of being human as walking on two legs.
What gets standardized is a national ORTHOGRAPHY. Writing IS a designed artefact and has to be standardized to allow communication.
This difference is clearly visible from the fact that in English, the relation between orthography and speech is little better than in Chinese. But in many languages, there is a good correspondence between these two, e.g., Spanish and Italian.
The differences between the variants of English (and Chinese) are covered up by the orthography, where most incomprehensible variants are found in the UK.
So, US, UK, and Aussie writings present no problems, but try to understand rural Scottish SPOKEN English (one of the oldest variants).
Winter
I would still like to know: if OOXML had been at ISO first (say, if ODF had waited for spreadsheet formulas), would you all have agreed that OOXML should be the only standard, since you can only have a single standard?
Actually, there are competing paper standards - metric A0/A1/A2/A3/A4...; B1/B2/B3/B4...; C0/C1/C2/C3/C4...; and those used in North America Letter/Legal etc. It is not fun sharing documents with North America when everywhere else uses metric, as A4 doesn't fit nicely on Letter. North America, as a market, is big enough to sustain a different usage to the rest of the world, and operating across the divide is painful. For an interesting read see http://www.cl.cam.ac.uk/~mgk25/iso-paper.html and especially the section labelled "Hints for North American paper users". The link in that website to ftp://ftp.isi.edu/in-notes/rfc2346.txt "Making Postscript and PDF International" is also useful.
I guess the question is why didn't Microsoft submit OOXML to ISO first? Why did they try to sell Massachusetts on their proprietary Office 2003 Reference Schemas? Why did they not even start their Ecma process until ODF was already being reviewed by ISO? Why did they not start documenting their spreadsheet formulas until after ODF did? Why did they decide to submit OOXML to ISO less than 1 month after ODF was published by ISO?
Honestly, I don't think Microsoft is really all that enamored of OOXML. But they do know that division and chaos will trap users on the perceived stability of their monopoly desktop, whereas a single interoperable office file format would give users true choice, and true choice means some of their customers would leave for alternative office suites. That is why they would never have gone to ISO first, because they would be scared that people might actually adopt it.
Remember Microsoft once had the file format documentation for their binary formats publicly available, but they withdrew it when Bill Gates (1999) said it was "crazy" to share such information with their competitors.
@Rob
[quote]I guess the question is why didn't Microsoft submit OOXML to ISO first? [/quote]
I guess they weren't asked by the EU to submit their format for ISO standardisation. There was no real need to submit your standard to ISO a few years ago because no one was requiring an open or ISO standard at the time. And Microsoft was already well on the way to opening up their format using an XML format, which already made sure that any data in the documents would be retrievable forever.
[quote] Why did they decide to submit OOXML to ISO less than 1 month after ODF was published by ISO? [/quote]The decision to put OOXML to ISO via Ecma was already made at the end of 2005, as you are well aware. However, Ecma only ratifies standards twice a year, so the timing of the submission of OOXML by Ecma was related to that and not to the ISO publication of ODF.
I bet MS now wishes it had submitted the standard earlier, but at the time they were developing the format they just did not feel any urgency (and frankly there should be no need to hurry, as ISO standardisation shouldn't be a race over who is first!).
I guess it is possible that MS would not have submitted its format to ISO if ODF hadn't been, and if governments and the EU had not started a discussion about using an ISO format.
But on the other hand we can all easily see that ODF wasn't originally meant for ISO standardisation either. ISO standardisation isn't mentioned in any OASIS document from the Open Office XML format TC that created ODF early in the development of the format. It seems that ODF's move towards ISO was only triggered by the EU asking OASIS for such a move, making the ISO standardisation by OASIS more of a political move.
Btw, no one did answer my question ... ?
"I guess they weren't asked by the EU to submit their format for ISO standardisation."
I do remember this completely different.
MS were begged by the EU to coordinate their Office format with what became ODF.
They were invited to the OASIS meetings. They consistently refused to participate (but they did observe what happened). They also refused to inform the OASIS committee about the MS Office formats.
No one tried to exclude MS, but they isolated themselves.
Rob
In response to the wraith:
Those formats don't fit neatly into the theory Rob expounds here. But his theory seems a bit forced.
Do playbills, newspapers, office forms, and novels really and truly share the same format?
The fact that they are printed with ink on paper does NOT mean they are the same format.
By that logic, ODF and OOXML (or XLS and 123) are the same format: both are stored on disks with magnetic charges.
Just because we don't notice the format differences between a dictionary and an anthology of sonnets does not mean that there are none. Both of these have very well-specified formats. Our brains convert both formats transparently into thought.
The difference with computer formats is that computers are not intelligent. They don't learn how to read other formats--we have to program them to do so.
If a playbill, a novel, a sonnet, etc., are all saved as MS Word DOC files, do they not share the same format? Of course they do. It is called...wait for it... MS Word DOC format.
Certainly there are various levels of semantic, structural, and linguistic "formats" that are above the level of paper and ink. A Shakespearean and Petrarchan Sonnet can be called "different formats" if you are willing to stretch the definition of "format" far enough. But I see no reason to torture the language when a more reasonable, common sense definition is at hand.
The fact is that you and I and everyone else can read 500+ years of paper and ink documents with no assistance, but if I am a Mac user today, I cannot read an OOXML spreadsheet document. That is the difference between having a single, universally accepted document format, and having small islands of interoperability separated by oceans of miscommunication, lessened fidelity and data loss.
We want a single electronic document format to preserve the interoperability that we all have always had, for 500+ years, a continuity of document interoperability that fired the Reformation, the Enlightenment, the Industrial Revolution, and the Space Age, and which has just been threatened by proprietary interests in the last 10 years or so. Is that too much to ask?
"I do remember this completely different."
Then I refer you to:
http://www.oasis-open.org/archives/office/200409/msg00000.html
Where it is clear that the IDA program from the European Commission recommended the ISO submission. This at a point two years into the standardisation where until that date the ISO submission was not on the TC roadmap at all.
Wraith,
To your original question, what if OOXML had been the first at ISO, etc.
If you mean OOXML as it was approved by Ecma, the OOXML as we know it today, tied to Office and tied to Windows, and filled with special hacks which will be of use to only Microsoft, then no, I would not have favored it and I believe neither would ISO.
However, if OOXML had been the first format proposed, in an SDO like OASIS or Ecma, and if the TC charter had been written in such a way that the TC actually had the ability to make a quality, platform- and application-neutral document format, then yes, I personally would have supported that proposal, and probably would have participated as well.
Remember, we're no strangers to working with our competitors in areas of common interest. We work closely with Microsoft on other standards, and have for years. Although IBM was not one of the original authors of the ODF standard, we're working closely with Sun, and others, on this standard, even though we compete vigorously against Sun on several other fronts.
If Microsoft had shown some leadership, and a genuine interest in creating an open format, then that would have been well received. But in this case, I'm not seeing standards leadership from Microsoft. Instead I'm seeing a desperate attempt to lock Office users into a format that will prevent them from meaningful data exchange with other office suites and applications, an attempt to lock these users into the Windows/Office platform for another 10 years. Luckily this is obvious, and the thin veneer of openness with which they attempted to cover their efforts has fooled almost no one.
"Where it is clear that the IDA program from the European Commission recommended the ISO submission. This at a point two years into the standardisation where until that date the ISO submission was not on the TC roadmap at all."
But what is your point?
The EU contacts an ongoing, international, multivendor, open standardization effort and asks them to submit it to ISO.
Can we really expect the EU to contact a private USA firm that has steadfastly refused to communicate about their Office formats to send their secret formats for ISO standardization?
At that time it wasn't even clear whether the secret MS XML formats could be legally READ by anyone but MS.
Winter
"I guess they weren't asked by the EU to submit their format for ISO standardisation."
Your guess is not entirely correct, because the EU (i.e. its Telematics between Administrations Committee) did ask MS to submit their format to "an international standards body of their choice".
That was in May 2004, in the same document in which OASIS was invited to submit their format to "an official standardisation organisation such as ISO". See: http://ec.europa.eu/idabc/en/document/2592/5588
However, the response of Microsoft at the time was: "we believe that open and royalty-free licensing programs have a role to play alongside formal standard efforts in helping achieve our mutual goals relating to interoperability."
You can find the whole text of Microsoft's response at the address:
http://ec.europa.eu/idabc/servlets/Doc?id=18036