An Antic Disposition

Rob

The Funnel and the Wedge

2007/05/18 By Rob 12 Comments

The idea was prompted by a comment a reader submitted to a recent post, where he talked about one of the challenges of trying to bridge two different formats (in this case ODF and OOXML) via translation:

Both formats must evolve their new versions simultaneously in locksteps…

This one is the killer. Trying to have two formats permanently synchronized this way is a maintenance nightmare, especially when we discuss standards with multiple implementations maintained by different organizations.

This is an important point, and bears some reflection. In my mind I have images of the funnel and the wedge, physical means of convergence and divergence. Similar forces are at play with standards.

We see the Funnel in the evolution of HTML. Although the standard has existed for over a decade, implementation support for HTML and related standards was uneven until quite recently. Interoperability was poor. From the start vendors added incompatible extensions while not implementing key features. Developers had to write extensive workarounds and alternative representations to work on all browsers. And when they did not do that, their web sites might not work with all browsers. But with customer demand and prodding from groups like the Web Standards Group, the interoperable support of the HTML standard across implementations happened. There was convergence. What we have today, although not perfect, is clearly the result of a Funnel, concentrating industry effort around a single standard (or more like a family of standards).

This does not mean that vendors needed to sacrifice innovation, or deny their customers. It just meant that they accomplished their business objectives while also complying with standards. Along with adhering to various financial and securities regulations, labor law, health and safety, and other requirements, both internal and external, voluntary and mandatory, the browser vendors now complied with web standards. It is just another part of doing business.

Similar Funnels have occurred historically with network protocols, wireless telephony (in Europe at least), electrical grids, broadcast formats, etc. I’ve written elsewhere about what types of technologies tend to converge like this and why.

We see the Wedge when two standards compete in the same space and diverge into incompatible technologies. Microsoft is the master of the Wedge, with numerous examples over the years, usually proprietary, but more recently attempting to gain de jure recognition of them. But the mechanism is the same in either case: VML, JScript, MS Kerberos, J++, C++/CLI, XPS, and of course OOXML. Standardization just means that Microsoft has another tool for telling you that the Wedge is good for you. But it is a Wedge nevertheless.

The Wedge brings fragmentation, confusion and lack of interoperability, attacking the core reasons for having a standard in the first place. Once the primary value of an open standard is eliminated, we can all return to the security and comfort of our monopolist overlords. That is their main goal. Make no mistake about it, true interoperability and true choice are very scary propositions for Microsoft. They cut at their very business model.

So consider the Funnel and the Wedge as applied to document formats. If we all use ODF today, is interoperability perfect? No. Do we know how to move forward to improve interoperability, and work together in multi-vendor consortia to perfect this? Yes, certainly. That is why and how such standards as TCP/IP, HTTP or HTML work today. Interoperability came via the Funnel, a convergence of effort and attention leading to increased interoperability and the user and industry benefits that flow from that interoperability.

But from the Wedge, what can we expect? If Microsoft is successful, here’s what I see, my dismal predictions:

  1. Within 30 days after OOXML is approved by ISO we see the demise of Microsoft’s half-hearted attempt to create ODF Add-ins for Office. We’ll never see a functional Add-in from them for Excel or PowerPoint, and the Word one will remain unacceptably slow.
  2. Microsoft will continue to evolve OOXML behind closed doors. 99% of the work will be based on product and decisions in conference rooms in Redmond, which will be later rubber stamped by Ecma and ISO.
  3. OOXML and ODF will continue to evolve and diverge, in incompatible ways.
  4. Seeing their success ramming through 6,000 page Fast Track submissions in ISO, Microsoft will follow up with similar fast track submissions for XPS, XAML, Silverlight, Windows Media Photo, whatever they have. Since they have taken the trouble to set up the machinery to dominate JTC1, they will continue to force feed them with additional material.
  5. Every jurisdiction where ODF is currently allowed and mandated will also allow or mandate the use of OOXML. This in practice will be turned around to mandate the use of Windows and Office.
  6. Finally, once all opposition is rendered harmless, they can shut down OpenOffice.org and KOffice by patent lawsuits, but keep Novell’s version around in order to keep anti-trust regulators away. After all, 97% market share is not the same as 100%.

Maybe I’m a bit pessimistic, but I see little reason for optimism.

So do we have an alternative to the Wedge? What would encourage the Funnel? The following would need to happen:

  1. ISO must reject OOXML.
  2. Customers, from private and public sectors, must make their voices heard, that they want true interoperability and choice and that this means a single document format.
  3. Microsoft must support the existing ODF completely and fully in Office. It won’t happen overnight. But it won’t happen at all unless they start.
  4. OASIS must work with Microsoft (and Microsoft with OASIS, of course) so that it is clearly explained how MS Office can fully represent their documents in ODF. This need not be a monolithic monster like OOXML, but should be a layered standard, with a basic core feature set and defined extensions and profiles that encompass wider and wider ranges of functionality. If Microsoft absolutely needs the “heebieJeebies” Art Border in Word in order to maintain 100% fidelity with legacy documents, then the ODF TC can show Microsoft how to encode this in ODF. The Funnel starts when Microsoft abandons their divergent effort in Ecma and joins the common effort around ODF, a single document format for personal productivity applications.
  5. The application vendors, Microsoft included, must work together on defining the organizational, standards and technical means necessary to measure, test and certify ODF compliance, so customers and procurement agencies are able to have assurances that they are getting the level of interoperability that they desire.

I think this is a natural progression. Accomplishing the first step stops the Wedge from progressing further, halting but not reversing the divergence. The other steps reverse the damage and turn us down a path of true interoperability, leading to true choice and innovation.

Finally note that the Wedge is typically driven by a single company. It is not a pull by public demand or from customers, though it may wear many disguises. It is a deliberate attempt by one party to cause division and divergence. But a Funnel, this won’t happen at all unless there is strong demand, from customers, from government agencies, from national standards committees, etc. If this is to happen, your voice must be heard. All of us must work to bring all of us together in this effort. But it takes just one company, with a sufficiently large Wedge to pull us apart.

Filed Under: ODF, OOXML

So where are all the OOXML documents?

2007/05/10 By Rob 29 Comments

Google has a nice feature that allows you to search for documents that match a given file type. This is done by adding “filetype:NNN” to your query, where NNN corresponds to the file type. This feature has supported the ODF and OOXML document formats for at least two months, when I first noticed it. I’ve been tracking some numbers since then and now have enough data to make some observations.
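As a rough illustration of the “filetype:NNN” operator described above, here is a minimal sketch that builds one query per format. The helper name `filetype_query` and the idea of issuing a bare operator with no other search terms are assumptions for illustration; the post does not say exactly which queries were used.

```python
from urllib.parse import quote_plus

# File extensions for the two format families discussed in this post.
ODF_TYPES = ["odt", "ods", "odp"]
OOXML_TYPES = ["docx", "xlsx", "pptx"]


def filetype_query(extension, terms=""):
    """Build a search query restricted to one file type (hypothetical helper)."""
    operator = f"filetype:{extension}"
    # Optional search terms go in front of the operator.
    return f"{terms} {operator}".strip()


queries = [filetype_query(ext) for ext in ODF_TYPES + OOXML_TYPES]
for query in queries:
    # quote_plus shows how the query would be encoded inside a URL.
    print(query, "->", quote_plus(query))
```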

At last count the totals were:

Format          Count
ODT            85,200
ODS            20,700
ODP            43,400
Total ODF     149,300
DOCX              471
XLSX               63
PPTX               69
Total OOXML       603

As you can see, there is some round-off happening on the upper range. Perhaps at the high end the counts are estimates based on sampling?

In any case, I am rather surprised by the low counts given for OOXML documents, especially considering that this format has been supported since the Office 2007 beta last summer. According to Brian Jones, there have been over 4 million downloads of the OOXML Compatibility Pack for older versions of Office, and there is a new community of “over 300 other companies and partners who care deeply about OpenXML”. We’re also told that Office 2007 sales are above expectations, “two times greater than the purchases of Office 2003” according to one research firm. Recently announced third-quarter results for Microsoft showed “better than expected” results for Office 2007 sales, $200 million better, according to Microsoft CFO Chris Liddell.

So with all this evident love for Microsoft Office 2007, why is it that six months later there are only 63 OOXML spreadsheet documents on the web, something like 0.3% of the number of ODF spreadsheet documents? How can there be 300 companies supporting OOXML and only 69 OOXML presentations on the web? (This is starting to sound like when I say I support 30 minutes of aerobic exercise a day. I don’t do it, but I sure support it!)
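The comparison can be checked directly from the counts listed above. This is only a sketch of the arithmetic; the dictionary layout and variable names are my own, and the counts are the ones quoted in this post.

```python
# Document counts as listed in the table above.
counts = {
    "ODT": 85_200, "ODS": 20_700, "ODP": 43_400,
    "DOCX": 471, "XLSX": 63, "PPTX": 69,
}

# Pair each ODF type with its OOXML counterpart.
pairs = [("ODT", "DOCX"), ("ODS", "XLSX"), ("ODP", "PPTX")]

odf_total = sum(counts[odf] for odf, _ in pairs)
ooxml_total = sum(counts[ooxml] for _, ooxml in pairs)

for odf, ooxml in pairs:
    share = counts[ooxml] / counts[odf]
    print(f"{ooxml} as a share of {odf}: {share:.2%}")

print(f"Total OOXML as a share of total ODF: {ooxml_total / odf_total:.2%}")
```

The XLSX-to-ODS line is the “something like 0.3%” figure mentioned above.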

OK, I know the argument about “dark matter”, that Google indexes only the tip of the iceberg, that there is a lot of data squirreled away on PC hard-drives, behind corporate firewalls, etc., stuff that Google will never see. But the same is equally true for ODF documents, right? I have tons of ODF documents on my laptop, but none of them are indexed by Google.

Of course ODF has been around for a year longer than OOXML. That’s an important fact to acknowledge. We can put that in perspective by plotting the graph of ODF and OOXML document counts against the number of days since adoption of these two standards. So ODF counts are based on a start of 1 May 2005 and OOXML counts on a start of 7 December 2006, when OASIS and Ecma respectively approved them. You get this:

As you can see, ODF has a nice upward trend. OOXML is also trending upwards, though it is somewhat lost at this scale. If you do the analysis it comes out to around 300 new ODF documents per day versus 6 for OOXML. So, two years later, ODF adoption, in terms of documents per day, is 50 times greater than OOXML’s, at a time which should be OOXML’s high-growth period, considering all the great news that is coming out of Redmond.
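The tracked data points behind the trend figures are not reproduced here, so the following is only a cruder cross-check: the average number of documents per day since each standard was approved. It assumes the “last count” was taken on the date of this post (10 May 2007), which is my assumption, and the function name is hypothetical.

```python
from datetime import date

ODF_APPROVED = date(2005, 5, 1)     # OASIS approval of ODF
OOXML_APPROVED = date(2006, 12, 7)  # Ecma approval of OOXML
LAST_COUNT = date(2007, 5, 10)      # assumption: counts taken on the post date


def docs_per_day(total, start, end=LAST_COUNT):
    """Average documents per day between approval and the last count."""
    return total / (end - start).days


odf_rate = docs_per_day(149_300, ODF_APPROVED)
ooxml_rate = docs_per_day(603, OOXML_APPROVED)

print(f"ODF:   {odf_rate:.1f} documents per day")
print(f"OOXML: {ooxml_rate:.1f} documents per day")
print(f"Ratio: {odf_rate / ooxml_rate:.0f} to 1")
```

These lifetime averages come out lower than the trend figures quoted above (roughly 200 and 4 per day rather than 300 and 6), as you would expect if growth has been accelerating, but the ratio between the two formats is still about 50 to 1.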

So I’m somewhat at a loss to appreciate the significance of Novell or Corel adding OOXML support to their editors. With only 63 OOXML spreadsheets out there, wouldn’t it be cheaper just to hire someone to retype the documents in the destination application? The average user is more likely to find a Buffalo Nickel in their lunch change than to find an OOXML document outside of captivity.

Filed Under: ODF, OOXML

Math markup marked down

2007/04/25 By Rob 16 Comments

Sun’s Erwin Tenhumberg fights some FUD about ODF and in passing provides a link that is worth a few more words. It appears that Science, the journal of the American Association for the Advancement of Science (AAAS), itself the largest scientific society in the world, has updated its authoring guidelines to include advice for Office 2007 users. The news is not good.

Because of changes Microsoft has made in its recent Word release that are incompatible with our internal workflow, which was built around previous versions of the software, Science cannot at present accept any files in the new .docx format produced through Microsoft Word 2007, either for initial submission or for revision. Users of this release of Word should convert these files to a format compatible with Word 2003 or Word for Macintosh 2004 (or, for initial submission, to a PDF file) before submitting to Science.

Well, so much for 100% compatibility, eh? That is what I’ve been talking about. Whether you move to OOXML or ODF you will be making a change that will break compatibility with your past document processing systems. You will need to change over the next couple of years and you will need to examine your choices carefully. But don’t get suckered into thinking that the choice of OOXML is magically painless. The 100% compatibility claims don’t hold water.

More bad news:

Users of Word 2007 should also be aware that equations created with the default equation editor included in Microsoft Word 2007 will be unacceptable in revision, even if the file is converted to a format compatible with earlier versions of Word; this is because conversion will render equations as graphics and prevent electronic printing of equations, and because the default equation editor packaged with Word 2007 — for reasons that, quite frankly, utterly baffle us — was not designed to be compatible with MathML. Regrettably, we will be forced to return any revised manuscript created with the Word 2007 default equation editor to authors for re-editing. To get around this, please use the MathType equation editor or the equation editor included in previous versions of Microsoft Word.

Uh oh. Not only can you not submit files in OOXML format, but you can’t even use Office 2007 and save in the old binary formats. Down conversion or using the Compatibility Pack won’t help. Microsoft’s decision to push a new “Office Math Markup Language” rather than use the well-established MathML standard appears to be a serious flaw.

Nature appears to have the same problem:


We currently cannot accept files saved in Microsoft Office 2007 formats. Equations and special characters (for example, Greek letters) cannot be edited and are incompatible with Nature’s own editing and typesetting programs.

Of course, when targeting final publication of a paper, a PDF file is fine. But when engaging in collaboration with another researcher, or an editor, you need to agree on a standard format in which you both can work.

Reuse of existing standards is important. When you reuse a standard, you are reusing more than a piece of paper. You are reusing the experience and effort that went into creating and reviewing that standard. You are reusing the experience gathered by those who have already implemented the standard. You are reusing the books and training materials already written for that standard. You are reusing the interfaces for other technologies that have already integrated with that standard or can produce or consume output that conforms to that standard.

Isaac Newton wrote, “If I have seen further it is by standing on the shoulders of giants”. When you reuse standards you reuse the accumulated wisdom of an industry and assume the vision and powers of giants. But when you ignore all precedents and go forth on your own, well, let’s just say the outcome is more variable in that case. You may be the next Einstein, or you may be the next fool.

If Science and Nature need to update their templates, then I’d suggest they take a look at ODF. Not only does it use MathML for equations, but it is an open standard, an ISO standard, a platform and application-neutral standard that has many implementations, including several good open source ones. If they need to update their processing, then they might want to make the smart choice now, the choice that increases their choices and flexibility going forward.


18 June 2007 Update

A response from Nature and one of their vendors, explaining the complexity of migrating their publishing ecosystem to a new file format. Quoting a letter to Microsoft from Bruce Rosenblum of Inera:

Had the conversion from DOCX to DOC provided a conversion from OMML to Equation Editor format, it would have provided the necessary backwards compatibility for publishers to upgrade one system at a time. But because this compatibility is not available, it’s created the need for a “big bang” upgrade, or a delay until the ecosystem of inter-dependent systems is deliberately updated over time. In the environment of scholarly publishing, such substantive upgrades often take years, not months.

Filed Under: ODF, OOXML, Standards

Sometimes I need to remind myself

2007/04/23 By Rob 22 Comments

Tim Anderson has an interesting article up on his ITWriting blog, “Microsoft’s Jean Paoli on the XML document debate”. Of course, I treat anything Jean Paoli says on XML with such attention as I usually reserve for listening to the isorhythmic motets of Philippe de Vitry. Like de Vitry, Paoli can be understood on several different levels: What is he saying? And what is he really saying? As a student of Empson’s “Seven Types of Ambiguity”, I hope that I am up to the task.

There is, of course, the familiar canard, that IBM is the source of all of their problems:

It is clear though that Paoli is upset by what he sees as an international campaign against OOXML orchestrated by IBM, the sole naysayer in the ECMA voting. “There are IBM employees going to ISO, and saying a lot of technically incorrect things. When ODF went to ISO Microsoft did not interfere. IBM is betting on ODF, to have governments preferentially buying IBM software. It is OK to compete, but using this kind of argument around is it an open format or not … it’s widely known now, Office Open XML is an open format, even the EU says it is.”

A Google search on the words ecma ibm sole vote returns an embarrassingly large number of hits. Microsoft has certainly been having fun with this line. Let’s take a little look at this question and see if we can better define this conspiracy that Paoli is alluding to.

I’m now going to rant a little. You may want to stand back.

Yes, IBM was the only voting member in Ecma who cast a vote against OOXML. But guess what, we’re probably the only company who actually had someone perform the due diligence of reading the specification. The others voted on OOXML without reading the spec. So please give their “Yes” votes all the weight they deserve, but not more.

It seems to me that Ecma has become a standards factory, a place where you go for clean, efficient, no-guilt, fast-track service. Don’t want to publish your public comments? Fuggetaboutit. Don’t want to publish your meeting minutes? Fuggetaboutit. Worried about rushing through a 6,000 page specification in less than a year, with 20x less scrutiny than average? Fuggetaboutit. Want to have a unanimous vote, along with a souvenir photograph of your face when the vote occurs? Yes sir, we guarantee it.

However, for the privilege of this elite service, you must cough up the dough. You will not find Ecma’s rate card on their website, but I’m told that voting membership will set you back $57,000. This is not exactly the club to join if you are a small (or medium) business, non-profit, public sector agency, or anything but one of the big boys. A list of the privileged twenty voting members of Ecma can be found here.

As you can imagine, one does not become a voting member of Ecma without a good reason. This is a business expense, not a charitable contribution. For $57K, one expects $57K of service. To justify that membership fee, you expect your technology to be blessed with an Ecma standards imprimatur without hassles. So the “unwritten rule” is that everyone votes in favor of everyone else’s proposal. It is considered rude to vote against something that another elite member has paid so much for. So, IBM gets a lot of grief for casting a single “No” vote at a single Ecma General Assembly. We broke the club rules. I’m proud to work for such a company.

My question is this: How many “No” votes have been cast in Ecma in the past 5 years? When before did another Ecma member ever vote “No” on a standard? If no one can remember even a single previous “No” vote, or (sacre bleu!) a defeated standard, then that speaks volumes. In a healthy standards body, a single “No” vote should not be a newsworthy event, and should certainly not be something that Microsoft is still complaining about 6 months later.

To put this in perspective, the base category of OASIS voting memberships (Contributor) starts at $1,100. OASIS has something like 330 organizational members eligible to vote, including all categories of companies, government agencies, non-profits, etc.

I should also note, just coming from the annual OASIS Symposium held last week, that the OASIS Board of Directors is looking at changing the OASIS voting rules to make it more difficult for OASIS standards to be approved. Yup, we’re raising the bar.

When I see this I need to try extra hard to remind myself that IBM is just interfering with Microsoft’s good-faith attempt to humbly submit for our consideration their well-written, detailed, high-quality, interoperable open standard.

ISO/IEC JTC1/SC34 recently had its annual plenary. This is the same group of ISO National Body (NB) members who voted in favor of ODF last year, and over the next few months many of them will be recommending positions on Microsoft’s OOXML to their national standards bodies. I was on the delegates list for attending this meeting, as a representative of the US NB, but had to cancel at the last minute because of a family emergency. When I saw the attendance list, I was surprised to see that Microsoft had sent five people, this to a meeting of only 37 people. They practically darkened the skies with their employees. And what about the conspiratorial army that is hounding them at every corner? Zero people from IBM. Zero as well for Google, Sun, RedHat, Adobe, Oracle and Novell.

When I read this I need to remind myself that I’m part of a vast global conspiracy to deny Microsoft a fair hearing within ISO. The fact that no one in this vast global conspiracy managed to show up at the meeting was simply a ploy to make Microsoft feel overconfident.

In the US NB, we have a committee called INCITS V1. It is the mirror committee to JTC1/SC34. I serve on it, the only member from IBM. Imagine my surprise, when at our last call, Microsoft shows up with 3 employees and a business partner as new members. Four people against little ol’ me? Come on guys, that is just sad.

At times like this I need to remind myself that Microsoft is the underdog and IBM and its allies are ganging up on them. But our guys are invisible at meetings and although they cannot vote, they do have ninja powers and, in matters of external affairs, the delegated plenipotentiary prerogatives of Klingon Ambassadors. “choSuvchugh ‘oy’lIj Daghur neH”.

Microsoft bloggers, fed and spreading like mushrooms, recently popped up and simultaneously announced a new pro-OOXML petition, self-published, self-hosted and self-reported by Microsoft. You couldn’t find anyone to even pretend to support you? You had to host your own petition? This is like throwing a birthday party and having only your mother show up. Very sad. Where are your friends, Microsoft? How come we hear no one else speaking approvingly about OOXML? Where are the other companies lining up? Where are the endorsements? The testimonials? All we hear is that Microsoft thinks OOXML is great. But that is just Mom cheering on your performance. Don’t you have any real support?

Btw, this is what a real petition looks like. It is hosted by a reputable party (the Prime Minister) and gives an open, public listing and tally of those who signed the petition.

At times like this I need to remind myself that the ODF supporters are the outsiders in this debate, using unconventional and covert tactics to fight a well-respected and well-loved mainstream technology generously provided by Microsoft.

I see that Microsoft likes to throw around names like the British Library and Library of Congress, as if the mere mention of their holy names brings sacramental blessings. But please show me a public statement where either of these bodies has endorsed, adopted, recommended adoption or recommended approval of OOXML. The mere mention in passing of well-known and popular institutions lends no credibility to your argument, and credible arguments are important, as is well-known to anyone familiar with Walt Disney World, the Louvre, NASA, the Boston Red Sox, or the Department of Really Important Stuff.

A Malaysian standards committee was moving forward to approve ODF as a national standard in Malaysia. This is called “transposing” an International Standard, and is commonly done when a relevant International Standard is approved. Microsoft has made every attempt possible to prevent this committee from making progress with their review of ODF, for almost a year now. This progress recently came to a halt, the committee’s decisions nullified and the committee suspended.

When standards committees are disbanded when they get too close to approving ODF, then I must pinch myself and remind myself once again that IBM is the one orchestrating international campaigns against Microsoft, and not the other way around.

I’ve heard similar complaints from other NB’s. Why bother reviewing OOXML? Why waste the effort reading it and suggesting improvements? Microsoft has ignored every suggestion given it so far by NB’s. And if you vote no, Microsoft will just escalate and try to get some mid-level government bureaucrat to set aside the recommendation of your country’s technical experts. Why waste the next 4 months reviewing a 6,000 page specification? It happened in Malaysia. It happened in the US. The INCITS Executive Board was about to send a contradiction submission against OOXML, saying that it possibly contradicted ODF. But before the committee could reconvene the next morning, enough members had received urgent phone calls to cause them to change their vote and abstain. We saw this in the Netherlands as well, where it was even reported in the papers that they would vote against OOXML. But that vote was changed at the last minute with the cryptic message to the JTC1 Secretariat: “The Netherlands Standardization institute votes ‘abstain’. Please change our vote accordingly and please confirm receipt of this vote to me…” What happened there is still unclear. In India it was even worse, when the committee that was supposed to get the ballot did not receive it. Evidently it was misplaced. The intervention of the leader of a major national political party was required to straighten it out. I also received a note saying that the committee was being told that the deadline for responding to the ballot was two weeks later than it really was, a delay that would have invalidated their vote if they had fallen into that trap.

When I see stuff like this happening, I need to remind myself, really, really hard, that IBM is the bad guy in this debate and that we’re the one interfering with an orderly ISO process.

When an amendment to a Florida State Senate bill was offered that called for a “business case analysis” for the use of open standard document formats (no particular format was called out) Microsoft’s lobbyists, the three Men in Black, Will McKinley of Dutko Poole McKinley, Jim Daughton, Jr. and Geoffrey Becker both of Metz, Hauser, Husband & Daughton, swarmed down and zapped it. As one legislative aide put it, “By the time those lobbyists were done talking, it sounded like ODF (Open Document Format, the free and open format used by OpenOffice.org and other free software) was proprietary and the Microsoft format was the open and free one”. Perhaps a document, left by the lobbyists, filled with lies about ODF, had something to do with it? We should be fortunate that Microsoft sent only three lobbyists to handle this, rather than all nine lobbyists who are registered in Florida alone to support Microsoft’s legislative activities.

When expressing our technical opinion defines interference, and the outrages that Microsoft is getting away with become the norms of behavior, then we’re all doomed to a future of technical subservience. We all need to remind ourselves of that.

Microsoft likes to complain, and they are evidently becoming quite adept at it. If decibels and dollars could win arguments then they would surely be the winners. But I think their protestations are mis-directed. Microsoft is like an out-of-condition middle-aged man (somewhat like myself) out for a rare jog. They can curse to the high heavens the pain they feel, but don’t blame it on others. It is called competition. Deal with it. If it hurts so much it is because you are so out of practice. You should try having competition more often. It is good for you.

Filed Under: Microsoft, ODF, OOXML

The Case for a Single Document Format: Part III

2007/04/10 By Rob 14 Comments

This is Part III of a four-part post.

In Part I we surveyed a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.

In Part II, we described the forces that tend to unify or divide standards and showed in particular how network effects can drive the adoption of a single standard.

In this Part III we’ll look at the document formats in particular, how we got to the present point, and how and why historically there has been but a single universally-accepted document format.

In Part IV, we’ll tie it all together and show why there should be, and will be, only a single open digital document format.

The Meeting

It is 9:55 on an average Tuesday morning. I’m late (as usual) preparing for a meeting. With 5-minutes to go, I send out an updated meeting invite, with an updated agenda and a URL for the web conference. I also send out another email with an updated presentation attachment. It is the standard last-minute, pre-meeting shuffle that we all do. I expect that an examination of traffic statistics on IBM’s email servers shows a spike 5-minutes before every hour, as we all send out last-minute meeting updates. I login to my web conference and dial into the call. I’ll be meeting with my teammates, some in Westford, some in Raleigh, some in Portsmouth, some in Lexington, some in Dublin and some in Shanghai, a far-flung group. I’ve worked with some of these guys for years but still have never met most of them face-to-face. This is the nature of collaboration in a modern, global company. The call starts and I take a deep breath, push off my slippers and stretch my toes. Yes, I’m leading this meeting from home today.

“Don’t be impatient, Comrade Engineer; We’ve come very far, very fast”, in the words of Yevgraf Zhivago, Alec Guinness’s character in Doctor Zhivago. Let’s flash back 10 years and remind ourselves how we worked then…

It is 9:55 on an average Tuesday morning. I’m late (as usual) preparing for a meeting. With 5-minutes to go, I print out the agenda and handouts to the laser printer down the hall. It has printed by the time I arrive, and I sort through the three or four other print jobs to find the one that is mine. I need twelve copies for the meeting, so I join the queue at the photocopier, with everyone else who also waited until the last minute to print out the materials for their meetings. It is the standard last-minute, pre-meeting shuffle that we all do. I expect that an examination of statistics on IBM’s photocopiers shows a spike 5-minutes before every hour. I head over to the conference room and start the meeting. At the end of the meeting, 80% of the printed materials will be discarded, hopefully into the recycling bin. This was the nature of collaboration in a modern, global company, circa 1995.

What has changed? Why did it change? What does this mean for document formats?

My family in documents

Let me take you on a detour, back in time, to tell a 200-year family story, illustrated with official documents of the period.

I’ll start with the following excerpt from the 1930 Federal Census returns for Abington, Massachusetts, showing my grandmother, Florence Mae Cushing, then age 18, and her parents William and Mary, and household. The columns indicate the following:

  1. Name
  2. Relationship to the head of household
  3. Whether they own or rent their dwelling
  4. Value of their dwelling
  5. Whether they own a radio
  6. Whether they own a farm
  7. Sex
  8. Race
  9. Age
  10. Marital condition
  11. Age at first marriage
  12. Whether they are in school

The thing that caught my eye about this record is that it lists a “Damon, Mary K” as William’s mother-in-law, widowed, age 73, living with them. Let’s see what we can find out about this woman. The first step is to find her maiden name. A search for her marriage record in Abington failed, so we tried for Mary E. Damon’s birth record, which we did find in Abington’s birth register for 1887, revealing her mother’s maiden name as “Chessman”:

This then allows us to find Mary K. Chessman’s birth record, also in Abington, from 1856 listing her parents as Edward and Emily:

And then from here we can go back and find the family in the 1860 Federal Census:

We see the family as owning $500 in real estate and $100 in personal property, having 5 children, the oldest 8 years old. Mary K. is only 3.

But when I skip ahead to the 1870 Census, something is clearly wrong:

As you can see above, Emily is listed as head of household, and there is no Edward. And where is our Mary K? At age 13, she has moved out and is working as a “domestic servant” with a family of factory workers. Her sister Harriet, age 15, is also living there and working in an “eyelet factory”:

So what happened? Resolving this mystery required a bit more sleuthing, but I eventually found the answer in a response to a records request to the National Archives and Records Administration (NARA):

From this I learned that Edward Blanchard Chessman, Mary K’s father, had served in the Civil War with the Massachusetts 32nd Volunteers and had died of disease in 1863 at a military hospital in Alexandria, Virginia. This, along with a dozen pages of additional documents from NARA, detailed the pension application of his widow, the depositions of witnesses who vouched for their marriage and his service, the periodic requests for pension increases, all the way to 1903 when Emily died and her pension file was closed, marked “DEAD” with a big, bold stamp.

Since I was now tipped off to the value of pension records, I next searched for Edward’s grandfather, Ziba Chessman, who I knew had served in the Revolutionary War. I was able to locate his widow’s pension application as well:

The hand of this writer is not so easy to read, but I’d transcribe the start of it as:

Commonwealth of Massachusetts. Norfolk County. On this twenty second day of August 1838 personally appeared before Herman **** The *** of Probate in **** County, Mehitable Chessman a resident in the Town of Braintree in the County of Norfolk and state of Massachusetts aged seventy three years, who being first duly sworn according to law doth on her oath make the following declaration in order to obtain the benefit of the provision made by the Act of Congress passed July 7th 1838 entitled “An Act Granting Half Pay and Pensions to Certain Widows”, that she is the widow of Ziba Chessman late of Braintree in the County of Norfolk and state aforementioned deceased, who was a Soldier in the War of the Revolution; that her said husband Ziba Chessman enlisted into Captain Isaac Thayers or Captain Nathaniel Belchers Company in the year 1775 and served a short period of time as a private with the Massachusetts Militia, around the shores of Boston, according to the best of her knowledge….

I am in awe that these records have been maintained and preserved for so long, and made available to people like me who are researching their family tree. There is a continuity of records in New England that goes back almost 400 years. Birth, education records, draft registration, military service, marriage, court appearances and eventually death and burial. Whenever your personal life crossed paths with the government, it generated a record and this record may last forever, and more importantly, once the physical preservation aspects are taken care of, these records can be read forever.

A brief history of document technology

It is somewhat odd that we’ve been debating document formats for so long and have not really said what they are. I’ll recommend the following for our discussion:

A document format consists of the conventions that allow a document to be fixed in a persistent state and then exchanged with other parties who are able to use these same conventions to read and further edit that document. If you and I understand the same document format, then you and I can exchange documents in that format and we can collaborate using that format.

Since around 1450, with Gutenberg’s first notable success of combining document production and automation, and even before (and since) with manual document production, there has been a single globally relevant interoperable document format — ink on paper. Everyone could create it, everyone could read it, everyone could exchange it. It worked then and it works now.

Some noticeable advances in documents since 1450 include the invention of pre-printed forms, around 1850. These seem obvious now, but for many years we had what were called “formulary documents” which had boilerplate text which the clerk wrote out in full for each document, in addition to the customized language for each specific instance. You can get a sense of this from Ziba Chessman’s pension application quoted earlier. From an engineering perspective you can think of this as reuse of design, but not implementation.

Having a pre-printed form was a step forward in productivity, allowing a greater degree of reuse. The Surgeon General’s form shown above is an early example. Such forms were quickly associated with bureaucracy. In fact, the first written use of the word “form” in the English language (according to the Oxford English Dictionary) was this critical view of a 19th century government office:

The waiting-rooms of that Department soon began to be familiar with his presence, and he was generally ushered into them by its janitors much as a pickpocket might be shown into a police-office; the principal difference being that the object of the latter class of public business is to keep the pickpocket, while the Circumlocution object was to get rid of Clennam. However, he was resolved to stick to the Great Department; and so the work of form-filling, corresponding, minuting, memorandum-making, signing, counter-signing, counter-counter-signing, referring backwards and forwards, and referring sideways, crosswise, and zig-zag, recommenced — Dickens, Little Dorrit (1855)

The telegraph (1837) and teletype (1910) gave new, faster ways of moving documents around. Was Morse Code a new document format? Although the telegraph operators may have worked in Morse Code, the author of the document, and the person who ultimately received and read the document still worked with ink on paper.

The typewriter (1872) increased the speed and uniformity of personal document production. This also led to a new use for carbon paper, an invention of 1806 originally created as an aid for the blind.

In the late 1880’s, Edison’s “Autographic Printing” was commercialized as the Mimeograph, giving a cheaper method of small batch document production.

Melvil Dewey (of Dewey Decimal fame) invented the hanging file folder (1893), leading to increased efficiency of document storage and retrieval.

The Harris Automatic Press Company is incorporated in 1895, ushering in the commercial use of offset printing and a 10-fold increase in document output rates.

The invention of the Soundex algorithm by Robert Russell of Pittsburgh in 1918 allowed more efficient searching of files and cards indexed by surnames, by grouping together names which were phonetically similar.
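To make the grouping concrete, here is a minimal sketch in Python. It assumes the common "American Soundex" rules used for later census indexes, which may differ in detail from Russell's original patent, and the function name `soundex` is simply my own choice for illustration. The key keeps the first letter of the surname, replaces the following consonants with digits by sound class, collapses repeats, and pads the result to four characters.

```python
# Digit codes for the common "American Soundex" variant (an assumption;
# Russell's 1918 patent may differ in some details).
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a 4-character key: first letter plus up to three digits."""
    # Keep only the plain letters A-Z, in upper case.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = CODES.get(ch, "")  # vowels and Y have no code
        if digit and digit != previous:
            key += digit
        previous = digit  # a vowel resets this, so a repeat is coded again
    # Pad short keys with zeros and cut long ones to four characters.
    return (key + "000")[:4]


for surname in ("Chessman", "Cheesman", "Chesman", "Cushing"):
    print(surname, soundex(surname))
```

With these rules, the variant spellings Chessman, Cheesman and Chesman all share one key, so a clerk filing index cards by key rather than by exact spelling would find them together, while Cushing lands under a different key.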

In 1924 radio facsimile allows pictures, as well as text, to be transmitted long distances.

In 1948 Xerography gave us document duplication without the use of wet, messy chemicals.

In 1969, IBM’s Charles Goldfarb, Ed Mosher and Ray Lorie invented GML, the Generalized Markup Language, the ancestor of SGML, HTML and XML.

The 1970’s saw the rise of the first computer-based word processors, including Wang’s Office Information System.

In 1974 Xerox PARC engineers create Bravo, the first WYSIWYG word processor.

In 1975, with the rise of office automation systems and early word processors, Business Week boldly proclaimed the “Paperless Office”.

At this point we reach an important fork in the road of history. What role would the computer and office automation play in the future of documents? Does the paperless office become a reality? Or do we remain with paper-based documents? As Xerox PARC engineers were developing the world’s first WYSIWYG word processor, at the same time they were also developing a system for transporting documents electronically, from one computer to another. But this innovation was dropped because it went against Xerox’s core business, the creation and duplication of paper documents. So the choice was made. Paper still ruled. Paper consumption went up, not down. The word processor made it easier to produce more paper, faster. The paperless office did not happen, at least not yet. More first-hand details on this fascinating topic can be read in Sellen & Harper’s The Myth of the Paperless Office. In their words, “…paper became a surrogate for the network, enabling users with different machines to share documents…”.

And so we continued, through another 20 years of WYSIWYG word processors: WordStar, MacWrite, Writing Assistant, Manuscript, WordPerfect, Word, WordPro, etc. We all created documents and hid the files away on our hard drives in incompatible formats. When we needed to work with others we usually just printed out the document and exchanged the printout, using the 500-year-old format of ink on paper.

Let’s pause here and make some observations.

First, note the areas of sustained and recurring innovation. These have been consistent throughout the past 500 years and reflect the ongoing nature and practical concerns of business communications:

  1. Document authoring
  2. Document duplication
  3. Document distribution
  4. Filling out of forms
  5. Submission of forms
  6. Processing of forms
  7. Storage and Retrieval of documents
  8. Authentication of documents (not mentioned in the history above, but the use of Notaries Public and corporate seals has facilitated this with ink and paper documents, in some forms back to ancient Rome.)

Note also that the engineering progress and increases in efficiencies in these areas occurred without challenging the primacy of a single document format. The universality of ink and paper did not stifle innovation over these 500 years. On the contrary a single standard document format encouraged and focused innovation. We went from documents authored by pen, then set in moveable type, manually pressed, bound and distributed at the speed of a horse, to where we were circa 1995, when I authored documents on a computer, printed to a laser printer and then queued up at the photocopier to make copies of my agenda before the meeting started. Ink on paper — it was the standard document format for 500 years.

But of course, we don’t work this way anymore. Something changed, very recently. I don’t print out agendas any more. I send them via email. I don’t print out reports and review them with a red pen in hand. I mark them up electronically. In fact, unless I need to sign it or staple a receipt to it, I don’t print out anything. I think I can live out the remainder of my professional career on only 2 reams of paper.

What happened then to change this? Why is there less of an emphasis on printed output today? What does this mean for WYSIWYG? And what does this mean for document formats?

I’ll take up these questions and others when I finish up this series in Part IV.


20 April 2007 — Another editing pass, tightening up the language, but still too long. Added link to “The Myth of the Paperless Office”.

Filed Under: Standards


Copyright © 2006-2026 Rob Weir · Site Policies