
A Tale of Two Formats

As he stood staring at them, they asked him no questions, for his face told them everything.

‘I cannot find it,’ said he, ‘and I must have it. Where is it?’

His head and throat were bare, and, as he spoke with a helpless look straying all around, he took his coat off, and let it drop on the floor.

‘Where is my bench? I’ve been looking everywhere for my bench, and I can’t find it. What have they done with my work? Time presses: I must finish those shoes.’

They looked at one another, and their hearts died within them.

Charles Dickens, a careful student of human nature, provides us here a vivid portrait of Dr. Alexandre Manette, who, after being held 18 years in the Bastille, is released, but is unable to adjust to his new freedom, and in times of stress lapses back to the familiarity of his prison labors, making shoes.

We have all been prisoners of Microsoft Office and its proprietary file formats. You may no longer recognize it as a prison, because this cell has been your home for the past 15 years, but here is what it looks like:

  1. Editing a document requires Microsoft Office.
  2. Since Office runs only on Windows, you also require Windows.
  3. These restrictions lead to a purely heavy-client view of document processing.
  4. This also leads to a model of programmability that emphasizes storing executable code (macros/script) inside of the document, resulting in years of security nightmares. Here is a typical recital of the known dangers.
  5. If you don’t want to put script inside your document, you could access the data via Office automation APIs, but this again requires a machine running Windows and Office.
  6. It also promotes a view of WYSIWYG which emphasizes early formatting and layout decisions and de-emphasizes semantic richness in documents. For example, see “What has WYSIWYG Done to Us?”.
  7. The tools that were created for us to record our thoughts instead now constrain or even substitute for our thoughts. For example, “PowerPoint Panders to our Weaker Points” in the Guardian, and Tufte’s “PowerPoint is Evil”.
  8. The above also led to a stifled market for 3rd party document processing tools. We will never see the value of what was never allowed to occur, but the opportunity cost of the innovation that did not happen in this single-vendor world is enormous.
  9. This also led to a general lack of competition in the productivity editor market, leading to a decade of buggy products with little innovation. Is the “Ribbon” the most we can look forward to?
  10. We’ve been locked into one-size-fits-all offerings of bloated applications. Many people are over-served by Office and therefore are over-paying for functionality they do not need, while others are under-served by the resulting products they cannot afford.
  11. Functionality has been arbitrarily segregated into three and only three application classes, “Spreadsheet”, “Word Processor” and “Presentation Graphics”.

The move from proprietary binary formats to new standard formats, like OpenDocument Format (ODF), is a movement from imprisonment to freedom. The technical constraints have been lifted, but have we really made the mental adjustments necessary to engage our new freedom? Or are we still silently pacing a 10-foot cell in our minds? If we merely recreate our cell walls in XML, then we are still prisoners.

I am a creature of habit and have been as much a prisoner as you have, so don’t look to me for all the answers. But I do have a few thoughts on what this new freedom might look like.

Instead of being opaque black boxes that can only be used on one vendor’s system, documents will be transparent. Anyone can access them using whatever operating system and whatever tools they want, and for any purpose they want. Python on Linux, REXX on AS/400, and C# on Windows will all have equal opportunity.
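To make the “Python on Linux” case concrete, here is a minimal sketch, not a definitive implementation. It relies only on the fact that an ODF document is a ZIP package whose main content lives in content.xml. The function names odf_paragraphs and make_sample_odt are my own, hypothetical choices, and the sample package built here is a stripped-down illustration rather than a complete, valid ODF file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# ODF namespaces, as defined by the OpenDocument specification.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_paragraphs(source):
    """Return the text of every text:p element in an ODF package.

    `source` is a file path or a binary file-like object.
    (Hypothetical helper name, for illustration.)
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_sample_odt(paragraphs):
    """Build a tiny ODF-like package in memory (illustration only;
    a real document also carries a manifest, styles, and metadata)."""
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><office:body><office:text>'
        f"{body}</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = make_sample_odt(["Where is my bench?", "I must finish those shoes."])
    for line in odf_paragraphs(sample):
        print(line)
```

Nothing here needs Office, Windows, or any vendor library: the standard zip and XML modules are enough, and the same few steps could be written in REXX or C#.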

This also implies that document processing will no longer be restricted, technically or by license, to the desktop. Innovative things will occur on servers. We’re starting to see some of that with Google Docs and wikiCalc. But that is only the beginning. We will see search engines that can intelligently search content for specific MathML expressions, spiders that will collect and aggregate slides from presentations and allow you to share them, document repositories that will automatically check citations in papers and calculate the intellectual social networks these imply, stock brokers that will allow you to download your statements formatted in a spreadsheet, with additional analytics calculated via spreadsheet formulas. Creating, editing, reading, viewing, storing, collaborating will be able to be done anywhere, from your cellphone to the largest servers.

Since the server typically has access not only to your own documents but also to your organization’s, along with other information about the users, such as your role and group via LDAP, an application can drive workflows that relate the contents of the document to similar content, as well as to your organizational role and to your business. The companies that unlock the knowledge stored by your knowledge workers in your organization’s documents will be the companies leading us into the next decade.

The old walls will fall that once segregated functionality into the arbitrarily defined boundaries of “Spreadsheet”, “Word processor”, and “Presentation graphics”. Dan Bricklin is leading the way with his wikiCalc. Is it a Spreadsheet or is it a Wiki? If you have to ask the question then you are still a prisoner. The point is wikiCalc is whatever Dan Bricklin wants it to be. That is freedom to innovate. We will see the arbitrary divisions between application genres become fuzzy and fall away as we all recognize our new freedom.

Document programmability will be turned inside-out. Instead of putting code inside of the document, turning documents into virus vectors, the code will be carefully segregated. Once the code and the data are distinct, we can put the code on the server, where it can be more easily managed, maintained, and secured. This clean separation of code and data will be as important to system stability and security as was protected-mode in the 80286 processor when it first enforced this data/code separation at operating system level. I see macro viruses becoming a thing of the past, like smallpox, because the importance of data/code separation will finally be enforced, and users will not be emailing around code disguised in documents.

We will start thinking of documents as data, and as inputs to modules that process data. I see visual design tools that will allow you to drag and drop a document template onto a design surface and expose various fields in the document which can be wired up to databases, web services or other data sources.

I see financial analysts creating financial models in spreadsheets, then converting the spreadsheet into a web application that can then be deployed anywhere to provide browser-based access and execution of the model via any browser.

I see a variety of productivity editors available at a variety of price points, from free, open source ones, to commercial offerings for desktop and other devices, to specialized offerings with extra features for vertical markets, like legal, medical, academic, or scientific uses.

I see an escape from documents-as-pictures, where users sweat over pixel-perfection and pray that the applications don’t screw them up. Today the end user doesn’t worry about font kerning. We rely on the font managers to get this right, and we accept the results, and concentrate on what we, the authors, add to the document. We are freed from that mental burden of kerning. But why stop there? With smarter applications, we will be freed of most or all formatting burdens. We will concentrate on writing, not on styling, and rely on the applications to get the appearance right. This will free our time to give an increased emphasis on semantic richness, putting our knowledge and experience and outlooks and opinions into the document, and encoding it in a way that allows new modes of collaboration and redefines what a document is.

That is a glimpse at what freedom looks like to me. But let’s not forget that being freed is not the same as being free. There are those out there who are attempting to merely recreate the same single-vendor closed system we’ve had for the past 10 years, and recoding it in XML. This may be a comfortable choice to those who have known no other way. But is it really freedom? I look out and see the jailer offering to sell 10-foot apartments to those just released from their 10-foot prison cells. Will you follow?

Change Log

1/30/2007 — updated wikiCalc link, made other assorted wording changes at my whim, corrected a spelling error, changed to curly quotes.

{ 16 comments }
  • Anonymous 2006/08/26, 5:30 pm

    Most scenarios are already possible. e.g. extracting of references is done by e.g. citeseer automatically from PDF-files.

    These freedom and prison analogies remind me of the fact that I’m prisoner to our electric system and should be freed from this by going back to living in the woods. I mean Microsoft Office is everywhere and the file format is ubiquitous and accepted as a fact. And the new XML version serves not only Microsoft’s customers best, but all of the common users who want their documents to “just work”: No broken nested tables, mixed up fonts and all the other stuff.

    And it’s not like the existing binary fileformats are totally opaque: Google indexes them, OpenOffice and most other word processing software (et cetera) can deal with that format.

    These formats simply work and there is simply no reason for us (all normal users, most windows users, most people for whom the import/export of OpenOffice is just as good as it is possible) to care about your point of view and dreams about everybody using this pseudo-open OpenOffice file format.

    And remember: The problem is not the openness of the file format or the usage of XML. The problem is that for perfect compatibility between OpenOffice, MS Office, KWord, etc. these programs would need to have almost the same set of features! If any one of them implements something new like shadows or bibliography, then all others need to follow or there would be no round-tripping.

    It’s a shame to just take the OpenOffice-fileformat which is specific to OpenOffice and declare it as standard. This is unusable by programmers

  • Rob 2006/08/27, 2:40 pm

    I think a little reflection (just a little) will show you the errors of your thinking regarding document compatibility.

    Microsoft has offered upgrades to Office over the years but maintained the same binary formats. Although they have added features in Office 2000, Office XP, Office 2003, etc, that has not prevented document exchange between, say Office 2003 and Office 97. By your logic, the Microsoft formats which add new features, including Office 12, are all unusable because they can encode things that older versions of Office cannot understand. That isn’t what you really mean, is it?

    Certainly, questions of forwards and backwards compatibility are issues that require some thought. But this issue already exists whenever one thinks of upgrading MS Office or collaborating with others who use a different version of Office.

    Certainly having everyone in the world run Office 12 on Vista would solve all compatibility problems and remove the inconvenience of thinking about the problem further. This would work in the same way that a dictatorship removes the inconvenience of opposition parties, free speech and elections.

  • Anonymous 2006/08/28, 9:35 am

    Certainly having everyone in the world using only ODF would solve all compatibility problems and remove the inconvenience of thinking about the problem further. This would work in the same way that a dictatorship removes the inconvenience of opposition parties, free speech and elections.

  • Rob 2006/08/28, 11:11 am

    I think you are confusing standards with applications, an easy thing if you are not careful.

    For example, in the United States we have a thriving market with plenty of competition for automobiles, both foreign and domestic. These vehicles are available at a variety of price points, in many sizes, styles, colors, etc. This is a free market, not a dictatorship. But we also all drive on the same side of the street, and have standards for airbags, safety belts, fuel emissions, and other things.

    Similarly, we have healthy competition in telephone service, and telephone hardware, though we all agree on the underlying protocols for telephony.

    As we have all seen, agreement on standards encourages competition, leads to a healthy free market with innovation, increased customer choice, value and freedom. These are things we do not have today with the Office monopoly.

    I believe that it is far better to have a single standard with a market of several competing implementations than to have several incompatible standards. To do otherwise is to continue down the road to fragmentation and vendor lock-in. I’m not saying that there are not those who would prefer that outcome, but they should be honest about who they are, and what interests they represent.

  • Anonymous 2006/09/04, 6:27 am

    Actually, Rob, to add further detail to your analogy, you could mention the IBM S/390 computer series, which made a general market for computers where before there had only been highly-specialized markets and the resultant highly-specialized skill-sets.

  • Anonymous 2006/09/10, 1:21 pm

    Rob, ever tried to buy $30 Skoda headlamps and mount them in a BMW? Probably not, because cars aren’t compatible. They are made to satisfy the needs of groups of customers, and only parts that need regular changing or are mass produced are interchangeable: fuel, oil, tyres, but nothing that makes a car a certain type, like body parts, interior design, engines and so on.
    Don’t use cars as an example for compatibility, because car compatibility analogies moved to Office suites would mean you could just exchange some ASCII without formatting or styles, as those are functions specific to each suite.

  • Rob 2006/09/10, 3:03 pm

    Sorry, never had a BMW.

    In any case, it was an analogy, and any analogy can be stretched too far, as you have amply demonstrated.

    In the end, we all benefit from competition in goods rather than confusion over multiple standards.

    Have you ever had to deal with DVD-R, DVD-RW, DVD+RW and DVD-RAM? Who benefited from proliferation of standards? Who lost?

  • ray 2006/09/11, 6:15 pm

    Rob, you cannot simply rewrite history to suit your own ideology and throw away the last 10 years that have brought us to where we are now.

    History is a progression – we would not be where we are now if things had moved in a different direction 10 years ago. Some things would be better, others a lot worse.

    10 years ago the ubiquitous Internet (in the developed world) did not exist. 10 years ago most applications were proprietary black holes that your data disappeared into and you did not care.

    It is only since the infrastructure has been put in place to enable the wide exchange of data that it has become necessary to unlock this data for sharing and to enable all the scenarios you envision.

    Personally, I can’t wait to see this vision of yours happening.

    However, your single-minded determination to blame one company for holding back this utopia diminishes everything else you say.

    Please open your mind to the fact that your one true way might not be the only way to achieve your vision.

  • Anonymous 2006/09/14, 5:23 am

    Microsoft is all very well if you’re some Noddy dumb ass user who doesn’t mind being locked (aka imprisoned) into Microsoft’s insecure, bloated, buggy operating system and tools, and creativity is something someone else does.

    Creativity and invention rely on freedom of choice. Hey ! Look at me I’m institutionalized, I love it, the porridge is great, besides who needs real food, and drinking water it is healthy. Let’s scratch a doodle on our cell wall and communicate our creative freedom.

    If you have ever tried to do something meaningful with the information in a Microsoft document, particularly with non-Microsoft tools .. well get your tin plate and cup out friend, there is not much on the menu. The amazing fact is that Microsoft demands you pay for these privileges. Not for too much longer; the writing is on the walls.

  • Rob 2006/09/15, 3:37 pm


    Thanks for the comments, though I do disagree. It is perfectly legitimate to examine the history of this industry and ask what the significance of various past decisions and circumstances was, and what alternatives would have brought. To believe that the present situation is the best of all possible worlds, or to believe that past choices had no consequences: these are both dismal views of our ability and responsibility to influence events, views which I utterly reject.

    In any case, my post was clearly future-focused. I’m not going to dwell on what happened before or why it happened. I just want to make it clear that the status quo today is far from a consensus, that we have choices today that we did not have previously, and these choices can lead us to a future that is far better than today’s present.


  • Anonymous 2006/09/18, 9:31 am

    Rob, I do not see your employer IBM opening up the specs/protocols of, for instance, IBM’s WebSphere so that others can build OSS communication software to replace it.
    For MS, opening up a proprietary format to an open format as they have is a big step, and it is certainly a lot more than any other large software company has done so far.
    I think the development of ODF has made MS take this step. However I do not think that any of the current ODF implementations holds it’s own to the MS Office suite which is just a lot lot better. And now with an open format as its native format I cannot see valid reasons to switch to another suite. I think some suites like KOffice could definitely have a future, as they improved a lot over the last couple of years and probably will improve a lot more, but for the coming 3 years I cannot see any value in switching whatsoever.

  • Rob 2006/09/18, 1:31 pm

    Did you have some specific protocol in mind with the WebSphere comment? Keep in mind that “WebSphere” is a brand, not a single product, and encompasses 30+ different offerings. If you dig into them, you find broad standards support, including UDDI, SOAP, HTTP, SSL, J2EE, WSDL, JSP, EJB, JDBC, Java Servlets, JMS, JTS, etc. In many cases IBM was a creator or co-creator of the standard.

    You say, “I do not think that any of the current ODF implementations holds it’s own to the MS Office suite which is just a lot lot better”. I’ll accept that as your personal view. But I think you will agree that not everyone evaluates an office suite with the same criteria. For some, price is the main issue, for others the use of standards is the issue, for others the feature set is the issue, for others ease of use is the issue, for others disk and memory footprint is the issue, for others support for non-Windows OS’s is the issue, for others the freedom and ability to hack the source code is the issue. With 61 million downloads, there is clearly a sizable minority who have evaluated the alternatives and chosen OpenOffice. In other words, “a lot lot better” can only be said for a particular value set. From another, equally valid set of requirements, weighing and comparing the factors OpenOffice is clearly a lot lot better.

    There are clear economic alternatives here. Would you give up your copy of MS Office in exchange for OpenOffice if I gave you $1? Maybe not. What if I gave you $1,000,000? Most likely yes. No one is a purist here. Everyone is just haggling over the price.

    Assuming OpenOffice is sufficient to do the task (and for many/most tasks I’ll claim that it is), when the cost to switch an organization to OpenOffice (TCO, including training, deployment, etc.) is less than the cost to upgrade to the next version of Microsoft Office, then the landslide begins. I’d look at the Office 2007 upgrade numbers one year from now as a proxy for what the market considers the relative values of the two suites to be.

  • hAl 2006/09/19, 8:40 am

    I would consider KOffice a bigger competitor to MS Office for the future than OOo. KOffice does not have the burden of a lot of legacy documents and can develop new features fairly fast.
    However I think it will take KOffice at least two more years, maybe even more, to have their suite on a competitive level. That isn’t a problem, as the ODF format also needs to grow up and KOffice can grow alongside.
    A problem can be that OOo could stifle the progress of the format, especially if fast-growing OSS tools might need more features than ODF currently supports. Big legacy suites like OOo and StarOffice that have so much control over the format might then be more of a ballast than an advantage.

  • Rob 2006/09/19, 10:03 am

    I’m not sure feature parity with MS Office should be the goal for OSS alternatives. That is too much like the prisoner having his highest aspirations to be just like his jailor, not realizing that the jailor is also a prisoner.

    In the end, competition is about meeting customer needs at the best price, not about achieving an arbitrary feature level defined by Microsoft.

    In the 1980s and early 1990s this was about adding features. But today I’m not so sure. We now see people moving to tools like wikis for collaborative editing, tools with fewer editing features (a big step backwards in fact) but increased ability for multi-user collaboration.

    I’d rather see some fresh thinking around what users really need. An Office suite that was designed in the pre-internet days, for a market filled mainly with computer novices, might not be the right model for a connected, mobile world of users who have been using computers since age five. So people are willingly taking a step backwards in terms of control over the visual appearance of their documents. Part of this is increased collaboration. But part of it is also increased consistency of appearance. If the user has less control over the text attributes, and styling is deferred to post-processing, then wonderful things can happen, like easily reusing the same content for different devices and different modes of delivery (voice, screen readers, etc.).

    WYSIWYG ties us to the printed page metaphor on screen devices with far less resolution. This choice was made very early in the development of the word processor, in a time before connected computers, where document collaboration required the exchange of printed output. But is WYSIWYG still as critical today?

    Power for the user is not what you allow the user to do. Power is in what you allow the user to ignore. The user doesn’t worry about font kerning and instead leaves that to the font manager and rasterizer. This is good. Similarly, the typical user should not worry about layout and text attributes. The powerful tool lets the user concentrate on what they are saying, free from distractions.

    I disagree with you on the “control of the format” comment. If you check the meeting attendance of the ODF TC, and look at the members who have voting rights, you will see that neither Sun nor IBM controls the TC. The current voting membership consists of: IBM(2), Sun(2), Intel(1), ISO(1), KDE(1), Open Document Foundation(3), Individual(1).

  • James 2006/11/14, 6:52 pm

    2. Since Office runs only on Windows, you also require Windows

    Or OS X. Macs run office quite well, thank you.

  • Ben Langhinrichs 2007/01/30, 5:37 pm

    An excellent post, as usual, Rob. While I might quibble about some of the specific points, I think the bigger message that we could open up the applications market to something more than one fixed (and small) set of applications is very exciting. I look forward to seeing wikiCalc and other innovations in this area, and am looking at office documents with a new eye to see how they could be created/accessed differently.
