An Antic Disposition


Archives for March 2007

The ODF Validation Service

2007/03/28 By Rob 5 Comments

No, this has nothing to do with getting discounted parking if you use ODF, though that is an intriguing idea…

Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship’s ODF Tools project by Alex Hudson.

It is in the spirit of the W3C’s Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.

It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a Wordprocessor document, onTable(), onImage(), etc.
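
To make the idea concrete, here is a minimal Python sketch of what the Zip-level and manifest-level interfaces might look like. Every name here is invented for illustration; I have no idea whether the Fellowship's tool is structured anything like this.

    import zipfile
    import xml.etree.ElementTree as ET

    MANIFEST_NS = "{urn:oasis:names:tc:opendocument:xmlns:manifest:1.0}"

    class ZipListener:
        """Subscribe to Zip-level parse events."""
        def on_file(self, name, size): pass
        def on_directory(self, name): pass

    class ManifestListener:
        """Subscribe to the entries declared in META-INF/manifest.xml."""
        def on_entry(self, full_path, media_type): pass

    def scan_odf(path, zip_listeners=(), manifest_listeners=()):
        """Walk an ODF package and fire events to the registered listeners."""
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                for zl in zip_listeners:
                    if info.filename.endswith("/"):
                        zl.on_directory(info.filename)
                    else:
                        zl.on_file(info.filename, info.file_size)
            manifest = ET.fromstring(zf.read("META-INF/manifest.xml"))
            for entry in manifest.iter(MANIFEST_NS + "file-entry"):
                for ml in manifest_listeners:
                    ml.on_entry(entry.get(MANIFEST_NS + "full-path"),
                                entry.get(MANIFEST_NS + "media-type"))

A plugin then just subclasses whichever listener it cares about:

    class FileLister(ZipListener):
        def on_file(self, name, size):
            print(name, size)

    scan_odf("example.odt", zip_listeners=[FileLister()])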

A framework like this would let you build a range of applications that need to scan an ODF document and take some action on it.

  • A validation service would operate at several levels, validating the structure of the Zip and the manifest, as well as validating each of the XML streams.
  • You could also build a cross-platform checker, looking at embedded images, other media, OLE links, etc., and reporting on whether any of them have platform dependencies.
  • An accessibility scanner would be able to fit into this framework as well.
  • A full text indexer could work here.
  • Any number of content scraping applications could work well here.
  • If there is some query language interface, this could be useful from a test-generation perspective. If you have a large collection of ODF documents, a developer working on a feature can instantly bring up a set of test documents that exercise the code he just changed (see the sketch after this list). Give me a list of word processor documents that have Arabic Bidi text which also have tables. Give me a list of spreadsheets that use pie charts with more than 10 slices.
  • With the metadata framework coming in ODF 1.2, there will be even more interesting uses of such a framework.
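
As promised in the test-generation item above, here is a toy version of what such queries might look like, assuming the framework has already reduced each document to a summary of its features. All field names and data are invented:

    # Feature summaries as they might come out of a corpus scan (invented data).
    docs = [
        {"name": "a.odt", "type": "text", "bidi": True, "tables": 2},
        {"name": "b.ods", "type": "spreadsheet", "pie_slices": [12, 4]},
        {"name": "c.odt", "type": "text", "bidi": False, "tables": 0},
    ]

    def select(docs, pred):
        """A 'query' is just a predicate over the feature summaries."""
        return [d["name"] for d in docs if pred(d)]

    # Word processor documents with Bidi text which also have tables:
    print(select(docs, lambda d: d["type"] == "text"
                                 and d.get("bidi") and d.get("tables", 0) > 0))

    # Spreadsheets that use a pie chart with more than 10 slices:
    print(select(docs, lambda d: d["type"] == "spreadsheet"
                                 and any(s > 10 for s in d.get("pie_slices", []))))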

The benefit of the framework is the reduction in code required to get directly to the information you want in an ODF document, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.


Change Log

4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.


Filed Under: ODF

The Case for a Single Document Format: Part II

2007/03/22 By Rob 14 Comments

This is Part II of a four-part post.

In Part I we surveyed a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.

In this post, Part II, we’ll try to explain the forces that tend to unify or divide standards and hopefully make sense of what we saw in Part I.

In Part III we’ll look at the document formats in particular, how we got to the present point, and how and why historically there has always been but a single document format.

In Part IV, if needed, we’ll tie it all together and show why there should be, and will be, only a single open digital document format.

To make sense of the diversity of standardization behavior reviewed in Part I, it is necessary to consider the range of benefits that standards bring. Although few standards bring all of these benefits, most will bring one or more.

Variety Reduction

Standards for screw sizes, wire gauges, paper sizes and shoe sizes are examples of “variety-reducing standards”. In order to encourage economies of scale and the resulting lower costs to producers and consumers, goods that might naturally have had a continuum of allowed properties are discretized into a smaller number of varieties that will be good enough for most purposes.

For example, my feet may naturally fit best in size 9.3572 shoes. But I do not see that size on the shelves. I see only shoes in half-size increments. Certainly I could order custom-made shoes to fit my feet exactly, but this would be rather expensive. So, accepting that the manufacturing, distribution and retail aspects of the footwear industry cannot stock 1,000’s of different shoe sizes and still sell at a price that I can afford, I buy the most comfortable standard size, usually men’s size 9.5.

And yes, Virginia, there is also an ISO Standard for shoe sizes, called ISO 9407:1991 “Mondopoint”.

Decreased Information Asymmetry

A key premise of an efficient & free market is the existence of voluntary sellers and voluntary buyers motivated by self-interest in the presence of perfect information. But the real marketplace often does not work that way. In many cases there is an asymmetry of information which hurts the consumer, as well as the seller.

For example, when you buy a box of breakfast cereal at the supermarket, what do you know about it? You cannot open the box and sample it. You cannot remove a portion of the cereal, bring it to a lab and test it for the presence of nuts or measure the amount of fiber contained in it. The box is sealed and the contents invisible. All you can do is hold and shake the box.

The disadvantage to the consumer from this information asymmetry is obvious. But the manufacturer suffers as well. This stems from the difficulty of charging a premium for special-grade products if this higher grade cannot be verified by the consumer prior to purchase. How can you sell low-fat or high-fiber or all-natural or low-carb foods and charge more for those benefits, if anyone can slap that label on their box?

The government-mandated food ingredient and nutrition labels solve this problem. The supermarket is full of standards like this, from standardized grades of eggs, meat, produce, olive oil, wine, etc. There are voluntary standards as well, like organic food labeling standards, that fulfill a similar purpose.

Compatibility

Compatibility standards, also called interface standards, provide a common technical specification which can be shared by multiple producers to achieve interoperability. In some cases, these standards are mandated by the government. For example, if you want to ship a letter using First Class postage, you must adhere to certain size and shape restrictions on the letter. If you want to send many letters at once, using the reduced bulk rate, then you must follow additional constraints on how the letters are addressed and sorted. If you want to deal with the Post Office, then these are the standards you must follow.

Similarly, if you are a software developer and you want to write an application that does electronic tax submissions, then you must follow the data definitions and protocols defined by the IRS.

Required interface standards are quite common when dealing with the government. Regulations requiring the use of specific standards also promote public safety, health and environmental protection.

And not just government. A sufficiently dominant company in an industry, a WalMart, an Amazon or an eBay, can often define and mandate the use of specific standards by their suppliers. If you want to do business with WalMart, then you must play by their rules.

Network Goods

Where it gets interesting is when compatibility standards combine with the network effect. I’m sure many of you are familiar with the network effect, but bear with me as I review.

The first person to have a telephone received little immediate value from it. All Mr. Bell could do was call Mr. Watson and tell him to come over. But the value of the telephone grew as each new subscriber was connected to the network, since there were now more people who could be contacted. Each new user brought value to all users, present and future. When the value of a technology increases when more people use it, then you have a network effect.

In a classic, maximally-connected network, like the telephone system, when you double the number of subscribers, you double the value to each user. This means the value of the entire network — the total value to all subscribers — grows as the square of the number of subscribers. So double the number of participants in the network, and the value of the network goes up four-fold.
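
In symbols: each of the $N$ subscribers can reach the other $N-1$, so the total value is proportional to the number of possible connections,

$$V(N) \;\propto\; \frac{N(N-1)}{2} \;\approx\; \frac{N^2}{2}, \qquad \frac{V(2N)}{V(N)} \;\approx\; 4.$$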

Of course, this only works up to a point. There are diminishing returns. When the last rural villager in Albania gets a telephone connection, I personally will not notice any incremental benefit. But when we’re talking about the initial growth period of the technology, then the above rule is roughly the behavior we see.

Other familiar network effect technologies include the Internet’s technical infrastructure (TCP/IP, DNS, etc.), eBay, Second Life, social networking sites such as Flickr, del.icio.us or Digg, etc.

If we delve deeper we can talk about two types of network effects: direct and indirect. The direct effect, as described above, is the increased value you receive in using the system as greater numbers of other people also use the system. The indirect effects are the supply-side effects, caused by things like increased choice in vendors, increased choice in after-market options and repairs, increased cost efficiencies and economies of scale by a market that can optimize production around a single standard.

So take the example of eBay. The direct network effect is clear. The more people that use it, the more buyers and sellers are present, and the more value there is to all of the buyers and sellers. The indirect network effect is the number of 3rd party tools for listing auctions, processing sales, watching for wanted items, sniping, etc., which are available because of the concentrated attention on this one online auction site.

It might be helpful to look at this graphically. The following chart attempts to show two things:

  • How the average per-user cost of using the technology C(N) decreases as more people join the network.
  • How the average per-user utility (value) U(N) increases as more people join the network.

A few things to note:

First, utility does not increase without limit and cost does not decrease without limit. There will be diminishing returns to both. Remember that last villager in Albania.

Also, note that initially the average cost is more than the average utility. But this is only the average. Not everyone’s utility function is the same. If they were, then the network would never get started. Fortunately, there is a diversity of utility functions. Some users will see more initial value than others, and they will be the early adopters. Some will see far less value than others and they will be the late adopters.

Finally, note the point marked as the “tipping point”. This is where the largest growth occurs, when the average user’s utility is greater than the average user’s cost.
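
If you want to reproduce curves of this general shape yourself, a few lines of Python will do it. The particular functions below are invented purely to give the right qualitative picture: falling cost with a floor, rising utility that saturates, and a crossing point.

    import numpy as np
    import matplotlib.pyplot as plt

    N = np.linspace(1, 100, 400)
    cost = 2 + 20 / N             # average per-user cost: falls, then floors out
    utility = 12 * N / (N + 25)   # average per-user utility: rises, then saturates

    plt.plot(N, cost, label="C(N): average per-user cost")
    plt.plot(N, utility, label="U(N): average per-user utility")

    # The tipping point: where average utility first exceeds average cost.
    tip = N[np.argmax(utility > cost)]
    plt.axvline(tip, linestyle="--", color="gray")
    plt.annotate("tipping point", (tip, 2 + 20 / tip))

    plt.xlabel("N (number of users in the network)")
    plt.legend()
    plt.show()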

Network Effect Compatibility Standards

So what does this all have to do with standards? My observation is that a single standard in a domain naturally results when there are strong direct and indirect network effects. And where these network effects do not exist, or are weak, then multiple standards flourish.

This can be seen as societal value maximization. A network of N-participants has a total value proportionate to N-squared. Split this into two equally-sized incompatible networks and the value is 2*(N/2)^2 or (N^2)/2. The maximal value comes only with a single network governed by a single standard.
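
The halving is no accident; it is exactly the value of the cross-connections that the split severs. For a split into incompatible pieces of size $x$ and $N-x$:

$$N^2 = \bigl(x + (N-x)\bigr)^2 = x^2 + (N-x)^2 + 2x(N-x),$$

so the two fragments are worth $2x(N-x)$ less than the whole network, and the loss is largest at the even split $x = N/2$.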

Allowing two different networks to interoperate may be technically possible via bridging, adapting or converting, but this at best preserves only the direct network effects. The indirect effects, the economies of scale, the choice of multiple vendors, the 3rd party after-market options, etc., reach their maximum value only with a single network. The indirect network benefits essentially follow from the industry concentrating their attention and effort around a single standard. When split into multiple networks, the industry instead concentrates their attention on adapters, bridges and converters, which requires effort and expense on their part, with the cost eventually passed on to the consumer, although it brings the consumer no net benefit over having a single network.

The Cases from Part I

Let’s finish by reviewing the cases presented in Part I, in light of the above analysis, to see if those examples make more sense now.

  • Railroad gauge — This is clearly a network compatibility standard, with strong direct and indirect effects. When everyone uses the same gauge, travelers and goods can travel to more places, faster and at less cost. The indirect effect is that it allows the train manufacturer to concentrate on producing a train that fits a single gauge. As this happens the train companies have a greater choice of whom they can buy from. Everyone wins.
  • Standard Time — This is more subtle, but it is also a network effect standard. The more people who used Standard Time, the easier it became to communicate times unambiguously and without error to others who were also using Standard Time. There is also an aspect of variety-reduction to this: having fewer local times to worry about simplified the train timetables, which made things easier for the passengers and shippers who interacted with the trains.
  • The single language for civil aeronautics. This is variety-reduction, a mandated safety standard, as well as a networked compatibility standard, where the network consists of pilots and control towers.
  • Beverage can diameters — This is a variety-reducing standard. There is no network effect. Ask yourself, when you buy a can of Coke, does it bring more value to others who have also bought a can of Coke? No, it doesn’t.
  • TV signals — Clearly this is a network compatibility standard, with strong direct and indirect effects. The network is not just the viewers of TV. It also includes the networks, the local affiliates, and the companies that manufacture the hardware and software, from antennas and transmitters, to cameras, editing software, televisions and VCR’s.
  • The complexity of the above network is one reason why the government has stepped in to mandate the switch to digital television. (The other reason is the money they will get from auctioning off the radio spectrum this conversion will free up.) The free market is good at many things, but the complex conversion of an entire network of diverse and competing producers and consumers at many levels is not something it has the agility to accomplish.
  • Fire hose couplings — This started as a compatibility standard, but only at a local level. Baltimore had its own standard for its own fire company. However, as the railroad made it practical to transport fire companies from more distant cities, a larger network developed. By using the national standard hose coupling, not only can you now receive mutual assistance from other fire companies (direct value), you also have a greater choice of whom you can buy fire hoses from (indirect value), fire hose manufacturers have a larger market they can sell into (indirect value), and the concentration on a single coupling design (variety-reduction) leads to manufacturing efficiencies and economies of scale (indirect value), as well as concentrated innovation around that standard (indirect value).
  • Safety razors — There is no network effect with razors and razor blades. The value I get from using Gillette does not vary depending on how many other people use Gillette. I would get the same shave if I were the only one using it as I would if the entire world used it.
  • Video game consoles — These generally have been free of direct network effects, though there are clearly some indirect ones, in terms of varieties of titles, after-market accessories, etc. The interesting thing to watch will be to see whether the latest generation of game systems, the ones that allow play over the Internet, will lead to direct network benefits. Will this lead to standards in this area?
  • SLR lens mounts, DVD disc standards, coffee filters, vacuum cleaner bags, etc. — These are all similar, compatibility standards with no direct network effects.

Well, this is too long already, so I’ll stop here.

In Part III I’ll look at the history of document formats, and see what factors have influenced their standardization. Some questions to think about until then:

  1. Some technologies, like rail gauges, local time or fire hose couplings, went many years without standardization. Then, in a brief surge of activity, they were standardized. Look at the trends or events that precipitated the need for standardization. Is there any unifying logic to why these changes occurred? Hint: there is something here more general than just the trains.
  2. In the cellular phone industry, Europe and Asia made an early decision to standardize on the GSM network, while the U.S. market fragmented between CDMA, GSM and, earlier, D-AMPS. What effects does this have on the American versus the European consumer, direct and indirect?
  3. Microsoft has repeatedly stated that they are dead-against government mandates of specific standards. But they are a member of the HighTech Digital TV Coalition, an organization which is heavily lobbying the government to mandate Digital TV standards. How do we reconcile these two positions? Are they only against mandatory standards in areas where they have a monopoly?
  4. How does any of this relate to office document formats?

In Part III, we’ll look at that last question in particular, including an illustrated review of the history of document formats.


3/23/07 — Corrections: Bell not Edison invented the telephone (Doh!). Also corrected calculation in value of two networks.


Filed Under: Standards

Cannibalism

2007/03/20 By Rob 16 Comments

An interesting post by Bob Sutor. What is OOXML’s real competition, and how does that help ODF? The dynamics get interesting when you are hindered by your own install base. The main selling point of OOXML is its claimed 100% compatibility with the legacy binary formats. But if you are using Office 2000, and happy with it, what is the reason to move to OOXML? Why not keep using the binary formats? What justifies the migration?

The downside is clear. The minute you move to OOXML, you have less choice of whom you can successfully exchange documents with. Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite, ThinkFree Office, the users of these products, and the numerous 3rd party applications that can read and write the binary formats are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.

So why move to OOXML? From the CTO’s perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn’t the tendency be to remain where you are?

So the breakdown may happen like this:

  • N% of companies put compatibility with legacy documents foremost. A% of these stay on Office/Windows and upgrade to Office 2007/OOXML. B% stay where they are and use the binary formats, and C% move to some combination of ODF and PDF.
  • 100-N% make a decision primarily on factors other than 100% fidelity with legacy documents, such as ease of programmability, greater choice and diversity in applications and vendors, etc. X% stay on Office/Windows and upgrade to Office 2007/OOXML. Y% stay where they are and use the binary formats, and Z% move to some combination of ODF and PDF.
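
To see how these pieces combine, plug in some purely made-up numbers: say N = 80, with A/B/C at 50/40/10, and the remaining 20% with X/Y/Z at 40/20/40. Then the overall shares work out to

$$\begin{aligned} \text{Office 2007/OOXML} &= 0.8 \times 0.5 + 0.2 \times 0.4 = 48\% \\ \text{stay on the binary formats} &= 0.8 \times 0.4 + 0.2 \times 0.2 = 36\% \\ \text{ODF/PDF} &= 0.8 \times 0.1 + 0.2 \times 0.4 = 16\% \end{aligned}$$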

I think that B & Z may be the dominating factors. N is large now because it includes the inertial effects of Microsoft’s market dominance. Even companies that don’t make an explicit choice will end up on that path by default. But even the most passive company will not fall into choice A without some thought.

It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.


Filed Under: ODF, OOXML

Pruning Raspberries

2007/03/19 By Rob Leave a Comment

The earth does not yield up her sweet fruits unrecompensed. For every berry I will harvest in September, I pay now an equal measure of sweat and blood. Hunched down and with thick gloves, I navigate the thicket of thorny bramble canes, the red raspberries, yellow raspberries, purple raspberries, blackberries, thimble berries and field berries, and restore man’s order to nature’s chaos.

The correct way to prune brambles depends on their variety, whether they are primocane-bearing, or floricane-bearing. Many raspberries, and all blackberries, are floricane-bearing, meaning they have a two-year cycle, where the canes that grow this year (the primocanes) will flower and bear fruit next year (when they will be called floricanes). The primocane-bearing varieties, on the other hand, bear fruit on this year’s canes. I like having a mix, since that spreads out the harvest.

The floricane-bearing varieties, since they started their growth last year, will bear fruit in the summer, while the primocane-bearing varieties, which need to complete their growth in a single year, will bear fruit later, in the fall. Primocane-bearing varieties are cut to the ground after harvest. The maintenance of floricane-bearing varieties is a little more complicated. The floricanes are removed after harvest, and the primocanes, which will be next year’s floricanes, are pruned and thinned while the plant is dormant, in late winter, which is the work I was able to complete before this last snowstorm.

Pruning brambles involves several factors:

  1. The architecture of the plant. A bush full of large berries will have considerable weight. One option is to trellis the plants to support that weight. Another option, which I prefer, is to maintain the canes and side branches at a length where the plant can be self-supporting, 4-5 feet tall, side branches trimmed to 8-12 inches.
  2. Cane density. It is better to have 3-5 thick, strong canes per linear foot than to have 15 smaller ones. The goal in the end is to have a bounty of fruit, not foliage. So now is the time to thin the canes.
  3. Access for sun, rain, air and me. This is another reason to thin the canes. A big dense mass of canes competing for limited resources will produce poorly, be susceptible to mold, and will be difficult to harvest.

The question can fairly be asked, “Why go through all this trouble? Why not let the invisible hand of nature guide the development of the brambles? Let her decide. She will pick the winners and losers.”

To that I respond, that nature, in her infinite wisdom, does not seem to care much for bringing me berries. I am not absolutely certain what role brambles play in the grand scheme of things, but if I had to guess, nature likes them to form wild, uncontrolled, dense masses of thorny canes, with berries inaccessible to larger mammals. That seems to be their natural tendency in my garden. However, in their natural state, the brambles thicket forms an ideal protective habitat for small birds, who can remain protected from predators while eating the berries. The berry seeds survive unharmed by the digestive system of the birds and are excreted, with fertilizer, in distant locations, leading to the better propagation of the species.

And so I battle the genes inside the berries, pitting my labors against nature’s disordered fecundity. It breaks the back and scrapes the skin, but it must be done again each year, around this time.


Filed Under: Gardening Tagged With: Horticulture, Pruning, Raspberries

ODF Freely Available

2007/03/19 By Rob 1 Comment

Another step forward for ODF. After gaining ISO approval in May, and Publication status in December, ISO/IEC 26300 is now counted among ISO’s “Freely Available Standards”. What is the significance of this? The text is identical to what it was in May, but you no longer need to pay 342 Swiss Francs to ISO to download an official copy. It is now free. Enjoy!


Filed Under: ODF
