Wednesday, March 28, 2007
The ODF Validation Service
Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship's ODF Tools project by Alex Hudson.
It is in the spirit of the W3C's Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.
It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a Wordprocessor document, onTable(), onImage(), etc.
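As a rough illustration of the idea, here is a minimal Python sketch of the Zip and manifest levels only. The names `OdfListener`, `scan`, `Collector` and `make_sample_package` are hypothetical, invented for this example; they are not part of the Validator or of any existing ODF toolkit. The sketch assumes the package keeps its manifest at `META-INF/manifest.xml` in the ODF manifest namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in an ODF package.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


class OdfListener:
    """Hypothetical plugin base class; override only the events you need."""

    def on_file(self, name, size):
        pass

    def on_directory(self, name):
        pass

    def on_manifest_entry(self, full_path, media_type):
        pass


def scan(package, listeners):
    """Walk an ODF package (a path or file-like object) and fire events."""
    with zipfile.ZipFile(package) as zf:
        # Zip level: one event per archive member.
        for info in zf.infolist():
            for listener in listeners:
                if info.is_dir():
                    listener.on_directory(info.filename)
                else:
                    listener.on_file(info.filename, info.file_size)
        # Manifest level: one event per file-entry element.
        if "META-INF/manifest.xml" in zf.namelist():
            root = ET.fromstring(zf.read("META-INF/manifest.xml"))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path")
                media = entry.get(f"{{{MANIFEST_NS}}}media-type")
                for listener in listeners:
                    listener.on_manifest_entry(path, media)


class Collector(OdfListener):
    """Example plugin: remembers file names and declared media types."""

    def __init__(self):
        self.files = []
        self.media_types = {}

    def on_file(self, name, size):
        self.files.append(name)

    def on_manifest_entry(self, full_path, media_type):
        self.media_types[full_path] = media_type


def make_sample_package():
    """Build a tiny ODF-like Zip in memory, for demonstration only."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        '<manifest:file-entry manifest:full-path="/" '
        'manifest:media-type="application/vnd.oasis.opendocument.text"/>'
        '<manifest:file-entry manifest:full-path="content.xml" '
        'manifest:media-type="text/xml"/>'
        '</manifest:manifest>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<doc/>")
        zf.writestr("META-INF/manifest.xml", manifest)
    buf.seek(0)
    return buf


collector = Collector()
scan(make_sample_package(), [collector])
print(collector.files)
print(collector.media_types)
```

Content-level events such as onCell() or onTable() would hang off the same dispatch loop, driven by a streaming parse of content.xml; that part is left out here to keep the sketch short.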
A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.
- A validation service would operate at several levels, validating the structure of the Zip and the manifest as well as each of the XML files.
- You could also do a cross-platform checker, looking at embedded images and other media, OLE links, etc., and reporting on whether any of these have platform dependencies.
- An accessibility scanner would be able to fit into this framework as well.
- A full text indexer could work here.
- Any number of content scraping applications could work well here.
- If there is some query language interface, this could be useful from a test-generation perspective. If you have a large collection of ODF documents, a developer working on a feature can instantly bring up a set of test documents that can be used to test the code he just changed. Give me a list of word processor documents that have Arabic Bidi text which also have tables. Give me a list of spreadsheets that use pie charts with more than 10 slices.
- With the metadata framework coming in ODF 1.2, there will be even more interesting uses of such a framework.
The benefit of the framework is the reduction in code required to get directly to the info you want in the ODF document, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.
Change Log
4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.
Labels: ODF
Friday, March 23, 2007
The Case for a Single Document Format: Part II
In Part I we surveyed a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.
In this post, Part II, we'll try to explain the forces that tend to unify or divide standards and hopefully make sense of what we saw in Part I.
In Part III we'll look at the document formats in particular, how we got to the present point, and how and why historically there has always been but a single document format.
In Part IV, if needed, we'll tie it all together and show why there should be, and will be, only a single open digital document format.
To make sense of the diversity of standardization behavior reviewed in Part I it is necessary to consider the range of benefits that standards bring. Although few standards bring all of these benefits, most will bring one or more.
Variety Reduction
Standards for screw sizes, wire gauges, paper sizes and shoe sizes are examples of “variety-reducing standards”. In order to encourage economies of scale and the resulting lower costs to producers and consumers, goods that may naturally have had a continuum of allowed properties are discretized into a smaller number of varieties that will be good-enough for most purposes.
For example, my feet may naturally fit best in size 9.3572 shoes. But I do not see that size on the shelves. I see only shoes in half-size increments. Certainly I could order custom-made shoes to fit my feet exactly, but this would be rather expensive. So, accepting that the manufacturing, distribution and retail aspects of the footwear industry cannot stock 1,000's of different shoe sizes and still sell at a price that I can afford, I buy the most comfortable standard size, usually men's size 9.5.
And yes, Virginia, there is also an ISO Standard for shoe sizes, called ISO 9407:1991 “Mondopoint”.
Decreased Information Asymmetry
A key premise of an efficient & free market is the existence of voluntary sellers and voluntary buyers motivated by self-interest in the presence of perfect information. But the real marketplace often does not work that way. In many cases there is an asymmetry of information which hurts the consumer, as well as the seller.
For example, when you buy a box of breakfast cereal at the supermarket, what do you know about it? You cannot open the box and sample it. You cannot remove a portion of the cereal, bring it to a lab and test it for the presence of nuts or measure the amount of fiber contained in it. The box is sealed and the contents invisible. All you can do is hold and shake the box.
The disadvantage to the consumer from this information asymmetry is obvious. But the manufacturer suffers as well. This stems from the difficulty of charging a premium for special-grade products if this higher grade cannot be verified by the consumer prior to purchase. How can you sell low-fat or high-fiber or all-natural or low-carb foods and charge more for those benefits, if anyone can slap that label on their box?
The government-mandated food ingredient and nutritional labels solve this problem. The supermarket is full of standards like this, including standardized grades of eggs, meat, produce, olive oil, wine, etc. There are voluntary standards as well, like organic food labeling standards, that fulfill a similar purpose.
Compatibility
Compatibility standards, also called interface standards, provide a common technical specification which can be shared by multiple producers to achieve interoperability. In some cases, these standards are mandated by the government. For example, if you want to ship a letter using First Class postage, you must adhere to certain size and shape restrictions on the letter. If you want to send many letters at once, using the reduced bulk rate, then you must follow additional constraints on how the letters are addressed and sorted. If you want to deal with the Post Office, then these are the standards you must follow.
Similarly, if you are a software developer and you want to write an application that does electronic tax submissions, then you must follow the data definitions and protocols defined by the IRS.
Required interface standards are quite common when dealing with the government. Regulations requiring the use of specific standards also promote public safety, health and environmental protection.
And not just government. A sufficiently dominant company in an industry, a WalMart, an Amazon or an eBay, can often define and mandate the use of specific standards by their suppliers. If you want to do business with WalMart, then you must play by their rules.
Network Goods
Where it gets interesting is when compatibility standards combine with the network effect. I'm sure many of you are familiar with the network effect, but bear with me as I review.
The first person to have a telephone received little immediate value from it. All Mr. Bell could do was call Mr. Watson and tell him to come over. But the value of the telephone grew as each new subscriber was connected to the network, since there were now more people who could be contacted. Each new user brought value to all users, present and future. When the value of a technology increases as more people use it, you have a network effect.
In a classic, maximally-connected network, like the telephone system, when you double the number of subscribers, you double the value to each user. This also causes the value of the entire network, the total value to all subscribers, to grow with the square of the number of subscribers. So double the number of participants in the network, and the value of the network goes up four-fold.
Of course, this only works up to a point. There are diminishing returns. When the last rural villager in Albania gets a telephone connection, I personally will not notice any incremental benefit. But when we're talking about the initial growth period of the technology, then the above rule is roughly the behavior we see.
Other familiar network effect technologies include the Internet's technical infrastructure (TCP/IP, DNS, etc.), eBay, Second Life, social networking sites such as Flickr, del.icio.us or Digg, etc.
If we delve deeper we can talk about two types of network effects: direct and indirect. The direct effect, as described above, is the increased value you receive in using the system as greater numbers of other people also use the system. The indirect effects are the supply-side effects, caused by things like increased choice in vendors, increased choice in after-market options and repairs, increased cost efficiencies and economies of scale by a market that can optimize production around a single standard.
So take the example of eBay. The direct network effect is clear. The more people that use it, the more buyers and sellers are present, and the more value there is to all of the buyers and sellers. The indirect network effect is the number of 3rd party tools for listing auctions, processing sales, watching for wanted items, sniping, etc., which are available because of the concentrated attention on this one online auction site.
It might be helpful to look at this graphically. The following chart attempts to show two things:
- How the average per-user cost of using the technology C(N) decreases as more people join the network.
- How the average per-user utility (value) U(N) increases as more people join the network.

A few things to note:
First, utility does not increase without limit and cost does not decrease without limit. There will be diminishing returns to both. Remember that last villager in Albania.
Also, note that initially the average cost is more than the average utility. But this is only the average. Not everyone's utility function is the same. If they were, then the network would never get started. Fortunately, there is a diversity of utility functions. Some users will see more initial value than others, and they will be the early adopters. Some will see far less value than others and they will be the late adopters.
Finally, note the point marked as the “tipping point”. This is where the largest growth occurs, when the average user's utility is greater than the average user's cost.
Network Effect Compatibility Standards
So what does this all have to do with standards? My observation is that a single standard in a domain naturally results when there are strong direct and indirect network effects. And where these network effects do not exist, or are weak, then multiple standards flourish.
This can be seen as societal value maximization. A network of N participants has a total value proportional to N-squared. Split this into two equally-sized incompatible networks and the value is 2*(N/2)^2 or (N^2)/2, half of what the single network was worth. The maximal value comes only with a single network governed by a single standard.
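The arithmetic is easy to check with a few lines of Python. This is only a sketch of the idealized N-squared model described above, with the constant of proportionality assumed to be 1; the function names are made up for the example.

```python
def network_value(n):
    """Total value of a fully connected network, taken as n squared.

    Assumption: each of the n users gets value proportional to n,
    with the constant of proportionality set to 1.
    """
    return n * n


def split_value(n, parts):
    """Total value when n users are split into equal, incompatible networks."""
    return parts * network_value(n / parts)


n = 1000
single = network_value(n)        # one network of 1,000 users
halves = split_value(n, 2)       # two incompatible networks of 500 users
doubled = network_value(2 * n)   # one network of 2,000 users

print(halves / single)    # fraction of the value left after the split
print(doubled / single)   # growth in value from doubling the users
```

Under this model, splitting into k equal networks leaves only 1/k of the original value, so the loss grows with every additional incompatible standard.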
Allowing two different networks to interoperate may be technically possible via bridging, adapting or converting, but this at best preserves the direct network effects only. The indirect effects, the economies of scale, the choice of multiple vendors, the 3rd party after-market options, etc., reach their maximum value only with a single network. The indirect network benefits essentially follow from the industry concentrating its attention and effort around a single standard. When split into multiple networks, the industry instead concentrates its attention on adapters, bridges and converters, which requires effort and expense on its part, with the cost eventually passed on to the consumer, although it brings the consumer no net benefit over having a single network.
The Cases from Part I
Let's finish by reviewing the cases presented in Part I, in light of the above analysis, to see if those examples make more sense now.
- Railroad gauge — This is clearly a network compatibility standard, with strong direct and indirect effects. When everyone uses the same gauge, travelers and goods can travel to more places, faster and at less cost. The indirect effect is that it allows the train manufacturer to concentrate on producing a train that fits a single gauge. As this happens the train companies have a greater choice of whom they can buy from. Everyone wins.
- Standard Time — This is more subtle, but it is also a network effect standard. The more people who use Standard Time, the easier it is to communicate times unambiguously and without error to others who are also using Standard Time. There is also an aspect of variety-reduction to this, where having fewer local times to worry about simplified the train timetables, which made things easier for the passengers and shippers who interacted with the trains.
- The single language for civil aeronautics. This is variety-reduction, a mandated safety standard, as well as a networked compatibility standard, where the network consists of pilots and control towers.
- Beverage can diameters — This is a variety-reducing standard. There is no network effect. Ask yourself, when you buy a can of Coke, does it bring more value to others who have also bought a can of Coke? No, it doesn't.
- TV signals — Clearly this is a network compatibility standard, with strong direct and indirect effects. The network is not just of the viewers of TV. It also includes the networks, the local affiliates, and the companies that manufacture the hardware and software, from antennas and transmitters to cameras, editing software, televisions and VCRs.
- The complexity of the above network is one reason why the government has stepped in to mandate the switch to digital television. (The other reason is the money they will get from auctioning off the radio spectrum this conversion will free up.) The free market is good at many things, but the complex conversion of an entire network of diverse and competing producers and consumers at many levels is not something it has the agility to accomplish.
- Fire hose couplings — This started as a compatibility standard, but only at a local level. Baltimore had its own standard for its own fire company. However, as the railroad made it practical to transport fire companies from more distant cities, a larger network developed. By using the national standard hose coupling, you not only can now receive mutual assistance from other fire companies (direct value) you also have a greater choice of whom you can buy fire hoses from (indirect value), and fire hose manufacturers now have a larger market they can sell into (indirect value) and the concentration on a single coupling design (variety-reduction) will lead to manufacturing efficiencies and economies of scale (indirect value), as well as concentrated innovation around that standard (indirect value).
- Safety razors — There is no network effect with razors and razor blades. The value I get from using Gillette does not vary depending on how many other people use Gillette. I would get the same shave if I were the only one using it, as if the entire world used it.
- Video game consoles — These generally have been free of direct network effects, though there are clearly some indirect ones, in terms of varieties of titles, after-market accessories, etc. The interesting thing to watch will be to see whether the latest generation of game systems, the ones that allow play over the Internet, will lead to direct network benefits. Will this lead to standards in this area?
- SLR lens mounts, DVD disc standards, coffee filters, vacuum cleaner bags, etc. — These are all similar, compatibility standards with no direct network effects.
In Part III I'll look at the history of document formats, and see what factors have influenced their standardization. Some questions to think about until then:
- Some technologies, like rail gauges, local time or fire hose couplings, went many years without standardization. Then, in a brief surge of activity, they were standardized. Look at the trends or events that precipitated the need for standardization. Is there any unifying logic to why these changes occurred? Hint: there is something here more general than just the trains.
- In the cellular phone industry, Europe and Asia made an early decision to standardize on the GSM network, while the U.S. market fragmented between CDMA, GSM and, earlier, D-AMPS. What effects does this have on the American versus the European consumer, direct and indirect?
- Microsoft has repeatedly stated that they are dead-against government mandates of specific standards. But they are a member of the HighTech Digital TV Coalition, an organization which is heavily lobbying the government to mandate Digital TV standards. How do we reconcile these two positions? Are they only against mandatory standards in areas where they have a monopoly?
- How does any of this relate to office document formats?
In Part III, we'll look at that last question in particular, including an illustrated review of the history of document formats.
3/23/07 — Corrections: Bell not Edison invented the telephone (Doh!). Also corrected calculation in value of two networks.
Labels: Standards
Tuesday, March 20, 2007
Cannibalism
The downside is clear. The minute you move to OOXML you have less choice of whom you can successfully exchange documents with. Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite, ThinkFree Office, users of these products, and the numerous 3rd party applications that can read and write the binary formats, these are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.
So why move to OOXML? From the CTO's perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn't the tendency be to remain where you are?
So the breakdown may happen like this:
- N% of companies put compatibility with legacy documents foremost. A% of these stay on Office/Windows and upgrade to Office 2007/OOXML. B% stay where they are and use the binary formats, and C% move to some combination of ODF and PDF.
- The remaining (100-N)% make a decision primarily on factors other than 100% fidelity with legacy documents, such as ease of programmability, greater choice and diversity in applications and vendors, etc. X% stay on Office/Windows and upgrade to Office 2007/OOXML. Y% stay where they are and use the binary formats, and Z% move to some combination of ODF and PDF.
It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.
Monday, March 19, 2007
Pruning Raspberries
The correct way to prune brambles depends on their variety, whether they are primocane-bearing, or floricane-bearing. Many raspberries, and all blackberries, are floricane-bearing, meaning they have a two-year cycle, where the canes that grow this year (the primocanes) will flower and bear fruit next year (when they will be called floricanes). The primocane-bearing varieties, on the other hand, bear fruit on this year's canes. I like having a mix, since that spreads out the harvest.
The floricane-bearing varieties, since they started their growth last year, will bear fruit in the summer, while the primocane-bearing varieties, which need to complete their growth in a single year, will bear fruit later, in the fall. Primocane-bearing varieties are cut to the ground after harvest. The maintenance of floricane-bearing varieties is a little more complicated. The floricanes are removed after harvest, and the primocanes, which will be next year's floricanes, are pruned and thinned while the plant is in dormancy, in late winter, which is the work I was able to complete before this last snowstorm.
Pruning of brambles will consider several factors:
- The architecture of the plant. A bush full of large berries will have considerable weight. One option is to trellis the plants to support that weight. Another option, which I prefer, is to maintain the canes and side branches at a length where the plant can be self-supporting, 4-5 feet tall, side branches trimmed to 8-12 inches.
- Cane density. It is better to have 3-5 thick, strong canes per linear foot than to have 15 smaller ones. The goal in the end is to have a bounty of fruit, not foliage. So now is the time to thin the canes.
- Access for sun, rain, air and me. This is another reason to thin the canes. A big dense mass of canes competing for limited resources will produce poorly, be susceptible to mold, and will be difficult to harvest.
To that I respond that nature, in her infinite wisdom, does not seem to care much for bringing me berries. I am not absolutely certain what role brambles play in the grand scheme of things, but if I had to guess, nature likes them to form wild, uncontrolled, dense masses of thorny canes, with berries inaccessible to larger mammals. That seems to be their natural tendency in my garden. However, in its natural state, the bramble thicket forms an ideal protective habitat for small birds, who can remain protected from predators while eating the berries. The berry seeds survive unharmed by the digestive system of the birds and are excreted, with fertilizer, in distant locations, leading to the better propagation of the species.
And so I battle the genes inside the berries, pitting my labors against nature's disordered fecundity. It breaks the back and scrapes the skin, but it must be done again each year, around this time.
ODF Freely Available

Labels: ODF
Sunday, March 18, 2007
The Case for a Single Document Format: Part I
In Part I we'll take a survey of a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.
In Part II we'll try to explain the forces that tend to unify or divide standards and hopefully make sense of what we saw in Part I.
In Part III we'll look at the document formats in particular, how we got to the present point, and how and why historically there has always been but a single document format.
In Part IV, if needed, we'll tie it all together and show why there should be, and will be, only a single open digital document format.
Let's get started!
Standards — in some domains there is a single standard, while in other domains there are multiple standards. What is the logic of this? What domains encourage, or even demand a single standard? And where do multiple standards coexist without problems?
Let's take a look at some familiar examples and see if we can figure out how this works. We'll start with some examples where a single standard dominates.
Single Standards
The story of the standard rail gauge is probably familiar to you. At first each rail company laid down its own tracks to its own specifications. In the United States there were different gauges used in the North (4′ 9″) and the South (5′). This was not a major issue so long as rail travel remained local or regional. However, as the reach of commerce increased, the pain of dealing with the "break of gauge" between adjacent gauge systems increased. Passengers and goods needed to be offloaded and transferred to a different train, causing time delays and inefficient utilization of equipment. The decision was made to adopt a Standard Gauge of 4′ 9″ and an ambitious migration project took place on May 31st, 1886, when thousands of workers in the South adjusted the west track and moved it 3″ to the east, lining up with the Northern gauge. Eleven thousand miles of track were converted in thirty-six hours.
It should be noted that this unification was not universally celebrated. In particular, riots occurred at some of the junction points, like Erie, Pennsylvania, where local workers stood to lose the high-paying jobs they had unloading and loading cargo onto new trains. Efficiency is often opposed by those who profited from inefficiency.
Another standard prompted by the railroad was the adoption of standard time. In earlier days each town and city had its own local time, roughly based on solar mean time. When it was noon in Chicago, it was 12:09 in Cincinnati, and 11:50 in St. Louis. The instant of local noon would be communicated to residents by a cannon shot or by dropping a ball from a tower, allowing all to synchronize their clocks. The ball drop could be observed by ships in the harbor by telescope and so was much more accurate than the cannon, since the signal was not delayed by the non-negligible travel time of sound. Some memory of this tradition continues to this day with the New Year's Eve ball drop in Times Square.
When it took days by coach to travel from Chicago to Cincinnati, it did not matter that your watch was 9 minutes slow. Your watch probably wasn't accurate enough to tell the difference in any case. When noon came in Cincinnati you would synchronize your watch, knowing that some of the correction was caused by the change in longitude, and some was caused by the imperfections in the watch. But the average person did not care because they did not travel all that much.
However, with the coming of the railroad and then the telegraph, everything changed. People, goods and information could be transferred at far greater speeds. The difference of 9 minutes was now significant.
Initially, each rail company defined its own time, based on the local time of its main office. Timetables would be printed up based on this time. So a large train station, which may serve six different lines, would display six different clocks, all set to different times, some 12 minutes ahead, some 15 minutes behind, etc. At one point, trains in Wisconsin were operating on 38 different times! This was not only an inconvenience to travelers, it was also increasingly a safety concern, since the use of different time systems at the same station increased the chance of collisions.
This was addressed by the adoption of Standard Time in the United States on November 18th, 1883, the so-called "Day of Two Noons". This was the day that the Eastern, Central, Mountain, and Pacific time zones took effect, and on this day every town adjusted its local time to the Standard Time of its new time zone. If you were in the eastern half of your time zone, then when local noon came you would set your clocks back a specified number of minutes, and would thus observe noon twice. If you were in the western half of your time zone, you would advance your clocks at local noon a specified number of minutes. The contemporary coverage of this event in The New York Times is worth a read.
Over the years, the ever-increasing rate of commerce and information flow has led to greater and greater precision in time-keeping, so that today with atomic clocks and UTC we can now account for the slowing of the Earth's rotation and the insertion of occasional leap seconds.
The International Civil Aviation Organization (ICAO) is a UN agency that maintains various aeronautical standards, such as airport codes, aircraft codes, etc. They are also responsible for making English the required language for air-to-ground communications. So when an Italian plane, with an Italian crew on an Italian domestic flight contacts the approach tower at an Italian airport, manned by Italian personnel, they will contact the tower in English. Why do you think this is so?
The diameter of beverage cans has but little variation. A can of Coca-Cola and a can of Pepsi will both fit in my car's cup holder. They also fit fine in the cup holders in my beach chair or riding lawnmower. This works with beer cans as well, with innovative holders such as the novelty beer hat. Vending machines seem to take advantage of this standard as well, since it simplifies their design. The whole beverage can ecosystem works because of standards around beverage can sizes. How is this standard maintained? Was it planned this way?
It is interesting to note that, from the beverage company's perspective this is non-optimal. A can has minimum surface area for a given volume when it has equal height and diameter. But we never see beverage cans of that shape. Why not?
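The geometric claim can be checked numerically. The sketch below assumes an idealized closed cylinder and an illustrative volume of 355 cubic centimetres (roughly a 12 fl oz can); the function names are invented for this example. Writing the area as A = 2πr² + 2V/r and setting dA/dr = 0 gives r³ = V/(2π), which is what `best_radius` computes.

```python
import math


def can_height(radius, volume):
    """Height of a cylinder with the given radius and volume."""
    return volume / (math.pi * radius ** 2)


def can_surface_area(radius, volume):
    """Surface area (top, bottom and side) of a closed cylinder."""
    height = can_height(radius, volume)
    return 2 * math.pi * radius ** 2 + 2 * math.pi * radius * height


def best_radius(volume):
    """Radius that minimizes surface area, from setting dA/dr = 0."""
    return (volume / (2 * math.pi)) ** (1 / 3)


volume = 355.0  # cubic centimetres; an assumed, illustrative can size
r = best_radius(volume)
h = can_height(r, volume)

# At the optimum the height equals the diameter.
print(h / (2 * r))

# Nearby shapes with the same volume need more material.
print(can_surface_area(0.9 * r, volume) > can_surface_area(r, volume))
print(can_surface_area(1.1 * r, volume) > can_surface_area(r, volume))
```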
In the United States, our television signals are encoded in the NTSC system. PAL is used in most of Western Europe and Asia, and SECAM is used in France and Eastern Europe. The United States is moving to a new standard, High Definition, HDTV, by February 17th, 2009. This is the law, as enacted by Congress, that we must move to a new television standard, causing expenses to broadcasters and consumers, as well as generating a lot of revenue for electronics manufacturers. Why did this require a law? If it was good for consumers and for manufacturers, wouldn't the free market make this move on its own?
The Great Baltimore Fire of 1904 quickly grew beyond the control of local fire companies. As the fire spread to encompass the entire central business district, the unprecedented call went out by telegraph for assistance from fire companies from Washington, DC and Annapolis and as far away as Philadelphia, Atlantic City and New York. But when these companies arrived, with their own equipment, they found that their hose couplings were incompatible. This was a large contributing factor to the fire's duration and destructive power. Over 1,500 buildings were destroyed over 30 hours. Within a year there was a national standard for fire hoses.
To these can be added the hundreds of standardized items that we work with every day, such as standardized electrical connectors, light bulbs, food nutritional labels, gasoline nozzles, network addresses, batteries, staples, toilet paper holders, telephone networks, remote control infrared signals, envelopes, paper sizes and weights, currency, plumbing fixtures, light switch face plates, radio frequencies and modulations, screws, nails and other fasteners, etc.
Multiple Standards
Now let's switch to some examples of domains where multiple standards have flourished.
The textbook example is the safety razor. When the safety razor was invented by Gillette, it used interchangeable, disposable blades made of carbon steel. As such they rusted and needed to be frequently replaced. Wilkinson Sword, later owner of the Schick brand, started making compatible stainless steel blades, which Gillette then copied. So there was a good amount of competition going on.
In the early 1970's Gillette moved to embed the blades into disposable cartridges which, due to their patent protection, could not be copied by other manufacturers. This led to our present situation of having multiple, incompatible razor systems. Competition remains fierce, with a battle to see who can put the most blades in a cartridge, from the Gillette Trac II with two blades and the Mach 3 with three blades, to Schick's Quattro with 4 blades, to Gillette's Fusion with 5 blades. Any guesses on what is next?
Video game consoles are in a similar position. In fact, they are often called a "razor and razor blade" business, since the manufacturers sell the consoles at less than cost and later make their profit selling the game cartridges in proprietary formats. There is little interest in, and seemingly little demand for, a universal game cartridge standard.
Another example is the realm of SLR camera lens mounts. Each camera manufacturer has their own system of incompatible lens mounts. Is one clearly better than another? Have the multiple standards encouraged innovation in the area of lens mounts over the past 40 years? Good question. All I know is I have a bag full of Minolta lenses that I can't use anymore since I moved to a Pentax camera.
We've all seen the many optical storage formats in recent years. Just in the realm of writable DVD disk standards, we've seen DVD-R, DVD-RW, DVD+RW and DVD-RAM, many of them in single and double-sided variations.
In the past 5 years we've seen perhaps a dozen or more varieties and variations of memory card formats, all of them proprietary and incompatible with each other. It makes the state of optical disk formats seem regular and peaceful in comparison.
To these can be added the hundreds of daily items that have managed to avoid a single standard, such as vacuum cleaner bags, coffee filters, laptop power supplies, cell phone chargers, high definition video disc formats, surround sound audio disc formats, etc.
That is all for Part I. Some questions to ask yourself:
- In the examples given of domains where there is a single standard, most of them did not start off that way. Most started with many competing approaches. What forces led them to a single standard?
- Who won and who lost in moving to a single standard? Who decided to make the move?
- In the cases where there are multiple, incompatible standards, is there a market demand for unified standards? Why or why not?
- If a government decree came down today and mandated a single standard in those areas, what would be gained? What would be lost?
Labels: Standards
Tuesday, March 13, 2007
Fast Track. Wrong Direction.
CLI itself had earlier been standardized in Ecma (approved in 2000) and Fast Tracked through ISO (approved in 2001). So, it was not much of a surprise when the C++ variant for Microsoft's .NET Framework, C++/CLI, was proposed for standardization as well. Ecma TC39/TG5 started work on C++/CLI in December 2003 and Ecma approved the specification as Ecma-372 in December 2005. Two years in committee, resulting in a 304-page specification. This used to be considered a fast pace.
After approval by Ecma, C++/CLI was submitted for Fast Track processing to ISO/IEC JTC1/SC22 as DIS 26926. Like any other Fast Track in JTC1, this process started with a 30-day contradiction period. Contradiction submissions were made by both Germany[pdf] and the UK[pdf].
The UK's position was that calling the standard "C++/CLI" would cause, and in fact was already causing, confusion among users with the already existing C++ programming language. The name of the standard was unacceptable:
We consider that C++/CLI is a new language with idioms and usage distinct from C++. Confusion between C++ and C++/CLI is already occurring and is damaging to both vendors and consumers.
A new language needs a new name. We therefore request that Ecma withdraw this document from fast-track voting and if they must re-submit it, do so under a name which will not conflict with Standard C++.
Similar views were expressed by Germany:
With reference to §13.4 of the JTC1 Directives, 4th edition, DIN brings to the attention of the JTC1 secretariat that we perceive a contradiction between document JTC 1 N 8037 "30 Day Review for Fast Track Ballot ECMA-372 1st edition C++/CLI Language Specification" and the JTC1/C++ standard ISO/IEC 14882:2004 "Programming language C++" and related technical reports.
We propose that the document is input into SC22 as a regular New Work Item Proposal and assigned to WG21 for further processing.
Ecma responded[pdf] to these objections in a 5-page letter, on 29 January 2006, that refused to make even the most basic concession, such as changing the name to remove the C++ reference.
So the objections were ignored, and the process moved on to the 5-month ballot period, starting March 9th, 2006. When the ballot closed in August, and the votes were counted, C++/CLI had received 11 out of 20 P-Member votes (55%) and a total of 9 negative votes out of 26 total votes cast, or about 34.6%. So it failed both to get the required 2/3 approval of P-Members and to keep the negative votes to less than 25%.
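For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It encodes the two criteria only as they are described in the paragraph above (at least 2/3 approval among P-Members, and negative votes under 25% of all votes cast); the function name and argument names are mine, and the exact wording of the JTC1 rules may differ.

```python
from fractions import Fraction


def fast_track_passes(p_yes, p_total, negative, total_cast):
    """Check the two approval criteria as described in the post.

    Returns a pair of booleans:
    (enough P-Member approval, few enough negative votes).
    """
    p_approval = Fraction(p_yes, p_total)
    negative_share = Fraction(negative, total_cast)
    # Thresholds are taken from the text above, not from the Directives.
    enough_approval = p_approval >= Fraction(2, 3)
    few_enough_negatives = negative_share < Fraction(1, 4)
    return enough_approval, few_enough_negatives


# The C++/CLI ballot figures quoted above.
approval_ok, negatives_ok = fast_track_passes(11, 20, 9, 26)
print("P-Member approval criterion met:", approval_ok)
print("Negative vote criterion met:", negatives_ok)
```

Exact fractions are used so that a borderline result would not depend on floating-point rounding.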
Germany and the UK voted disapproval. No surprise there, since they had objected early in the process, and their objections were ignored. In fact one of Germany's comments in the ballot was:
DIN has commented before, as well as BSI did, that allowing fast-track standardization of the "C++/CLI Language" under this name clearly conflicts with an existing and actively maintained standard: ISO 14882 - the C++ Programming Language. The document under review spells out under "NOTE FROM ITTF", bullet 2.2, that ITTF will ascertain that this proposed standard does not conflict with any other International Standard but such a conflict was pointed out. No reason has been given why this objection was overridden. Thus, DIN wants to express its surprise that standardization of this proposal went forward.
The US comments included:
The proposed standard is not market driven, nor is it the product of an industry consensus.
We are unimpressed with the very low level of C++ community participation mustered in the design and refinement of the current document, and feel, quite frankly, that the current state of this document is not at a high enough level of technical excellence to merit the ISO imprimatur.
France said:
This document should be withdrawn from the fasttrack approval process pending re-drafting and a more adequate review prior to voting. Better yet, retain it as an Ecma standard only until a clear market consensus develops that a JTC1 standard in this area is needed.
And so on, down the list.
It should be noted that a failing vote in the 5-month ballot is not necessarily fatal. The Fast Track submitter, in this case Ecma, can call on the SC Secretariat to convene a Ballot Resolution Meeting (BRM), where the issues can be discussed and resolved, possibly leading to a positive vote after a further ballot. This is Ecma's right as a Fast Track submitter. However, C++/CLI did not see a ballot resolution meeting. The JTC1 Secretariat recently notified SC22 members:
We have been advised that the comments accompanying the Fast Track ballot for DIS 26926 are not resolvable and that holding a Ballot Resolution Meeting (BRM) would not be productive or result in a document that would be acceptable to the JTC 1 National Bodies. Therefore, our proposal is to not hold the BRM and to cancel the project.
So, the BRM which had been scheduled for April, 2007 has been canceled, and that's where it stands today, with the attempted Fast Track of C++/CLI dead from seemingly easily preventable flaws.
Lessons, anyone?
Don't ignore NB members. If they take the time and make the effort to point out your flaws early in the process, then you should count yourself lucky. This is like the school teacher walking around the classroom during a quiz and pointing to one of your answers and saying, "You might want to take another look at that problem". If you ignore her advice and just turn in your paper, then you deserve the grade you get.
It is instructive as well that although only two NB's objected in the C++/CLI contradiction period, this grew to a far larger number by the time the 5-month ballot had ended. Ignoring problems doesn't make them go away.
One last thing. Any guesses on how long those contradiction arguments stay online before they are taken down to preserve the shrouded secrecy of the ISO process? I advise you to make a copy now. I certainly have.
Labels: OOXML
Tuesday, March 06, 2007
Document Migrations
In 1995 I was working at Lotus on Freelance Graphics, along with many others, getting SmartSuite ready for Windows 95. One day, as I walked to work and rounded the corner of Binney Street, I saw something unusual, even more unusual than the usual unusual one sees in Cambridge. Something was up. There were news vans parked in front of LDB, camera crews and reporters looking for comments, Lotus security videotaping the reporters asking for comments, and me standing there, clueless.
This was how I first heard of IBM's take-over offer. It was hard to concentrate on porting to Windows 95 with all that news going on downstairs, but we managed.
In the weeks and months that followed there were many changes. At Lotus we were 100% SmartSuite users. No surprise there. Most of us did not even have a copy of Microsoft Office on our machines, unless we worked on file compatibility. Not only did we use SmartSuite for our collaborative work, creating and reviewing specifications, giving presentations, etc., we also ran some of our business processes on it. In particular we used an expense report application, done in 1-2-3 with LotusScript.
But IBM used Microsoft Office. So when IBM took over, we needed to migrate. Sure, there was whining and moaning and gnashing of teeth on our end about having to move to an inferior product. And it did take a little while to get accustomed to the different conventions of Office, typing AVERAGE() in Excel, rather than @AVG() in 1-2-3 and stuff like that. But we did it. We moved to Office. It was clear to all that the benefits of having a single file format outweighed the short-term pain of migration.
It is interesting what we did not do:
- We did not go and convert all existing legacy SmartSuite documents into Office format. What would have been the point? Most old documents are never touched again. Let them rest in peace.
- We did not delete SmartSuite from our hard drives. We kept the application there for cases where we needed to access old documents.
- We did not simply continue using SmartSuite and tell it to save in Office format. We knew that both fidelity-wise and performance-wise it is far better to use an application that supports a format natively than to rely on conversion software for interoperability.
- We did not translate 1-2-3 macro-based applications into Excel macro-based applications. We took the opportunity to move straight to web based applications. Aside from some standard presentation templates and similar boiler-plate templates we did not do a lot of conversion work.
I'm not much of one for committing unprovoked acts of methodology, but if I had to summarize what little wisdom I have in this area, I'd say that for a migration you want to evaluate your existing documents by three criteria: stability, complexity and business criticality, and develop a migration plan based on that.
In the first case you classify documents by how stable (unchanging) they are:
- Hot documents — the documents that are being heavily changed and edited today, works-in-progress, in active collaborations
- Cold documents — the documents which are no longer edited, though perhaps they are still read. Many of these documents may have zero value and are just taking up space. Others may be valuable records, but hidden away on someone's hard-drive.
- Warm documents — These are the ones that are in the middle, not seeing heavy activity, but they aren't quite frozen either.
From the perspective of complexity we have:
- Low complexity — simple text and graphics
- Medium complexity — using more advanced features, created by power users
- High complexity — "engineered documents", using scripting and macros to create applications.
And from the perspective of business criticality (exposure) we have:
- Internal use documents — internal presentations and reports
- Customer facing documents — engagement reports, proposals, etc.
- Publication ready documents — white papers, journal articles, etc.
So you are transitioning from Office legacy binary formats to ODF. What do you do with each of these document classes? You have four main strategies to consider:
- Do nothing and preserve the document in the legacy format, maintaining, as needed, access to the legacy application.
- Convert document to a portable high fidelity static representation, like PDF.
- Convert directly to ODF.
- Reengineer as something other than a document.
So one migration policy might look like this:
| Stability | Complexity | Exposure | Strategy |
|---|---|---|---|
| Cold | Low | Internal Use | Do nothing |
| Cold | Low | Customer Facing | Do nothing |
| Cold | Low | Publication Ready | Do nothing |
| Cold | Medium | Internal Use | Do nothing |
| Cold | Medium | Customer Facing | Do nothing |
| Cold | Medium | Publication Ready | Do nothing |
| Cold | High | Internal Use | Do nothing |
| Cold | High | Customer Facing | Convert to PDF |
| Cold | High | Publication Ready | Convert to PDF |
| Warm | Low | Internal Use | Convert to ODF |
| Warm | Low | Customer Facing | Convert to ODF |
| Warm | Low | Publication Ready | Convert to ODF |
| Warm | Medium | Internal Use | Convert to ODF |
| Warm | Medium | Customer Facing | Convert to ODF |
| Warm | Medium | Publication Ready | Convert to ODF |
| Warm | High | Internal Use | Convert to ODF |
| Warm | High | Customer Facing | Publish as PDF |
| Warm | High | Publication Ready | Publish as PDF |
| Hot | Low | Internal Use | Convert to ODF |
| Hot | Low | Customer Facing | Convert to ODF |
| Hot | Low | Publication Ready | Convert to ODF |
| Hot | Medium | Internal Use | Convert to ODF |
| Hot | Medium | Customer Facing | Convert to ODF |
| Hot | Medium | Publication Ready | Convert to ODF |
| Hot | High | Internal Use | Reengineer |
| Hot | High | Customer Facing | Reengineer |
| Hot | High | Publication Ready | Reengineer |
There may be a better way of expressing this above (Karnaugh maps anyone?) but that gives the idea. Also, I'm not suggesting that this is the "one true answer", but merely that this may be a useful way of framing the problem.
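One more compact way to express the same example policy is as a handful of rules. The sketch below, in Python, is just the table above collapsed into a function; the function name is mine, and the string values are copied from the table's cells.

```python
def migration_strategy(stability, complexity, exposure):
    """Return the strategy from the example policy table for one document class.

    stability:  "Cold", "Warm" or "Hot"
    complexity: "Low", "Medium" or "High"
    exposure:   "Internal Use", "Customer Facing" or "Publication Ready"
    """
    outward_facing = exposure in ("Customer Facing", "Publication Ready")
    if stability == "Hot":
        # Actively edited: only the engineered documents get rebuilt.
        return "Reengineer" if complexity == "High" else "Convert to ODF"
    if stability == "Warm":
        if complexity == "High" and outward_facing:
            return "Publish as PDF"
        return "Convert to ODF"
    if stability == "Cold":
        if complexity == "High" and outward_facing:
            return "Convert to PDF"
        return "Do nothing"
    raise ValueError(f"unknown stability: {stability!r}")


print(migration_strategy("Warm", "High", "Customer Facing"))
```

Written this way it is easier to see that complexity only matters at the "High" level, and that exposure only matters for high-complexity documents that are not hot.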
Variations might include:
- Have a default policy of doing no conversions, but create all new documents in ODF format.
- By default, ignore all legacy documents. But the first time any legacy document is read or written, put it into a queue for evaluation and possible conversion.
Much of this lends itself to automation. For example:
- First you need to find all of the documents in an organization. This could be done by an ActiveX control on a page everyone in the company visits, an agent that spiders the intranet web pages and file servers, etc.
- Each document is then scored.
- Finding the stability of a document could be done by looking at the last read and last write stamps on the file. You could also look at web logs, or even at metadata in the document that tells how many times it has been edited.
- Complexity could be determined by scanning the document to see what features it uses. Some features, like script, would weight heavily for complexity. Think of it as a "goodness of fit" metric for how well the features used in the document fit within the ODF model.
- Business criticality is harder to automate, but could be done based on owner of the document, metadata in the document, location of the document (public web page versus intranet), etc.
- Calculate the scores, suggest actions to take, and then automate the action. This could lead to a nice automated migration solution.
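As a rough illustration of the first two steps, here is a minimal Python sketch that walks a file tree looking for legacy Office documents and scores each one for stability using its last-write timestamp. The thresholds (30 and 365 days), the extension list and the function names are all assumptions of mine, picked only for illustration. I have left out the last-read stamp because many file systems do not update it reliably.

```python
import os
import time

# Hypothetical thresholds, in days; tune these for a real organization.
HOT_DAYS = 30
COLD_DAYS = 365


def classify_stability(path, now=None):
    """Classify a file as Hot, Warm or Cold from its last-write timestamp."""
    now = time.time() if now is None else now
    info = os.stat(path)
    days_since_write = (now - info.st_mtime) / 86400
    if days_since_write <= HOT_DAYS:
        return "Hot"
    if days_since_write >= COLD_DAYS:
        return "Cold"
    return "Warm"


def find_documents(root, extensions=(".doc", ".xls", ".ppt")):
    """Walk a directory tree and yield paths of legacy Office documents."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                yield os.path.join(folder, name)


if __name__ == "__main__":
    # Score every legacy document under the current directory.
    for doc in find_documents("."):
        print(classify_stability(doc), doc)
```

Complexity and business criticality scores would need a format-aware scanner and some organizational knowledge, so they are not attempted here.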
In summary, it probably is not worthwhile simply to go out and convert all of your legacy documents in a giant cathartic orgy of document transformations. Not all documents are worth that effort. In any organization you probably have many many documents that will never be read again, ever. You also likely have some very complex documents that probably should be reengineered as web applications on your intranet. The other documents, the ones in the middle, that is where you focus your migration effort.
Labels: ODF
Sunday, March 04, 2007
Compatibility According to Humpty Dumpty
‘I don't know what you mean by “glory,” ’ Alice said.
Humpty Dumpty smiled contemptuously. ‘Of course you don't — till I tell you. I meant “there's a nice knock-down argument for you!” ’
‘But “glory” doesn't mean “a nice knock-down argument,” ’ Alice objected.
‘When I use a word,’ Humpty Dumpty said, in a rather scornful tone, ‘it means just what I choose it to mean, neither more nor less.’
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master - that's all.’
— Lewis Carroll from Through the Looking-Glass (1871)
I have written about Microsoft's language games previously. These games continue and it appears to be time for yet another inoculation. Words such as “open”, “choice”, “interoperability”, “standard”, “innovation” and “freedom” have been bandied about like patriotic slogans, but with meanings that are often distorted from their normal uses.
The aggrieved word I want to examine today is “compatibility”. Let's see how it is being used, with some illustrative examples, the ipsissima verba, Microsoft's own words:
From an open letter “Interoperability, Choice and Open XML” by Jean Paoli and Tom Robertson:
The specification enables implementation of the standard on multiple operating systems and in heterogeneous environments, and it provides backward compatibility with billions of existing documents.
From another open letter, Chris Capossela's “A Foundation for the New World of Documents”:
... all the features and functions of Office can be represented in XML and all your older Office documents can be moved from their binary formats into XML with 100 percent compatibility. We see our investment in XML support as the best way for us to meet customers’ interoperability needs while at the same time being compatible with the billions of documents that customers create every year.
From Doug Mahugh: “The new Open XML file formats offer true compatibility with all of the billions of Office documents that already exist.”
And from Craig Kitterman: “Is backward compatibility for documents important to you? How about choice?”
Those are just a handful of examples. Feel free to leave a comment suggesting additional ones.
Compatibility. Better yet, True Compatibility. What is that? And what do you think the average user, or even the average CTO, thinks, when hearing these claims from Microsoft about 100% compatibility?
Let's explore some scenarios and try to reverse-engineer Microsoft's meaning of “True compatibility”.
Suppose you get a new, more powerful PC with more memory and an upgraded graphics card and you upgrade to Vista and Office 2007. You create a new presentation in PowerPoint 2007 and save it in the new OOXML format. What can you do with it?
Can you exchange it with someone using Office on the Mac? Sorry, no. OOXML is not supported there. They will not be able to read your document.
Is this 100% compatibility?
What about Windows Mobile? Can I read my document there? Sorry, OOXML is not supported there either.
Is this 100% compatibility?
What about sending the file to your friends using SmartSuite, WordPerfect Office or OpenOffice, or KOffice? They all are able to read the legacy Microsoft formats, so surely a new format that is 100% compatible with the legacy formats should work here as well? Sorry, you are out of luck. None of these applications can read your OOXML presentation.
Is this 100% compatibility?
What about legacy versions of Microsoft Office? Can I simply send my OOXML file to a person using an old version of Office and have it load automatically? Sorry, older versions of Office do not understand OOXML. They must either upgrade to Office 2007 or download and install a convertor.
Is this 100% compatibility?
I have Microsoft Access XP and an application built on it that reads data ranges from Excel files and imports them into data tables. Will it work with OOXML spreadsheets? Sorry, it will not. You need to upgrade to Access 2007 for this support.
Is this 100% compatibility?
What about other 3rd party applications that take Office files as input: statistical analysis, spreadsheet compilers, search engines, document viewers, etc. Will they work with OOXML files? No, until they update their software your OOXML documents will not work with software that expects the legacy binary formats.
Is this 100% compatibility?
Suppose I, as a software developer, take the 6,039 page OOXML specification and write an application that can read and display OOXML perfectly. It will be hard work, but imagine I do it. Will I then be able to read the billions of legacy Office documents? Sorry, the answer is no. The ability to read and write OOXML does not give you the ability to read and write the legacy formats.
Is this 100% compatibility?
So, there it is. I don't know if we're any closer to finding out what “100% compatibility” means to Microsoft. But we certainly have found a lot of things it doesn't mean.
A quick analogy. Suppose I designed a new DVD format, and standardized it and said it was 100% compatible with the existing DVD standard. What would consumers think this means? Would they think that the DVD's in the new format could play in legacy DVD players? Yes, I believe that would be the expectation based on the normal meaning of “100% compatible”.
But what if I created a new DVD Player and said it supported a new DVD format, but also that the Player was 100% compatible with the legacy format. What would consumers think then? Would they expect that the new DVD's would play in older players? No, that is not implied. Would they expect that older DVD's could be played in the new Player? Yes, that is implied.
This is the essence of Microsoft's language game. They are confusing the format with the application. This is easy to do when your format is just a DNA sequence of your application. However, although Microsoft Office 2007, the application, may be able to read both OOXML and the legacy formats, the OOXML format itself is not compatible with any legacy application. None. The only way to get something to work with OOXML is to write new code for it.
This is not what people expect when they hear these claims of OOXML being 100% compatible with legacy formats.
Labels: OOXML
Thursday, March 01, 2007
OASIS Symposium and OpenDocument Workshop
Bob Sutor will give the opening keynote. Scott Hudson will give a talk on, "DocTape: A Document Standards Interoperability Framework for DocBook, DITA, ODF and more!". I'll be joining a panel on Tuesday looking at ODF Interoperability and related topics. And Wednesday will be a half-day Workshop on ODF, with presentations on adoption, programmability, accessibility, interoperability and future directions.
Then back home on Thursday, my birthday. This gives my wife the rare opportunity to get a large present into the house without me noticing. Hint, hint...
Labels: ODF