Wednesday, March 28, 2007

The ODF Validation Service

No, this has nothing to do with getting discounted parking if you use ODF, though that is an intriguing idea...

Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship's ODF Tools project by Alex Hudson.

It is in the spirit of the W3C's Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.

It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a Wordprocessor document, onTable(), onImage(), etc.
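To make the idea a little more concrete, here is a minimal sketch of what such a multi-level callback scanner might look like. It is only an illustration, not an existing tool: the `OdfListener` class, the `scan` function and the event names are all hypothetical, chosen to mirror the `onFile()`, `onDirectory()`, `onWorksheet()` and `onCell()` ideas above. It assumes the usual ODF package layout (a Zip archive with `META-INF/manifest.xml` and `content.xml`) and ignores details such as repeated-cell attributes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used inside ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


class OdfListener:
    """Hypothetical plugin base class: override only the events you need."""

    def on_file(self, name, size): pass
    def on_directory(self, name): pass
    def on_manifest_entry(self, path, media_type): pass
    def on_worksheet(self, name): pass
    def on_cell(self, sheet, row, col, text): pass


def scan(source, listeners):
    """Walk an ODF package (a path or file-like object) and fire events."""
    with zipfile.ZipFile(source) as package:
        # Zip level: one event per archive member.
        for info in package.infolist():
            for listener in listeners:
                if info.is_dir():
                    listener.on_directory(info.filename)
                else:
                    listener.on_file(info.filename, info.file_size)
        names = set(package.namelist())

        # Manifest level: one event per declared file entry.
        if "META-INF/manifest.xml" in names:
            manifest = ET.fromstring(package.read("META-INF/manifest.xml"))
            for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path")
                media_type = entry.get(f"{{{MANIFEST_NS}}}media-type")
                for listener in listeners:
                    listener.on_manifest_entry(path, media_type)

        # Content level: worksheets and cells (simplified; this sketch
        # does not expand repeated rows or columns).
        if "content.xml" in names:
            content = ET.fromstring(package.read("content.xml"))
            for table in content.iter(f"{{{TABLE_NS}}}table"):
                sheet = table.get(f"{{{TABLE_NS}}}name")
                for listener in listeners:
                    listener.on_worksheet(sheet)
                rows = table.iter(f"{{{TABLE_NS}}}table-row")
                for r, row in enumerate(rows):
                    cells = row.iter(f"{{{TABLE_NS}}}table-cell")
                    for c, cell in enumerate(cells):
                        text = "".join(cell.itertext())
                        for listener in listeners:
                            listener.on_cell(sheet, r, c, text)


class CellCounter(OdfListener):
    """Example plugin: counts non-empty spreadsheet cells."""

    def __init__(self):
        self.cells = 0

    def on_cell(self, sheet, row, col, text):
        if text:
            self.cells += 1


# Usage, with a hypothetical file name:
#   counter = CellCounter()
#   scan("budget.ods", [counter])
#   print(counter.cells)
```

The point of the design is that a plugin like `CellCounter` never touches Zip or XML directly; it subscribes to the one event it cares about and ignores the rest.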

A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.


The benefit of the framework is the reduction in code required to get directly to the info in the ODF document you want, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.

Change Log

4/11/2007 — Removed parenthetical comment about the need for a privacy policy, since one has now been added to the Validator page.


Friday, March 23, 2007

The Case for a Single Document Format: Part II

This is Part II of a four-part post.

In Part I we surveyed a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.

In this post, Part II, we'll try to explain the forces that tend to unify or divide standards and hopefully make sense of what we saw in Part I.

In Part III we'll look at the document formats in particular, how we got to the present point, and how and why historically there has always been but a single document format.

In Part IV, if needed, we'll tie it all together and show why there should be, and will be, only a single open digital document format.

To make sense of the diversity of standardization behavior reviewed in Part I it is necessary to consider the range of benefits that standards bring. Although few standards bring all of these benefits, most will bring one or more.

Variety Reduction


Standards for screw sizes, wire gauges, paper sizes and shoe sizes are examples of “variety-reducing standards”. In order to encourage economies of scale and the resulting lower costs to producers and consumers, goods that may naturally have had a continuum of allowed properties are discretized into a smaller number of varieties that will be good-enough for most purposes.

For example, my feet may naturally fit best in size 9.3572 shoes. But I do not see that size on the shelves. I see only shoes in half-size increments. Certainly I could order custom-made shoes to fit my feet exactly, but this would be rather expensive. So, accepting that the manufacturing, distribution and retail aspects of the footwear industry cannot stock 1,000's of different shoe sizes and still sell at a price that I can afford, I buy the most comfortable standard size, usually men's size 9.5.

And yes, Virginia, there is also an ISO Standard for shoe sizes, called ISO 9407:1991 “Mondopoint”.

Decreased Information Asymmetry


A key premise of an efficient & free market is the existence of voluntary sellers and voluntary buyers motivated by self-interest in the presence of perfect information. But the real marketplace often does not work that way. In many cases there is an asymmetry of information which hurts the consumer, as well as the seller.

For example, when you buy a box of breakfast cereal at the supermarket, what do you know about it? You cannot open the box and sample it. You cannot remove a portion of the cereal, bring it to a lab and test it for the presence of nuts or measure the amount of fiber contained in it. The box is sealed and the contents invisible. All you can do is hold and shake the box.

The disadvantage to the consumer from this information asymmetry is obvious. But the manufacturer suffers as well. This stems from the difficulty of charging a premium for special-grade products if this higher grade cannot be verified by the consumer prior to purchase. How can you sell low-fat or high-fiber or all-natural or low-carb foods and charge more for those benefits, if anyone can slap that label on their box?

The government-mandated food ingredient and nutritional labels solve the problem. The supermarket is full of standards like this, including standardized grades of eggs, meat, produce, olive oil, wine, etc. There are voluntary standards as well, like organic food labeling standards, that fulfill a similar purpose.

Compatibility


Compatibility standards, also called interface standards, provide a common technical specification which can be shared by multiple producers to achieve interoperability. In some cases, these standards are mandated by the government. For example, if you want to ship a letter using First Class postage, you must adhere to certain size and shape restrictions on the letter. If you want to send many letters at once, using the reduced bulk rate, then you must follow additional constraints on how the letters are addressed and sorted. If you want to deal with the Post Office, then these are the standards you must follow.

Similarly, if you are a software developer and you want to write an application that does electronic tax submissions, then you must follow the data definitions and protocols defined by the IRS.

Required interface standards are quite common when dealing with the government. Regulations requiring the use of specific standards also promote public safety, health and environmental protection.

And not just government. A sufficiently dominant company in an industry, a WalMart, an Amazon or an eBay, can often define and mandate the use of specific standards by their suppliers. If you want to do business with WalMart, then you must play by their rules.

Network Goods


Where it gets interesting is when compatibility standards combine with the network effect. I'm sure many of you are familiar with the network effect, but bear with me as I review.

The first person to have a telephone received little immediate value from it. All Mr. Bell could do was call Mr. Watson and tell him to come over. But the value of the telephone grew as each new subscriber was connected to the network, since there were now more people who could be contacted. Each new user brought value to all users, present and future. When the value of a technology increases when more people use it, then you have a network effect.

In a classic, maximally-connected network, like the telephone system, when you double the number of subscribers, you double the value to each user. The value of the entire network (the total value to all subscribers) therefore grows as the square of the number of subscribers. So double the number of participants in the network, and the value of the network goes up four-fold.

Of course, this only works up to a point. There are diminishing returns. When the last rural villager in Albania gets a telephone connection, I personally will not notice any incremental benefit. But when we're talking about the initial growth period of the technology, then the above rule is roughly the behavior we see.

Other familiar network effect technologies include the Internet's technical infrastructure (TCP/IP, DNS, etc.), eBay, Second Life, social networking sites such as Flickr, del.icio.us or Digg, etc.

If we delve deeper we can talk about two types of network effects: direct and indirect. The direct effect, as described above, is the increased value you receive in using the system as greater numbers of other people also use the system. The indirect effects are the supply-side effects, caused by things like increased choice in vendors, increased choice in after-market options and repairs, increased cost efficiencies and economies of scale by a market that can optimize production around a single standard.

So take the example of eBay. The direct network effect is clear. The more people that use it, the more buyers and sellers are present, and the more value there is to all of the buyers and sellers. The indirect network effect is the number of 3rd party tools for listing auctions, processing sales, watching for wanted items, sniping, etc., which are available because of the concentrated attention on this one online auction site.

It might be helpful to look at this graphically. The following chart attempts to show two things:





A few things to note:

First, utility does not increase without limit and cost does not decrease without limit. There will be diminishing returns to both. Remember that last villager in Albania.

Also, note that initially the average cost is more than the average utility. But this is only the average. Not everyone's utility function is the same. If it were, then the network would never get started. Fortunately, there is a diversity of utility functions. Some users will see more initial value than others, and they will be the early adopters. Some will see far less value than others and they will be the late adopters.

Finally note the point marked as the “tipping point”. This is where the largest growth occurs, when the average user's utility is greater than the average user's cost.

Network Effect Compatibility Standards


So what does this all have to do with standards? My observation is that a single standard in a domain naturally results when there are strong direct and indirect network effects. And where these network effects do not exist, or are weak, then multiple standards flourish.

This can be seen as societal value maximization. A network of N participants has a total value proportional to N-squared. Split this into two equally-sized incompatible networks and the total value is 2*(N/2)^2, or (N^2)/2, only half that of the single network. The maximal value comes only with a single network governed by a single standard.
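For anyone who wants to check the arithmetic, here is a small sketch. The function names and the constant `k` are my own illustrative choices; the only assumption carried over from the text is that total value is proportional to the square of the number of participants.

```python
def network_value(participants, k=1.0):
    """Total value of one fully connected network, assumed to be k * N^2."""
    return k * participants ** 2


def split_value(participants, parts, k=1.0):
    """Total value when N participants are split into equal, incompatible networks."""
    return parts * network_value(participants / parts, k)


n = 1000
single = network_value(n)
print(network_value(2 * n) / single)  # doubling N: prints 4.0
print(split_value(n, 2) / single)     # two incompatible halves: prints 0.5
```

Under the same assumption, splitting into more pieces only makes it worse: `parts` equal networks together are worth 1/`parts` of the single network.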

Allowing two different networks to interoperate may be technically possible via bridging, adapting or converting, but this at best preserves the direct network effects only. The indirect effects (the economies of scale, the choice of multiple vendors, the 3rd party after-market options, etc.) reach their maximum value with a single network. The indirect network benefits essentially follow from the industry concentrating its attention and effort around a single standard. When split into multiple networks, the industry instead concentrates its attention on adapters, bridges and converters, which requires effort and expense on its part, with the cost eventually passed on to the consumer, although it brings the consumer no net benefit over having a single network.

The Cases from Part I


Let's finish by reviewing the cases presented in Part I, in light of the above analysis, to see if those examples make more sense now.

Well, this is too long already, so I'll stop here.

In Part III I'll look at the history of document formats, and see what factors have influenced their standardization. Some questions to think about until then:
  1. Some technologies, like rail gauges, local time or fire hose couplings went many years without standardization. Then, in a brief surge of activity, they were standardized. Look at the trends or events that precipitated the need for standardization. Is there any unifying logic to why these changes occurred? Hint, there is something here more general than just the trains.
  2. In the cellular phone industry, Europe and Asia made an early decision to standardize on the GSM network, while the U.S. market fragmented between CDMA, GSM and, earlier, D-AMPS. What effects does this have on the American versus the European consumer, direct and indirect?
  3. Microsoft has repeatedly stated that they are dead-against government mandates of specific standards. But they are a member of the HighTech Digital TV Coalition, an organization which is heavily lobbying the government to mandate Digital TV standards. How do we reconcile these two positions? Are they only against mandatory standards in areas where they have a monopoly?
  4. How does any of this relate to office document formats?

In Part III, we'll look at that last question in particular, including an illustrated review of the history of document formats.

3/23/07 — Corrections: Bell not Edison invented the telephone (Doh!). Also corrected calculation in value of two networks.


Tuesday, March 20, 2007

Cannibalism

An interesting post by Bob Sutor. What is OOXML's real competition, and how does that help ODF? The dynamics get interesting when you are hindered by your own install base. The main selling point of OOXML is its claimed 100% compatibility with the legacy binary formats. But if you are using Office 2000, and happy with it, what is the reason to move to OOXML? Why not remain using the binary formats? What justifies the migration?

The downside is clear. The minute you move to OOXML you have less choice in whom you can successfully exchange documents with. Users of Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite and ThinkFree Office, along with the numerous 3rd party applications that can read and write the binary formats, are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.

So why move to OOXML? From the CTO's perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn't the tendency be to remain where you are?

So the breakdown may happen like this:

I think that B & Z may be the dominating factors. N is large now because it includes the inertial effects of Microsoft's market dominance. Even companies that don't make an explicit choice will end up with that path by default. But even the most passive company will not fall into choice A without some thought.

It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.


Monday, March 19, 2007

Pruning Raspberries

The earth does not yield up her sweet fruits unrecompensed. For every berry I will harvest in September, I pay now an equal measure of sweat and blood. Hunched down and with thick gloves, I navigate the thicket of thorny bramble canes, the red raspberries, yellow raspberries, purple raspberries, blackberries, thimble berries and field berries, and restore man's order to nature's chaos.

The correct way to prune brambles depends on their variety, whether they are primocane-bearing, or floricane-bearing. Many raspberries, and all blackberries, are floricane-bearing, meaning they have a two-year cycle, where the canes that grow this year (the primocanes) will flower and bear fruit next year (when they will be called floricanes). The primocane-bearing varieties, on the other hand, bear fruit on this year's canes. I like having a mix, since that spreads out the harvest.

The floricane-bearing varieties, since they started their growth last year, will bear fruit in the summer, while the primocane-bearing varieties, which need to complete their growth in a single year, will bear fruit later, in the fall. Primocane-bearing varieties are cut to the ground after harvest. The maintenance of floricane-bearing varieties is a little more complicated. The floricanes are removed after harvest, and the primocanes, which will be next year's floricanes, are pruned and thinned while the plant is in dormancy, in late winter, which is the work I was able to complete before this last snowstorm.

Pruning of brambles will consider several factors:
  1. The architecture of the plant. A bush full of large berries will have considerable weight. One option is to trellis the plants to support that weight. Another option, which I prefer, is to maintain the canes and side branches at a length where the plant can be self-supporting, 4-5 feet tall, side branches trimmed to 8-12 inches.
  2. Cane density. It is better to have 3-5 thick, strong canes per linear foot than to have 15 smaller ones. The goal in the end is to have a bounty of fruit, not foliage. So now is the time to thin the canes.
  3. Access for sun, rain, air and me. This is another reason to thin the canes. A big dense mass of canes competing for limited resources will produce poorly, be susceptible to mold, and will be difficult to harvest.
The question can fairly be asked, "Why go through all this trouble? Why not let the invisible hand of nature guide the development of the brambles? Let her decide. She will pick the winners and losers."

To that I respond that nature, in her infinite wisdom, does not seem to care much for bringing me berries. I am not absolutely certain what role brambles play in the grand scheme of things, but if I had to guess, nature likes them to form wild, uncontrolled, dense masses of thorny canes, with berries inaccessible to larger mammals. That seems to be their natural tendency in my garden. However, in its natural state, the bramble thicket forms an ideal protective habitat for small birds, who can remain protected from predators while eating the berries. The berry seeds survive unharmed by the digestive system of the birds and are excreted, with fertilizer, in distant locations, leading to the better propagation of the species.

And so I battle the genes inside the berries, pitting my labors against nature's disordered fecundity. It breaks the back and scrapes the skin, but it must be done again each year, around this time.


ODF Freely Available

Another step forward for ODF. After gaining ISO approval in May, and Publication status in December, ISO/IEC 26300 is now counted among ISO's "Freely Available Standards". What is the significance of this? The text is identical to what it was in May, but you no longer need to pay 342 Swiss Francs to ISO to download an official copy. It is now free. Enjoy!


Sunday, March 18, 2007

The Case for a Single Document Format: Part I

This will be a multi-part post, mixing in a little economics, a little history and a little technology — an intellectual smörgåsbord — attempting to make the argument that a single document format is the inevitable and desired outcome.

In Part I we'll take a survey of a number of different problem domains, some that resulted in a single standard, some that resulted in multiple standards.

In Part II we'll try to explain the forces that tend to unify or divide standards and hopefully make sense of what we saw in Part I.

In Part III we'll look at the document formats in particular, how we got to the present point, and how and why historically there has always been but a single document format.

In Part IV, if needed, we'll tie it all together and show why there should be, and will be, only a single open digital document format.

Let's get started!

Standards — in some domains there is a single standard, while in other domains there are multiple standards. What is the logic of this? What domains encourage, or even demand a single standard? And where do multiple standards coexist without problems?

Let's take a look at some familiar examples and see if we can figure out how this works. We'll start with some examples where a single standard dominates.

Single Standards


The story of the standard rail gauge is probably familiar to you. At first each rail company laid down its own tracks to its own specifications. In the United States there were different gauges used in the North (4′ 9″) and the South (5′). This was not a major issue so long as rail travel remained local or regional. However, as the reach of commerce increased, the pain of dealing with the "break of gauge" between adjacent gauge systems increased. Passengers and goods needed to be offloaded and transferred to a different train, causing time delays and inefficient utilization of equipment. The decision was made to adopt a Standard Gauge of 4′ 9″ and an ambitious migration project took place on May 31st, 1886, when thousands of workers in the South adjusted the west rail and moved it 3″ to the east, lining up with the Northern gauge. Eleven thousand miles of track were converted in thirty-six hours.

It should be noted that this unification was not universally celebrated. In particular, riots occurred at some of the junction points, like Erie, Pennsylvania, where local workers stood to lose the high-paying jobs they had unloading and loading cargo onto new trains. Efficiency is often opposed by those who profited from inefficiency.

Another standard prompted by the railroad was the adoption of standard time. In earlier days each town and city had its own local time, roughly based on solar mean time. When it was noon in Chicago, it was 12:09 in Cincinnati, and 11:50 in St. Louis. The instant of local noon would be communicated to residents by a cannon shot or by dropping a ball from a tower, allowing all to synchronize their clocks. The ball drop could be observed by ships in the harbor by telescope and so was much more accurate than the cannon, since the signal was not delayed by the non-negligible travel time of sound. Some memory of this tradition continues to this day with the New Year's Eve ball drop in Times Square.

When it took days by coach to travel from Chicago to Cincinnati, it did not matter that your watch was 9 minutes slow. Your watch probably wasn't accurate enough to tell the difference in any case. When noon came in Cincinnati you would synchronize your watch, knowing that some of the correction was caused by the change in longitude, and some was caused by the imperfections in the watch. But the average person did not care because they did not travel all that much.

However, with the coming of the railroad and then the telegraph, everything changed. People, goods and information could be transferred at far greater speeds. The difference of 9 minutes was now significant.

Initially, each rail company defined its own time, based on the local time of its main office. Timetables would be printed up based on this time. So a large train station, which may serve six different lines, would display six different clocks, all set to different times, some 12 minutes ahead, some 15 minutes behind, etc. At one point, trains in Wisconsin were operating on 38 different times! This was not only an inconvenience to travelers, it was also increasingly a safety concern, since the use of different time systems at the same station increased the chance of collisions.

This was addressed by the adoption of Standard Time in the United States on November 18th, 1883, the so-called "Day of Two Noons". This was the day that the Eastern, Central, Mountain, and Pacific time zones took effect, and on this day every town adjusted its local time to the Standard Time of its new time zone. If you were in the eastern half of your time zone, then when local noon came you would set your clocks back a specified number of minutes, and would thus observe noon twice. If you were in the western half of your time zone, you would advance your clocks at local noon a specified number of minutes. The contemporary coverage of this event in The New York Times is worth a read.

Over the years, the ever-increasing rate of commerce and information flow has led to greater and greater precision in time-keeping, so that today with atomic clocks and UTC we can now account for the slowing of the Earth's rotation and the insertion of occasional leap seconds.

The International Civil Aviation Organization (ICAO) is a UN agency that maintains various aeronautical standards, such as airport codes, aircraft codes, etc. They are also responsible for making English the required language for air-to-ground communications. So when an Italian plane, with an Italian crew on an Italian domestic flight contacts the approach tower at an Italian airport, manned by Italian personnel, they will contact the tower in English. Why do you think this is so?

The diameter of beverage cans has but little variation. A can of Coca-Cola and a can of Pepsi will both fit in my car's cup holder. They also fit fine in the cup holders in my beach chair or rider lawnmower. This works with beer cans as well, with innovative holders such as the novelty beer hat. Vending machines seem to take advantage of this standard as well, since it simplifies their design. The whole beverage can ecosystem works because of standards around beverage can sizes. How is this standard maintained? Was it planned this way?

It is interesting to note that, from the beverage company's perspective this is non-optimal. A can has minimum surface area for a given volume when it has equal height and diameter. But we never see beverage cans of that shape. Why not?
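If you want to convince yourself of the geometry, here is a quick numerical sketch. It treats the can as an ideal closed cylinder, and the 355 cubic centimeter volume is just an illustrative figure I picked, not something from a standard.

```python
import math


def surface_area(volume, diameter):
    """Surface area of a closed cylinder with the given volume and diameter."""
    radius = diameter / 2
    height = volume / (math.pi * radius ** 2)
    return 2 * math.pi * radius ** 2 + 2 * math.pi * radius * height


def optimal_diameter(volume):
    """Diameter that minimizes surface area; follows from setting height == diameter."""
    return (4 * volume / math.pi) ** (1 / 3)


volume = 355.0  # cubic centimeters, an assumed example volume
best = optimal_diameter(volume)
for scale in (0.8, 0.9, 1.0, 1.1, 1.2):
    d = best * scale
    print(f"diameter {d:5.2f} cm -> area {surface_area(volume, d):7.2f} cm^2")
```

The row with scale 1.0, where height equals diameter, has the smallest area; anything thinner or squatter uses more metal for the same volume.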

In the United States, our television signals are encoded in the NTSC system. PAL is used in most of Western Europe and Asia, and SECAM is used in France and Eastern Europe. The United States is moving to a new standard, High Definition, HDTV, by February 17th, 2009. This is the law, as enacted by Congress, that we must move to a new television standard, causing expenses to broadcasters and consumers, as well as generating a lot of revenue for electronics manufacturers. Why did this require a law? If it was good for consumers and for manufacturers, wouldn't the free market make this move on its own?

The Great Baltimore Fire of 1904 quickly grew beyond the control of local fire companies. As the fire spread to encompass the entire central business district, the unprecedented call went out by telegraph for assistance from fire companies from Washington, DC and Annapolis and as far away as Philadelphia, Atlantic City and New York. But when these companies arrived, with their own equipment, they found that their hose couplings were incompatible. This was a large contributing factor to the fire's duration and destructive power. Over 1,500 buildings were destroyed over 30 hours. Within a year there was a national standard for fire hoses.

To these can be added the hundreds of standardized items that we work with every day, such as standardized electrical connectors, light bulbs, food nutritional labels, gasoline nozzles, network addresses, batteries, staples, toilet paper holders, telephone networks, remote control infrared signals, envelopes, paper sizes and weights, currency, plumbing fixtures, light switch face plates, radio frequencies and modulations, screws, nails and other fasteners, etc.

Multiple Standards


Now let's switch to some examples of domains where multiple standards have flourished.

The textbook example is the safety razor. When the safety razor was invented by Gillette, it used interchangeable, disposable blades made of carbon steel. These rusted and needed to be frequently replaced. Wilkinson Sword, later owner of the Schick brand, started making compatible stainless steel blades, which Gillette then copied. So there was a good amount of competition going on.

In the early 1970's Gillette moved to embed the blades into disposable cartridges which, due to their patent protection, could not be copied by other manufacturers. This led to our present situation of having multiple, incompatible razor systems. Competition remains fierce, with a battle to see who can put the most blades in a cartridge, from the Gillette Trac II with two blades and the Mach 3 with three blades, to Schick's Quattro with 4 blades, to Gillette's Fusion with 5 blades. Any guesses on what is next?

Video game consoles are in a similar position. In fact, they are often called a "razor and razor blade" business, since they sell the consoles at less than cost and later make their profit selling the game cartridges in proprietary formats. There is little interest, and seemingly little demand for a universal game cartridge standard.

Another example is the realm of SLR camera lens mounts. Each camera manufacturer has their own system of incompatible lens mounts. Is one clearly better than another? Have the multiple standards encouraged innovation in the area of lens mounts over the past 40 years? Good question. All I know is I have a bag full of Minolta lenses that I can't use anymore since I moved to a Pentax camera.

We've all seen the many optical storage formats in recent years. Just in the realm of writable DVD disk standards, we've seen DVD-R, DVD-RW, DVD+RW and DVD-RAM, many of them in single and double-sided variations.

In the past 5 years we've seen perhaps a dozen or more varieties and variations of memory card formats, all of them proprietary and incompatible with each other. It makes the state of optical disk formats seem regular and peaceful in comparison.

To these can be added the hundreds of daily items that have managed to avoid a single standard, such as vacuum cleaner bags, coffee filters, laptop power supplies, cell phone chargers, high definition video disc formats, surround sound audio disc formats, etc.

That is all for Part I. Some questions to ask yourself:
  1. In the examples given of domains where there is a single standard, most of them did not start off that way. Most started with many competing approaches. What forces led them to a single standard?
  2. Who won and who lost in moving to a single standard? Who decided to make the move?
  3. In the cases where there are multiple, incompatible standards, is there a market demand for unified standards? Why or why not?
  4. If a government decree came down today and mandated a single standard in those areas, what would be gained? What would be lost?
I hope you will continue on with reading Part II.


Tuesday, March 13, 2007

Fast Track. Wrong Direction.

The idea was to make the C++ programming language work better in Microsoft's .NET framework. It started off as the Managed Extensions for C++, first available in 2000, and later in Visual Studio .NET 2003. Managed Extensions were reformulated in Visual Studio 2005 where they were called C++/CLI, referring to the Common Language Infrastructure, the runtime abstraction in .NET.

CLI itself had earlier been standardized in Ecma (approved in 2000) and Fast Tracked through ISO (approved in 2001). So, it was not much of a surprise when the C++ variant for Microsoft's .NET Framework, C++/CLI, was proposed for standardization as well. Ecma TC39/TG5 started work on C++/CLI in December 2003 and Ecma approved the specification as Ecma-372 in December 2005. Two years in committee, resulting in a 304-page specification. This used to be considered a fast pace.

After approval by Ecma, C++/CLI was submitted for Fast Track processing to ISO/IEC JTC1/SC22 as DIS 26926. Like any other Fast Track in JTC1, this process started with a 30-day contradiction period. Contradiction submissions were made by both Germany[pdf] and the UK[pdf].

The UK's position was that calling the standard "C++/CLI" would cause, and in fact was already causing, confusion among users with the already existing C++ programming language. The name of the standard was unacceptable:

We consider that C++/CLI is a new language with idioms and usage distinct from C++. Confusion between C++ and C++/CLI is already occurring and is damaging to both vendors and consumers.

A new language needs a new name. We therefore request that Ecma withdraw this document from fast-track voting and if they must re-submit it, do so under a name which will not conflict with Standard C++.

Similar views were expressed by Germany:

With reference to §13.4 of the JTC1 Directives, 4th edition, DIN brings to the attention of the JTC1 secretariat that we perceive a contradiction between document JTC 1 N 8037 "30 Day Review for Fast Track Ballot ECMA-372 1st edition C++/CLI Language Specification" and the JTC1/C++ standard ISO/IEC 14882:2004 "Programming language C++" and related technical reports.

We propose that the document is input into SC22 as a regular New Work Item Proposal and assigned to WG21 for further processing.

Ecma responded[pdf] to these objections in a 5-page letter, on 29 January 2006, that refused to make even the most basic concession, such as changing the name to remove the C++ reference.

So the objections are ignored, and they move on to the 5-month ballot period, starting March 9th, 2006. When the ballot closed in August, and the votes were counted, C++/CLI had received 11 out of 20 P-Member votes (55%) and a total of 9 negative votes out of 26 total votes cast, or 34.61%. So it failed both to get the required 2/3 approval of P-Members, as well as to keep the negative votes to less than 25%.
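For anyone who wants to check the arithmetic, here is a minimal sketch of the two tests as I described them above: at least 2/3 approval among P-Members, and negative votes under 25% of all votes cast. The function name and its parameters are my own invention for illustration, and the thresholds follow the wording in this post rather than the exact text of the JTC1 Directives.

```python
from fractions import Fraction


def fast_track_ballot_passes(p_yes, p_total, negative, total_cast):
    """Apply the two approval criteria as described in the post.

    Returns (passes, p_member_share, negative_share).
    """
    p_share = Fraction(p_yes, p_total)
    neg_share = Fraction(negative, total_cast)
    enough_approval = p_share >= Fraction(2, 3)   # at least 2/3 of P-Members approve
    few_negatives = neg_share < Fraction(1, 4)    # fewer than 25% negative votes
    return enough_approval and few_negatives, p_share, neg_share


# The C++/CLI numbers: 11 of 20 P-Members approved, 9 of 26 votes were negative.
passed, p_share, neg_share = fast_track_ballot_passes(11, 20, 9, 26)
print(passed, f"{float(p_share):.1%}", f"{float(neg_share):.1%}")
```

With the C++/CLI numbers both tests fail independently, so fixing only one of them would not have rescued the ballot.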

Germany and the UK voted disapproval. No surprise there, since they had objected early in the process, and their objections were ignored. In fact one of Germany's comments in the ballot was:

DIN has commented before, as well as BSI did, that allowing fast-track standardization of the "C++/CLI Language" under this name clearly conflicts with an existing and actively maintained standard: ISO 14882 - the C++ Programming Language. The document under review spells out under "NOTE FROM ITTF", bullet 2.2, that ITTF will ascertain that this proposed standard does not conflict with any other International Standard but such a conflict was pointed out. No reason has been given why this objection was overridden. Thus, DIN wants to express its surprise that standardization of this proposal went forward.

The US comments included:

The proposed standard is not market driven, nor is it the product of an industry consensus.

We are unimpressed with the very low level of C++ community participation mustered in the design and refinement of the current document, and feel, quite frankly, that the current state of this document is not at a high enough level of technical excellence to merit the ISO imprimatur.

France said:

This document should be withdrawn from the fasttrack approval process pending re-drafting and a more adequate review prior to voting. Better yet, retain it as an Ecma standard only until a clear market consensus develops that a JTC1 standard in this area is needed.

And so on, down the list.

It should be noted that a failing vote in the 5-month ballot is not necessarily fatal. The Fast Track submitter, in this case Ecma, can call on the SC Secretariat to convene a Ballot Resolution Meeting (BRM), where the issues can be discussed and resolved, possibly leading to a positive vote after a further ballot. This is Ecma's right as a Fast Track submitter. However, C++/CLI did not see a ballot resolution meeting. The JTC1 Secretariat recently notified SC22 members:

We have been advised that the comments accompanying the Fast Track ballot for DIS 26926 are not resolvable and that holding a Ballot Resolution Meeting (BRM) would not be productive or result in a document that would be acceptable to the JTC 1 National Bodies. Therefore, our proposal is to not hold the BRM and to cancel the project.

So, the BRM which had been scheduled for April, 2007 has been canceled, and that's where it stands today, with the attempted Fast Track of C++/CLI dead from seemingly easily preventable flaws.

Lessons, anyone?

Don't ignore NB members. If they take the time and make the effort to point out your flaws early in the process, then you should count yourself lucky. This is like the school teacher walking around the classroom during a quiz and pointing to one of your answers and saying, "You might want to take another look at that problem". If you ignore her advice and just turn in your paper, then you deserve the grade you get.

It is instructive as well that although only two NB's objected in the C++/CLI contradiction period, this grew to a far larger number by the time the 5-month ballot had ended. Ignoring problems doesn't make them go away.

One last thing. Any guesses on how long those contradiction arguments stay online before they are taken down to preserve the shrouded secrecy of ISO process? I advise you to make a copy now. I certainly have.


Tuesday, March 06, 2007

Document Migrations

If you've been around this business for a while, you've seen your share of migrations. New operating systems, new networks, new hardware, even new document formats. I'd like to share some recollections of one such migration, and then suggest a solution.

In 1995 I was working at Lotus on Freelance Graphics, along with many others, getting SmartSuite ready for Windows 95. One day, as I walked to work and rounded the corner of Binney Street, I saw something unusual, even more unusual than the usual unusual one sees in Cambridge. Something was up. There were news vans parked in front of LDB, camera crews and reporters looking for comments, Lotus security videotaping the reporters asking for comments, and me standing there, clueless.

This was how I first heard of IBM's take-over offer. It was hard to concentrate on porting to Windows 95 with all that news going on downstairs, but we managed.

In the weeks and months that followed there were many changes. At Lotus we were 100% SmartSuite users. No surprise there. Most of us did not even have a copy of Microsoft Office on our machines, unless we worked on file compatibility. Not only did we use SmartSuite for our collaborative work, creating and reviewing specifications, giving presentations, etc., we also ran some of our business processes on it. In particular we used an expense report application, done in 1-2-3 with LotusScript.

But IBM used Microsoft Office. So when IBM took over, we needed to migrate. Sure, there was whining and moaning and gnashing of teeth on our end about having to move to an inferior product. And it did take a little while to get accustomed to the different conventions of Office, typing AVERAGE() in Excel, rather than @AVG() in 1-2-3 and stuff like that. But we did it. We moved to Office. It was clear to all that the benefits of having a single file format outweighed the short-term pain on migration.

It is interesting what we did not do:

  1. We did not go and convert all existing legacy SmartSuite documents into Office format. What would have been the point? Most old documents are never touched again. Let them rest in peace.
  2. We did not delete SmartSuite from our hard drives. We kept the application there for cases where we needed to access old documents.
  3. We did not simply continue using SmartSuite and tell it to save in Office format. We knew that both fidelity-wise and performance-wise it is far better to use an application that supports a format natively than to rely on conversion software for interoperability.
  4. We did not translate 1-2-3 macro-based applications into Excel macro-based applications. We took the opportunity to move straight to web based applications. Aside from some standard presentation templates and similar boiler-plate templates we did not do a lot of conversion work.
In retrospect, the migration of file formats was one of the least contentious changes that accompanied the IBM takeover. We can handle file format changes, but eliminating the traditional Friday Beer Cart, now that was something to complain about...

I'm not much of one for committing unprovoked acts of methodology, but if I had to summarize what little wisdom I have in this area, I'd say that for a migration you want to evaluate your existing documents by three criteria: stability, complexity and business criticality, and develop a migration plan based on that.

In the first case you classify documents by how stable (unchanging) they are:
  1. Hot documents — the documents that are being heavily changed and edited today, works-in-progress, in active collaborations
  2. Cold documents — the documents which are no longer edited, though perhaps they are still read. Many of these documents may have zero value and are just taking up space. Others may be valuable records, but hidden away on someone's hard-drive.
  3. Warm documents — These are the ones that are in the middle, not seeing heavy activity, but they aren't quite frozen either.

From the perspective of complexity we have:
  1. Low complexity — simple text and graphics
  2. Medium complexity — using more advanced features, created by power users
  3. High complexity — "engineered documents", using scripting and macros to create applications.
Finally you can also look at these documents from the perspective of business criticality. Of course, this will vary according to your business. It might be relevance to ongoing litigation, it might be according to a records retention policy, it might be whether it concerns currently open projects, etc. But for sake of argument, let's take client or public exposure as a proxy for criticality, so we get this:
  1. Internal use documents — internal presentations and reports
  2. Customer facing documents — engagement reports, proposals, etc.
  3. Publication ready documents — white papers, journal articles, etc.
These three dimensions — stability, complexity and criticality — can be combined, creating 27 different document classes. For example, our old expense report based on 1-2-3 macros would be classified as a hot, high complexity, internal use document.

So you are transitioning from Office legacy binary formats to ODF. What do you do with each of these document classes? You have four main strategies to consider:

  1. Do nothing and preserve the document in the legacy format, maintaining, as needed, access to the legacy application.
  2. Convert document to a portable high fidelity static representation, like PDF
  3. Convert directly to ODF.
  4. Reengineer as something other than a document.

So one migration policy might look like this:


Stability  Complexity  Exposure           Strategy
Cold       Low         Internal Use       Do nothing
Cold       Low         Customer Facing    Do nothing
Cold       Low         Publication Ready  Do nothing
Cold       Medium      Internal Use       Do nothing
Cold       Medium      Customer Facing    Do nothing
Cold       Medium      Publication Ready  Do nothing
Cold       High        Internal Use       Do nothing
Cold       High        Customer Facing    Convert to PDF
Cold       High        Publication Ready  Convert to PDF
Warm       Low         Internal Use       Convert to ODF
Warm       Low         Customer Facing    Convert to ODF
Warm       Low         Publication Ready  Convert to ODF
Warm       Medium      Internal Use       Convert to ODF
Warm       Medium      Customer Facing    Convert to ODF
Warm       Medium      Publication Ready  Convert to ODF
Warm       High        Internal Use       Convert to ODF
Warm       High        Customer Facing    Publish as PDF
Warm       High        Publication Ready  Publish as PDF
Hot        Low         Internal Use       Convert to ODF
Hot        Low         Customer Facing    Convert to ODF
Hot        Low         Publication Ready  Convert to ODF
Hot        Medium      Internal Use       Convert to ODF
Hot        Medium      Customer Facing    Convert to ODF
Hot        Medium      Publication Ready  Convert to ODF
Hot        High        Internal Use       Reengineer
Hot        High        Customer Facing    Reengineer
Hot        High        Publication Ready  Reengineer


There may be a better way of expressing this above (Karnaugh maps anyone?) but that gives the idea. Also, I'm not suggesting that this is the "one true answer", but merely that this may be a useful way of framing the problem.
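One more compact way of expressing the example policy is as a handful of rules instead of 27 rows. The sketch below is only my reading of the table above, written in Python; the names STABILITY, COMPLEXITY, EXPOSURE and strategy() are made up for illustration, and a real policy would need its own rules.

```python
from itertools import product

STABILITY = ("Cold", "Warm", "Hot")
COMPLEXITY = ("Low", "Medium", "High")
EXPOSURE = ("Internal Use", "Customer Facing", "Publication Ready")


def strategy(stability, complexity, exposure):
    """Return the example strategy for one document class."""
    if (stability not in STABILITY or complexity not in COMPLEXITY
            or exposure not in EXPOSURE):
        raise ValueError("unknown document class")
    external = exposure != "Internal Use"
    if stability == "Hot":
        # Hot, engineered documents are candidates for reengineering.
        return "Reengineer" if complexity == "High" else "Convert to ODF"
    if stability == "Warm":
        # Complex documents seen by outsiders are published statically.
        if complexity == "High" and external:
            return "Publish as PDF"
        return "Convert to ODF"
    # Cold documents are left alone unless complex and externally visible.
    if complexity == "High" and external:
        return "Convert to PDF"
    return "Do nothing"


# Enumerate every combination of the three dimensions.
policy = {cls: strategy(*cls) for cls in product(STABILITY, COMPLEXITY, EXPOSURE)}
print(len(policy))  # 3 x 3 x 3 document classes
```

Writing it this way makes the shape of the policy easier to see: stability picks the default action, and high complexity is what triggers the exceptions.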

Variations might include:


Much of this lends itself to automation. For example:

  1. First you need to find all of the documents in an organization. This could be done by an ActiveX control on a page everyone in the company visits, an agent that spiders the intranet web pages and file servers, etc.
  2. Each document is then scored.
  3. Finding the stability of a document could be done by looking at the last read and last write stamps on the file. You could also look at web logs, or even at metadata in the document that tells how many times it has been edited.
  4. Complexity could be determined by scanning the document to see what features it uses. Some features, like script, would weight heavily for complexity. Think of it as a "goodness of fit" metric for how well the features used in the document fit within the ODF model.
  5. Business criticality is harder to automate, but could be done based on owner of the document, metadata in the document, location of the document (public web page versus intranet), etc.
  6. Calculate the scores, suggest actions to take, and then automate the action. This could lead to a nice automated migration solution.
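As a taste of how the stability step might be automated, here is a minimal Python sketch that walks a directory tree and classifies each file by its last-write timestamp. The 30-day and 365-day cut-offs and the function names are assumptions of mine, not recommendations. I use only the last-write time, since last-read stamps are not reliably maintained on many file systems.

```python
import os
import time

# Hypothetical cut-offs, in days since the last write; tune for your organization.
HOT_DAYS = 30
WARM_DAYS = 365


def classify_stability(age_days):
    """Map days since the last write to Hot / Warm / Cold."""
    if age_days <= HOT_DAYS:
        return "Hot"
    if age_days <= WARM_DAYS:
        return "Warm"
    return "Cold"


def stability_of(path, now=None):
    """Classify one file by its last-write timestamp (st_mtime)."""
    now = time.time() if now is None else now
    age_days = (now - os.stat(path).st_mtime) / 86400  # seconds per day
    return classify_stability(age_days)


def scan(root):
    """Walk a directory tree and yield (path, stability) for every file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield path, stability_of(path)
```

Something like `for path, stability in scan("/srv/documents"): ...` (the path is a placeholder) would then feed the scoring step. Complexity and business criticality would need their own scanners.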

In summary, it probably is not worthwhile simply to go out and convert all of your legacy documents in a giant cathartic orgy of document transformations. Not all documents are worth that effort. In any organization you probably have many, many documents that will never be read again, ever. You also likely have some very complex documents that probably should be reengineered as web applications on your intranet. The other documents, the ones in the middle, that is where you focus your migration effort.


Sunday, March 04, 2007

Compatibility According to Humpty Dumpty

‘I don't know what you mean by “glory,” ’ Alice said.

Humpty Dumpty smiled contemptuously. ‘Of course you don't — till I tell you. I meant “there's a nice knock-down argument for you!” ’

‘But “glory” doesn't mean “a nice knock-down argument,” ’ Alice objected.

‘When I use a word,’ Humpty Dumpty said, in a rather scornful tone, ‘it means just what I choose it to mean, neither more nor less.’

‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’

‘The question is,’ said Humpty Dumpty, ‘which is to be master - that's all.’

— Lewis Carroll from Through the Looking-Glass (1871)


I have written about Microsoft's language games previously. These games continue and it appears to be time for yet another inoculation. Words such as “open”, “choice”, “interoperability”, “standard”, “innovation” and “freedom” have been bandied about like patriotic slogans, but with meanings that are often distorted from their normal uses.

The aggrieved word I want to examine today is “compatibility”. Let's see how it is being used, with some illustrative examples, the ipsissima verba, Microsoft's own words:

From an open letter, “Interoperability, Choice and Open XML” by Jean Paoli and Tom Robertson:

The specification enables implementation of the standard on multiple operating systems and in heterogeneous environments, and it provides backward compatibility with billions of existing documents.

From another open letter, Chris Capossela's “A Foundation for the New World of Documents”:

... all the features and functions of Office can be represented in XML and all your older Office documents can be moved from their binary formats into XML with 100 percent compatibility. We see our investment in XML support as the best way for us to meet customers’ interoperability needs while at the same time being compatible with the billions of documents that customers create every year.

From Doug Mahugh: “The new Open XML file formats offer true compatibility with all of the billions of Office documents that already exist.”

And from Craig Kitterman: “Is backward compatibility for documents important to you? How about choice?”

Those are just a handful of examples. Feel free to leave a comment suggesting additional ones.

Compatibility. Better yet, True Compatibility. What is that? And what do you think the average user, or even the average CTO, thinks, when hearing these claims from Microsoft about 100% compatibility?

Let's explore some scenarios and try to reverse-engineer Microsoft's meaning of “True compatibility”.

Suppose you get a new, more powerful PC with more memory and upgraded graphics card and you upgrade to Vista and Office 2007. You create a new presentation in PowerPoint 2007 and save it in the new OOXML format. What can you do with it?

Can you exchange it with someone using Office on the Mac? Sorry, no. OOXML is not supported there. They will not be able to read your document.

Is this 100% compatibility?

What about Windows Mobile? Can I read my document there? Sorry, OOXML is not supported there either.

Is this 100% compatibility?

What about sending the file to your friends using SmartSuite, WordPerfect Office, OpenOffice or KOffice? They all are able to read the legacy Microsoft formats, so surely a new format that is 100% compatible with the legacy formats should work here as well? Sorry, you are out of luck. None of these applications can read your OOXML presentation.

Is this 100% compatibility?

What about legacy versions of Microsoft Office? Can I simply send my OOXML file to a person using an old version of Office and have it load automatically? Sorry, older versions of Office do not understand OOXML. They must either upgrade to Office 2007 or download and install a convertor.

Is this 100% compatibility?

I have Microsoft Access XP and an application built on it that reads data ranges from Excel files and imports them into data tables. Will it work with OOXML spreadsheets? Sorry, it will not. You need to upgrade to Access 2007 for this support.

Is this 100% compatibility?

What about other 3rd party applications that take Office files as input: statistical analysis, spreadsheet compilers, search engines, document viewers, etc. Will they work with OOXML files? No, until they update their software your OOXML documents will not work with software that expects the legacy binary formats.

Is this 100% compatibility?

Suppose I, as a software developer, take the 6,039-page OOXML specification and write an application that can read and display OOXML perfectly. It will be hard work, but imagine I do it. Will I then be able to read the billions of legacy Office documents? Sorry, the answer is no. The ability to read and write OOXML does not give you the ability to read and write the legacy formats.

Is this 100% compatibility?

So, there it is. I don't know if we're any closer to finding out what “100% compatibility” means to Microsoft. But we certainly have found a lot of things it doesn't mean.

A quick analogy. Suppose I designed a new DVD format, and standardized it and said it was 100% compatible with the existing DVD standard. What would consumers think this means? Would they think that the DVD's in the new format could play in legacy DVD players? Yes, I believe that would be the expectation based on the normal meaning of “100% compatible”.

But what if I created a new DVD Player and said it supported a new DVD format, but also that the Player was 100% compatible with the legacy format. What would consumers think then? Would they expect that the new DVD's would play in older players? No, that is not implied. Would they expect that older DVD's could be played in the new Player? Yes, that is implied.

This is the essence of Microsoft's language game. They are confusing the format with the application. This is easy to do when your format is just a DNA sequence of your application. However, although Microsoft Office 2007, the application, may be able to read both OOXML and the legacy formats, the OOXML format itself is not compatible with any legacy application. None. The only way to get something to work with OOXML is to write new code for it.

This is not what people expect when they hear these claims of OOXML being 100% compatible with legacy formats.


Thursday, March 01, 2007

OASIS Symposium and OpenDocument Workshop

OASIS will have its annual Symposium April 15th-17th in San Diego, with the theme, "eBusiness and Open Standards: Understanding the Facts, Fiction, and Future". It should be noted that this is not a real symposium, where guests recline on couches, drink wine and discuss philosophy to the accompaniment of flute-girls. On the other hand, it will have a lot of ODF, which is almost as good.


Bob Sutor will give the opening keynote. Scott Hudson will give a talk on, "DocTape: A Document Standards Interoperability Framework for DocBook, DITA, ODF and more!". I'll be joining a panel on Tuesday looking at ODF Interoperability and related topics. And Wednesday will be a half-day Workshop on ODF, with presentations on adoption, programmability, accessibility, interoperability and future directions.

Then back home on Thursday, my birthday. This gives my wife the rare opportunity to get a large present into the house without me noticing. Hint, hint...

