Rob

Low-Fat ODF

2009/03/03 By Rob 2 Comments

Jack Sprat could eat no fat.
His wife could eat no lean.
And so betwixt them both, you see,
They licked the platter clean!

Is dietary fat good? Or is it bad? Without getting into a discussion of saturated versus unsaturated fats, or the virtues of omega-3 oils, let me make a few basic, reasonable observations:

1. Individuals differ in their preferences and requirements for fat intake. There is no single answer for all people at all times.
2. Experts differ in their recommendations for fat intake.
3. Standards exist for how to measure and report the fat content of food products.
4. Standards also exist for the specific conditions under which a vendor may call their food products “low fat” or “light” or “fat-free”. For example, “low fat” products must have 3g or less fat per serving.
5. The government requires vendors of retail packaged food to label the fat content in accordance with standards #3 and make only claims regarding fat content that conform with standards #4.

The above system generally works. Food vendors have the freedom to add as much fat as they want to their products. If they want to sell deep-fried bacon-wrapped cheese, then fine. No problem. It is a free country. But this is balanced by the consumer’s ability to know the fat content of the products that they purchase. This gives control to the consumer, allowing informed choice.

But take away the standards, take away the reporting requirements, and the manufacturer has all of the control. Let’s imagine a world where there were no such fat content standards. Medical research would still progress and the long-term dangers of high-fat diets would still be known. But the consumer’s ability to control their fat intake would be vastly reduced. There would be no informed choice.

Imagine further that Company A, observing the medical research and consumer interest in healthy food, decides to offer a low-fat cheese. But if Company A sells their low-fat cheese, the label “low fat” itself would have no formal meaning. In this hypothetical, there are no standards. Nothing prevents Company B and Company C from also advertising their existing cheeses as “low fat”. Without standards there is no differentiation. Since consumers have no effective way to test the fat content of cheese on their own, they are at the mercy of the non-verifiable claims of vendors and the advertising agencies. Because there are no acknowledged standards for fat content, the market for low-fat cheese is stunted. The consumer does not benefit and the innovative Company A does not benefit. No one wins.

This is a general concern for markets where the consumer cannot directly verify the quality of the goods, because they are packaged and inaccessible to inspection, or because the consumer lacks the technical ability to determine the quality themselves. From fat content to auto gas mileage efficiency, this leads to standards for measuring and reporting qualities of interest to consumers.

So back to reality. We do have fat content standards, for measurement and reporting. Suppose that Company A sells its low-fat cheese and it is very popular, because it is what the consumer wants. Company B is envious of the higher margins on low-fat products, but it would take too long for them to revamp their production line to make a cheese with 3g or less fat per serving. They can only get it down to 5g per serving. What can they do? Well, they can hire a lobbyist, go to Washington, DC, and spread some influence around. They could try to get the FDA to change their definition of “low-fat” so it includes their higher-fat products as well. If you can’t change your product to meet the standards that consumers want, then dumb down the standards!

Sound far-fetched? This is actually happening all the time with certified organic food in the United States. Non-organic ingredients are routinely being allowed in organic food products based on requests from big food manufacturers. The consumer has very little visibility or voice in this process.

So what does this all have to do with ODF? Fair question. The analogy is to extensions of ODF, a topic currently being hotly debated on the OASIS ODF Technical Committee. Extensions are additions to an ODF document which are not defined by the ODF standard. They may be proprietary vendor extensions, or extensions using other open standards. But regardless, since their use in an ODF document is not defined by the ODF standard, they are difficult or impossible to use in an interoperable fashion, at least by those who do not know the secret details of the extension. However, such extended documents may be immensely useful in some situations.

So are extensions good? Are they bad? Are you more concerned with interoperability? Or with a particular use that requires the extension? There is no single answer for all people at all times. Because of this, it is important to put control firmly in the hand of the consumer of ODF products, so they can make the appropriate choice for themselves.

Similar to the mechanism of food labeling, putting control in the consumer’s hands requires that we:

Have a formal definition of what an extended ODF document is versus an unextended ODF document.
Have something like a reporting requirement, so it is clear to the consumer whether a particular document is extended or not.

The proper place to address these points is in the conformance clause of the ODF Standard. To that end, the current draft of ODF 1.2 defines two conformance classes, one for extended documents and one for unextended documents. The aim, in the end, is to give the consumer greater control and allow them to make a more intelligent choice. We can’t force vendors to implement one or the other conformance class. And we can’t force consumers to use one or the other. But we can formally define what an extended document is and let the free market operate based on the additional information made available.

This is a small step and I know it doesn’t sound like much, but even this modest step provoked such paroxysms on the ODF TC that you would have thought I was splashing holy water at an exorcism. I suspect this means that I must be doing something right!

ODF Spreadsheet Interoperability: Theory and Practice

2009/03/01 By Rob 9 Comments

This is a follow-up to some work we did at the ODF Interoperability Workshop in Beijing last November. We had good participation there: IBM, Sun, Google, Novell and Redflag from the big vendor side, as well as a good number of users. It was a full-day workshop and we covered a number of topics. One of them was spreadsheet formulas. I gave a short presentation on spreadsheet interoperability, specifically on the work we’ve done on OpenFormula for ODF 1.2. We also did a short exercise to look for spreadsheet formula bugs.

As many of you know, neither ODF 1.0 nor ODF 1.1 defines a spreadsheet formula language. They leave it implementation-defined. The specification makes only a few broad statements, such as a recommendation that formula attributes be qualified by namespace, that formulas begin with ‘=’, that cell addresses be surrounded by ‘[’ and ‘]’ and that formula parameters be delimited by ‘;’. So in theory, this is a mess. But in practice it has worked out quite well, since implementations have played “follow the leader” and have nearly converged on interoperable spreadsheet formulas. With ODF 1.2, we’ll standardize the consensus on spreadsheet formulas, giving even greater certainty.
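As a rough illustration of those conventions, here is a minimal Python sketch that assembles a formula string the way the ODF 1.0/1.1 text recommends. The helper names (cell_ref, build_formula) are hypothetical, and the leading dot inside the brackets is an assumption that mirrors how OpenOffice writes an address on the current sheet; the paragraph above only says that addresses sit between ‘[’ and ‘]’.

```python
def cell_ref(address):
    """Wrap a cell address in '[' and ']', e.g. 'A1' -> '[.A1]'.

    The leading '.' is an assumption: it mirrors how OpenOffice writes an
    address on the current sheet.
    """
    return "[." + address + "]"


def build_formula(function, *args):
    """Build a formula that starts with '=' and separates parameters with ';'."""
    return "=" + function + "(" + ";".join(args) + ")"


formula = build_formula("SUM", cell_ref("A1"), cell_ref("B2"), "10")
print(formula)  # =SUM([.A1];[.B2];10)
```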

Let’s see how this works in practice. I created a simple spreadsheet document in several ODF-supporting applications, including Microsoft Office using the various plugins. Here is what I tested:

Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
Google Spreadsheets
KOffice’s KSpread 1.6.3
Lotus Symphony 1.1
OpenOffice 2.4
Microsoft Office 2003 with Sun’s ODF Plugin

I used what I had installed on my two machines, Windows and Ubuntu. There may be updates to some of these applications that do even better.

I created the same basic spreadsheet from scratch in each editor and saved it as ODF format. I then looked at each document to see how formulas were being stored in the XML:

CleverAge stores it in the OpenOffice namespace (xmlns:oooc="http://openoffice.org/2004/calc")
Google also uses the OpenOffice namespace.
KSpread doesn’t use namespace-qualified formula attributes.
Symphony also doesn’t use namespace-qualified formula attributes.
OpenOffice uses the OpenOffice namespace.
Sun’s Plugin also uses the OpenOffice namespace.

OK. So there is some variation in how the formulas are stored, with two approaches in use. How does this then impact interoperability? In theory it is horrible. In practice it works out pretty well.

I took each of the 6 spreadsheet documents and opened each one in each of the other 5 applications — 30 interoperability tests — to see whether the formulas were loaded and calculated correctly. Here is what I saw:

Rows show the application each document was read in; columns show the application it was created in.

Read in \ Created in   CleverAge   Google   KSpread   Symphony   OpenOffice   Sun Plugin
CleverAge              OK          OK       Fail      Fail       OK           OK
Google                 OK          OK       OK        OK         OK           OK
KSpread                OK          OK       OK        OK         OK           OK
Symphony               OK          OK       OK        OK         OK           OK
OpenOffice             OK          OK       OK        OK         OK           OK
Sun Plugin             OK          OK       OK        OK         OK           OK

So the formulas came through OK, in almost all instances. The only exception was the CleverAge add-in, which failed to process formulas from KSpread and Symphony. For example, loading the Symphony spreadsheet into Office 2003 results in cells with contents containing errors such as “=#REF!+#REF!-#REF!” which is tantamount to data loss.

I think we can do better than this with a few simple changes.

The Law of Robustness as stated in RFC 1122 is “Be liberal in what you accept, and conservative in what you send.” Adapting that principle to ODF spreadsheets, I recommend the following practice for ensuring interoperability using ODF 1.0 and ODF 1.1:

1. When writing ODF 1.0 or ODF 1.1 spreadsheet documents, write formula attribute values using the OpenOffice namespace: “http://openoffice.org/2004/calc”. All ODF spreadsheet applications I have tested accept and correctly process formulas in that namespace. Note that the CleverAge add-in is not doing the namespace checks in an XML-correct fashion. They are comparing only the text of the prefix, not resolving it to a namespace URI and comparing the URIs. So you should be sure to also use “oooc” as the namespace prefix.
2. When reading ODF 1.0 or ODF 1.1 spreadsheet documents, be prepared to handle formulas with no namespace qualification as well as those with the OpenOffice namespace.
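To make the two practices concrete, here is a minimal Python sketch using only the standard library. It is not how any of the tested applications work; the function names and the sample fragment are hypothetical, and the fragment is far simpler than a real ODF content.xml. The writer always emits the literal “oooc” prefix. The reader accepts unqualified formulas and resolves any prefix to its namespace URI before comparing, so a document that binds a different prefix to the OpenOffice namespace is still recognized. For brevity the prefix map ignores namespace scoping.

```python
import io
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OOOC_URI = "http://openoffice.org/2004/calc"


def write_formula(expression):
    """Conservative writer: always emit the literal 'oooc' prefix."""
    return "oooc:" + expression


def read_formulas(xml_text):
    """Liberal reader: return (kind, expression) for each table:formula.

    kind is 'openoffice', 'unqualified' or 'unknown'.
    """
    prefixes = {}  # prefix -> namespace URI (scoping ignored in this sketch)
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        value = payload.get("{" + TABLE_NS + "}formula")
        if value is None:
            continue
        if value.startswith("="):
            # No namespace qualification at all (as KSpread and Symphony write).
            found.append(("unqualified", value))
            continue
        prefix, _, expression = value.partition(":")
        # Compare the resolved URI, not the text of the prefix.
        kind = "openoffice" if prefixes.get(prefix) == OOOC_URI else "unknown"
        found.append((kind, expression))
    return found


# Hypothetical fragment: 'calc' is bound to the same URI as 'oooc'.
SAMPLE = (
    '<table:table-row'
    ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
    ' xmlns:oooc="http://openoffice.org/2004/calc"'
    ' xmlns:calc="http://openoffice.org/2004/calc">'
    '<table:table-cell table:formula="oooc:=[.A1]+[.B1]"/>'
    '<table:table-cell table:formula="calc:=[.A1]+[.B1]"/>'
    '<table:table-cell table:formula="=SUM([.A1:.A3])"/>'
    '</table:table-row>'
)

for kind, expression in read_formulas(SAMPLE):
    print(kind, expression)
```

A reader that compared only the prefix text would reject the second cell, even though it is in the same namespace as the first.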

Specifically, Symphony and KSpread should consider making changes to accommodate #1 and CleverAge should consider changes needed to do #2. In the CleverAge case, a trivial, one-line change to OdfConditionalPostProcessor.cs will quickly restore compatibility with Symphony and KSpread documents.

Now, if you are entirely satisfied with what I have said above, and have no lingering doubts, then you are not thinking enough. It is not enough to merely bring the spreadsheet formulas over intact. Interoperability also requires that we interpret the formulas in the same way.

So let’s look at that side of the equation (no pun intended). Fortunately, we are all quite close to what is being defined in ODF 1.2’s OpenFormula specification. This is not so surprising, since OpenFormula was based on actual spreadsheet practice, looking at a variety of spreadsheet applications. I did a quick test of the 6 ODF spreadsheet applications to see how well they fared against a test suite of 509 core tests that OpenFormula defines for spreadsheet functions. The results were:

CleverAge 455/509 = 89%
Google 457/509 = 90%
KSpread 472/509 = 93%
Symphony 487/509 = 96%
OpenOffice 493/509 = 97%
Sun Plugin 500/509 = 98%
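For anyone who wants to check the arithmetic, the percentages are simply the pass counts divided by the 509 core tests, rounded to the nearest whole percent. A small Python sketch (the variable names are mine):

```python
TOTAL_TESTS = 509

# Pass counts as reported in the list above.
passed = {
    "CleverAge": 455,
    "Google": 457,
    "KSpread": 472,
    "Symphony": 487,
    "OpenOffice": 493,
    "Sun Plugin": 500,
}

# Whole-number percentage for each application.
rates = {name: round(100 * count / TOTAL_TESTS) for name, count in passed.items()}

for name, count in passed.items():
    print(f"{name} {count}/{TOTAL_TESTS} = {rates[name]}%")
```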

So, we’re not yet perfect, but we’re getting pretty close. Interestingly, the lowest scores (CleverAge) and highest scores (Sun Plugin) are both for the same calculation engine (Excel).

Looking forward, we’ll continue to edit and refine OpenFormula and its test cases. You might look for it when it comes out for public review, hopefully in a couple of months. Unlike other parts of ODF 1.2, OpenFormula is essentially XML-free. It is a mini-expression language, defined by a BNF grammar and accompanied by hundreds of spreadsheet functions from mathematics, finance, engineering, statistics, etc. So review by subject matter experts in these disciplines is especially needed, even if they have zero XML experience. If you want to see the current OpenFormula Working Draft, currently in its 71st revision, take a look. Comments may be submitted to the ODF TC’s comment list.

I’m also looking forward to testing Office 2007 SP2’s ODF support when it comes out, to see how their ODF support is improving. Anything less than the 500/509 results that Excel 2003 gives with the Sun Plugin will be a disappointment. KOffice has a 2.0 version in beta I should look at. OpenOffice has their 3.0 update. Sun also has an updated ODF Plugin. I’ll lean on the Symphony team as well, and see if we can beat 500/509. Game on!

Being social

2009/02/28 By Rob 4 Comments

By nature I am an introvert. I don’t schmooze. I don’t “network”. Like Sartre, I am firmly in the “Hell is other people” camp. However, since social and collaborative computing is a large part of what we work on at IBM, and we’ve recently signed deals with LinkedIn and Skype, I’ve decided to jump in with both feet and see what value these and other social networking and communication services have to offer.

Certainly, within IBM, I’m constantly typing into Sametime. I wouldn’t be surprised if I exchange more internal information, counted by characters, in instant messages, than I do in emails. However, my external communications, both professional and personal, are almost entirely via email for 1-to-1 communications, and this blog for broadcasts. I’d like to experiment a bit and see what other tools and services are effective. This isn’t a long-term commitment to being social, but an experiment. We’ll see how it goes.

So, I’ve put up my contact information for various social sites on my Who is Rob Weir? page. Feel free to contact me via these services. Also, I’d be interested in what other services you think I should be looking at.

Whither ODF?

2009/02/25 By Rob 23 Comments

Whether ODF will wither or weather
depends on us as we work together.

The question is where we should go: whither?
The answer is clear at once.
The question of “whither” is not so dense,
and is easy to answer when we start with “whence?”.

Of the topic today
I will no longer delay nor dither to say
whether we will whither or weather
but will now give you my 2-cents.

Rob’s ODF-Next Rant

The word processor and spreadsheet, as we have them today, are relics of the 1980’s, designed when the web did not exist and collaboration occurred predominantly by exchanging paper documents. If we were designing a document authoring and collaboration system to meet modern circumstances and capabilities, it would likely bear little resemblance to Word. So the question is how much do we let the sunk costs of yesterday continue to determine our future? How much longer do we paint speed stripes on a horse and pretend that it is a racing car?
Products like Word and Excel have evolved via the uncritical accretion of functionality over the past decades to a point where the products are overly complex resource gluttons with a knack for having a critical security flaw reported in them every other week.
Increasingly users are getting work done via email, wikis and blogs rather than using heavy-weight document editing solutions. Why is this so? Why is the modern word processor losing users rather than gaining them?
WYSIWYG is a fine paradigm if you are doing all of your work targeting printed output. But it is a sub-optimal approach for creating documents for almost any other use.
The revered Bold, Italics and Underline icons, along with the font selection drop down list, which define the modern editor GUI, should be forcibly removed from the user interface, stripped of rank, and put on trial for crimes against productivity. You are writing a document, not decorating a cake. You need to ask yourself “Why should this text be italics?” Is it a book title, a foreign phrase, a name of a movie, the name of a legal case? Then choose a named style that indicates why that text is special. Let the named style take care of how it is displayed.
Unless you are designing a poster for a modern art gallery you should stick to the named styles in your template. Power users might define additional named styles. But direct application of random attributes to random text selections should be considered a form of data corruption.
Few documents today are ever printed. They are born, live and die entirely in digital form. We should be optimizing for the most common cases, not just for what our parents or grandparents did with WordPerfect 1.0.
The most common sources of reused content come from other documents and from PDF and from HTML. Current cut & paste mechanisms today make a mess of styles. Paste in the content with the styles of the source document? According to the styles of the destination document? Mapping to the nearest local style? All are wrong answers. The only correct answer is to give me the choice.
PowerPoint is pure evil. It has elevated form over substance and turned every form of business communication into a “pitch”.
I should be able to call spreadsheet functions using named parameters, like PV(rate=1%,periods=12,payment=$1000.00) rather than PV(0.01,12,1000) so my model is self-documenting and avoids errors from incorrect ordering of parameters.
Security needs to be designed into the document authoring environment, including the format, not patched on as an afterthought.
I want Greasemonkey for my word processor and my spreadsheet.
Connections between documents may be as important as the documents themselves.
The less control the user asserts over the appearance of a document during editing, the more flexibility he or she has over the final published appearance. In today’s multi-modal, multi-device world, it is essential that we do not prematurely commit our documents to a particular rendering. We need late binding of presentation to content, not early binding. If we had done this for the past decade, we would have perfect interoperability today between all word processors. If we start doing it now, we will have perfect interoperability among word processors going forward.
Spreadsheets should have functions that access web-based data stores for common financial, economic, political and scientific data sets. Mathematica does something similar, presumably using local caching.
Presentation should be a mode of displaying another document, not just a document type in itself. For example, I should be able to take a report and push a button to enter a slide-show mode, where all images are shown as slides, with their captions, and each top level section header becomes a slide with 2nd level headers as bullet items. During the presentation I should be able to seamlessly drill down into the real document.
I want to be able to share data ranges, text ranges and presentation slides with others and to subscribe to theirs via feeds. I rarely write a document from scratch. Reuse, reuse, reuse. But the tools only support this at a scavenger level.
We lack high-level support for compositing or assembling a document from fragments. Once I cut & paste, my new document has lost all knowledge of the document I copied from. This is great if I am a professional plagiarist. But it is bad if I am a CIA analyst and my report has copied information claiming uranium production in Africa, and I never know when that information is repudiated, and I pass my flawed report onto the President. Very bad. When I cite an authority for an argument, my argument is only as good as the authority. I owe it to myself and my readers to make it easy to know whether the information I cited is still accurate and vouched for by that authority.
Current tools are impoverished when it comes to the social side of documents. Review/comment reflects old, hierarchical thinking and doesn’t scale to the network. How can I have 100 people comment on my document? What if I want 100 people to jointly author a document? The Wiki knows where Word cannot go…
Most user woes in modern word processors are caused by our attempts to remain compatible with the design choices made by Microsoft Office developers 15 years ago. It is time to move on and learn from past mistakes, not perpetuate them.
I want to use the same text editor to edit documents, web pages, emails, blog posts, discussion forums and wikis. Why do I need a different brand hammer for every nail?
I want a spreadsheet function that can call a web service. It might lookup a book title by ISBN, do currency conversions, or geocode data. There should be thousands of such spreadsheet functions, backed by web services, interoperable based on standard protocols. Some might be free, others fee-based. Some might be both, e.g., 20-minute delayed quotes for free, real-time for a fee.
Spreadsheet functions express a core analytic function and should be usable in all tables, in word processors and presentations, not just in spreadsheets. They should also be usable in fields in forms and in text passages.
The inability of word processors to output clean, readable and valid HTML or XHTML should be an embarrassment to all vendors.
HTML + JS + XHR + HTML DOM = AJAX. ODF + JS + XHR + ODF DOM = ?
We must define power as in “power user” based on results, on productivity. Power is as much about what a system allows you to ignore as what it allows you to control.
Today trust is based on digital signatures and classical questions of authentication, integrity and non-repudiation, all backed by a chain of trust traceable back to some well-known certification authority. In some contexts, this hierarchical, binary view of trust is adequate. But the network sees trust based on reputation, rating, scoring, voting, reverse citation counts and other non-hierarchical values. How do we account for these?
Spreadsheets are unnecessarily dangerous, based on a muddled view of data types which leads to silent errors and inconsistencies. This might have made sense in the memory and processor constrained systems of the 1980’s. But today, with our better sense of the errors and the cost of errors, we need a spreadsheet system that is type-safe, aware of measurement units, and which enforces consistency and accuracy. We obviously can’t prevent someone from making a stupid spreadsheet model for subprime mortgages, but we can at least ensure that they don’t make stupid cut & paste errors when creating that model.
Spreadsheets should have intrinsic support for image, sound and geographic data. Not just embedded media, but as an intrinsic data type, so a function could take an image as input, or return an audio clip as a result.
A grid in a spreadsheet provides a logical addressing scheme as well as a visual layout scheme. But what if I want the former without the latter? Why can’t I do a spreadsheet calculation in a text document? Why am I always stuck in a grid?
Spreadsheets should have built-in support for sensitivity and risk analysis, perhaps via Monte Carlo methods. Yes, I know support is available via 3rd party plugins, but this should be a core feature in the repertoire of every user. We might not be in the global financial mess we’re in now if spreadsheet users all could easily “stress test” their models.
The Holy Trinity of Word, Excel and PowerPoint is only a convention, mainly enforced by Microsoft’s definition of their office suite. It is not a law of nature. Other application types should be considered to be part of the core desktop authoring environment, such as project management and mind maps.
Outliners and other pre-draft tools have lagged far behind the core editing functions of a word processor. And what is the equivalent of an outliner for a spreadsheet?
Microsoft is as much a prisoner to the predominant model of end user productivity as the user is. Their need to support legacy documents constrains their freedom of action and has contributed to the relative lack of innovation in Microsoft Office over the past decade.
An editor should allow a user to verify interoperability as easily as it lets them do a print preview.
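To illustrate the named-parameter point from the list above, here is a minimal Python sketch. It is not a spreadsheet feature and not Excel’s PV function: present_value is a hypothetical helper that assumes level payments at the end of each period, no future value, and a positive result (Excel’s own sign convention differs). The bare ‘*’ makes every parameter keyword-only, so the caller cannot mix up the order.

```python
def present_value(*, rate, periods, payment):
    """Present value of a level payment made at the end of each period.

    All parameters are keyword-only, so every call is self-documenting.
    """
    if rate == 0:
        return payment * periods
    return payment * (1 - (1 + rate) ** -periods) / rate


# Reads like the named-parameter spreadsheet call: rate=1%, periods=12, payment=$1000.
pv = present_value(rate=0.01, periods=12, payment=1000.00)
print(round(pv, 2))

# present_value(0.01, 12, 1000.00) would raise TypeError: the arguments must be named.
```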

Looking for Good Ideas for ODF-Next

2009/02/22 By Rob Leave a Comment

A typical team project, whether software, standards, bridge construction or what have you, has a slow start dominated by planning and scheduling, a middle period of execution, and a finish with a final frantic rush of activity to complete the project. Then everyone takes a few days off and we start again.

One thing I learned early in my career was how wasteful this kind of project cycle is. The problem is that not everyone is involved in every part of the project. Some only work on planning, some only on execution, and some mainly come in at the end. This leads to suboptimal allocation of resources. People are standing around waiting.

One solution, not necessarily the only one, is to work on multiple versions of a project at once. For example, when working on a software application, you can take 25% of your team and have them start the planning phase of version N+1 while the remaining 75% of the team completes the final QA stage of version N.

We have a similar issue with standards development. Both the OASIS and the JTC1 PAS processes involve a lot of standing around waiting: at least two months of public review in OASIS, and 6 months of review in JTC1. And even now, as we complete the editing work on ODF 1.2, the wider ODF community is standing around waiting. It is too late to make feature proposals for ODF 1.2, but too early for a full public review of the ODF 1.2 draft.

What is to be done?

The ODF TC has decided to begin activities on the next version of ODF, called for now “ODF-Next”, even before we have ODF 1.2 approved. Although we obviously won’t be spending a large amount of time on that effort quite yet, since we really are all busy with ODF 1.2, we have come up with a way to engage the broader community and have you help us gather requirements for ODF-Next now, which we can then consider during the downtime when ODF 1.2 is under review in OASIS and JTC1. The Call for Proposals for ODF-Next went out on Friday.

So put on your thinking cap. ODF 1.1 and ODF 1.2 were incremental releases. Maybe ODF-Next will be bolder, maybe something that shifts the paradigm, pushes the envelope, breaks out of the box. Is the dominant WYSIWYG word processing paradigm the final word in user productivity? Or are we overdue for a change, for a different set of priorities? As Thomas Paine wrote, “We have it in our power to begin the world over again.”

Now is the time to start collecting the ideas, big or small, and submit them to the ODF TC according to the instructions in the Call for Proposals linked to above.

We’ll be collecting ideas at least until March 31st. The Requirements Subcommittee will then sort through the ideas, categorize and prioritize them, and generally try to make sense of it all, and then write up an ODF-Next Requirements document with their recommendations.

This is a good chance to get your ideas in early and have a real impact on where we go with ODF in the next major release. But please, do not give me ideas via blog comments. We can only accept ideas sent through the above linked OASIS comment submission procedure, which is necessary to ensure that ODF remains an open standard that anyone can implement. IANAL, but I believe an added benefit is that any idea you submit, even if speculative, even if not added to ODF-Next, will be permanently archived in the ODF comment list, and thus will establish prior art which could scuttle attempts to secure patents in this area. So by contributing your ideas publicly in this way, you help to establish an intellectual commons that will benefit free and open source applications in this area.

Please pass along the word. We’re hoping to get 100’s of ideas for ODF-Next. Bring it on!