
An Antic Disposition


Rob

ODF enters the Semantic Web

2007/10/12 By Rob 11 Comments

Metadata is “data about data”. Meta comes from the Greek μετά, meaning “with” or “after”. I suppose if you wanted to sound grand you could pronounce it hyper-correctly with the stress on the second syllable, met-ah’. I’ve heard some incorrectly pronounce it meet’-ah, perhaps by false analogy with βῆτα = beta. But you never hear anyone pronounce μέγα = mega as mee-guh, do you?

Metadata is not new. It has been around for centuries. In some cases metadata applies to the overall document, while in other cases it applies to only a portion of the content. Examples of the first case include titles of books, footnotes, ISBN numbers, LOC or Dewey Decimal categorizations, keywords, etc. The various forms of scribal marginalia, whether scholia or glosses in the margins of a manuscript, or personal annotations of the owner of a document, are historic examples of the second kind of metadata.

Marginal notes are frequently used today in business forms. A printed form represents, often imperfectly, a snapshot in time of an organization’s view of its own process. But maybe the process was approximated, or the form was imperfectly designed. Maybe it quickly became outdated. Somehow reality seems to outgrow the strictures of the form’s blanks and checkboxes. So what do you, as a customer, do? You write notes in the margins or in the spaces between form fields and hope that there is a human in the loop to read your words.

In any case, of all documents, forms (originally called “formulary documents”) have the most structured representation of data. Enter your social security number into the nine little boxes provided. Enter your date of birth here, month first, then day, then two-digit year. Last name first, first name last. Everything is nice and simple, provided your reality matches the one the form designer envisioned. Your data will then be easy to consume, whether by another person or, after data entry, by various online processes. Or maybe the form data was entered online originally? Even better.

But what about all the other documents in the world, the ones that are not formally structured as forms? What sense can we make of them? Can you write a program to detect a social security number in a free-form document, or a date, or a zip code? Perhaps with pattern matching, you can find out some simple things. That is the essence of Microsoft’s Smart Tags. (And we had much of this in Lotus Agenda a decade earlier.) But this only works for the most trivial cases. It only takes you so far.
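To make the limits of pattern matching concrete, here is a minimal Python sketch that tags a few kinds of structured data in free-form text with regular expressions. The patterns are simplified assumptions (US-style SSNs, ZIP codes and slash-separated dates), the function name `tag` is hypothetical, and this is not how Smart Tags or Lotus Agenda were actually implemented.

```python
import re

# Deliberately naive patterns, assumed for illustration only; real formats have many more variants.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # e.g. 123-45-6789
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),                  # e.g. 02134 or 02134-1234
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2}(?:\d{2})?\b"),  # e.g. 9/27/07 or 9/27/2007
}


def tag(text):
    """Return (kind, matched_text) pairs found in free-form text, ordered by position."""
    hits = [
        (m.start(), kind, m.group())
        for kind, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]
    return [(kind, found) for _, kind, found in sorted(hits)]


print(tag("SSN 123-45-6789, mailed 9/27/2007 to Boston MA 02134."))
# → [('ssn', '123-45-6789'), ('date', '9/27/2007'), ('zip', '02134')]
print(tag("Order part 98765 today"))
# → [('zip', '98765')]
```

The second call shows the problem: a five-digit part number is reported as a ZIP code, because the pattern knows the shape of the data but nothing about its meaning.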

What if I wanted to mark up an academic paper, a work in progress, to indicate which quotations have been verified and which ones remain to be verified? Or what if I wanted to annotate statements in recorded testimony according to whether they contradict or corroborate another witness’s statements? This goes far beyond pattern matching. I need a way to encode my knowledge, my view of the subject, my insights, into the document.

We have data in a document: “Words, words, words,” as Hamlet tells Polonius. But for those who work with thoughts, the present constraints of encoding our knowledge as rudimentary linear strings of characters are severe. In general, text is multi-layered and hyper-linked in strange and marvelous ways. Your father’s word processor and word processor file format are inadequate to the task. The concept of a document as a single store of data that lives in a single place, entire, self-contained and complete, is nearing an end. A document is a stream, a thread in space and time, connected to other documents, containing other documents, contained in other documents, in multiple layers of meaning and in multiple dimensions. What we call a traditional document is really just a snapshot in time and space, a projection into a print-ready format of what documents will soon become.

The applications of metadata to business documents are legion. Wherever you have data, you also have the questions of:

  1. Who entered the data?
  2. Where did the data come from?
  3. Who verified the data?
  4. Who approved the data? Legal? HR? Business?
  5. Where is this data destined?
  6. How old is the data? When does it expire?
  7. How trustworthy is this data?
  8. Who must we cite as an authority for this data?
  9. Who owns this data?
  10. Who has permissions to see this data?
  11. Who can set policy for this data?
  12. Who else can edit this data?
  13. How does this data connect with my business? Is it a part number? The name of a customer or the name of an employee?

And so on.

Open Document Format (ODF) 1.2 takes a step into the world of structured metadata with an RDF metadata framework. If that sounds Greek to you, then let’s say that a metadata framework enables application developers to create applications that do the above things. A framework doesn’t tell you how you must say “This image is provided under a Creative Commons Share-Alike license”, but it provides a framework for application developers to express concepts like “licensed-under” and “Creative Commons Share-Alike”, as well as a formal structure for expressing subject-predicate-object relationships, where the subject can be any of around 50 ODF document elements, such as paragraphs, footnotes, images, tables, etc.
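To illustrate the subject-predicate-object idea, here is a minimal Python sketch that records a few statements about parts of a document as triples and prints them in N-Triples syntax. It is not ODF 1.2’s actual metadata mechanism: the document IRI, the `#img1`-style fragments (standing in for individual document elements, e.g. ones carrying `xml:id` attributes) and the review vocabulary are hypothetical. Only the Dublin Core `license` property and the Creative Commons license URL are real identifiers. A real application would more likely use an RDF library such as rdflib.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a document element, referenced by a hypothetical fragment IRI
    predicate: str  # the relationship, as an IRI
    obj: str        # an IRI (another resource) or a plain literal value


DOC = "http://example.org/report.odt"         # hypothetical document base IRI
LICENSE = "http://purl.org/dc/terms/license"  # Dublin Core "license" property
CC_BY_SA = "http://creativecommons.org/licenses/by-sa/3.0/"
STATUS = "http://example.org/review#status"   # hypothetical vocabulary term

graph = {
    Triple(DOC + "#img1", LICENSE, CC_BY_SA),    # "this image is licensed under CC BY-SA"
    Triple(DOC + "#para7", STATUS, "verified"),  # "this quotation has been verified"
    Triple(DOC + "#para9", STATUS, "unverified"),
}


def objects(g, subject, predicate):
    """Return, sorted, every object linked to `subject` by `predicate`."""
    return sorted(t.obj for t in g if t.subject == subject and t.predicate == predicate)


def to_ntriples(g):
    """Serialize as sorted N-Triples lines; objects starting with "http" are treated as IRIs."""
    def term(value):
        return f"<{value}>" if value.startswith("http") else f'"{value}"'
    return sorted(f"<{t.subject}> <{t.predicate}> {term(t.obj)} ." for t in g)


for line in to_ntriples(graph):
    print(line)
```

Because every statement has the same three-part shape, an application can answer a question like “which license applies to this image?” with a simple lookup such as `objects(graph, DOC + "#img1", LICENSE)`, without knowing in advance which vocabularies a document uses.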

To read more, here are some places to start:

For general background on the “semantic web”, a good intro is the 2001 Scientific American article “The Semantic Web” by Tim Berners-Lee et al.

For a bit more on RDF, the Wikipedia page is pretty good.

Svante Schubert at Sun, also on the ODF Metadata Subcommittee, has a recent blog post worth reading: “New Extensible Metadata Support With ODF 1.2”.

Bruce D’Arcus, of the Metadata Subcommittee and co-lead of the OpenOffice.org Bibliographic Project also contributes his thoughts on the new ODF 1.2 metadata.

If you want to delve into the particulars of ODF 1.2’s new metadata support, you can read the latest draft of the proposed changes to the specification [ODF] and the examples [ODF] document. Of course, any feedback on ODF drafts and published standards is welcome on the ODF TC’s comment mailing list.

For a gentle introduction to metadata, ODF, where we are coming from and where we are going, I offer this interview [MP3] with Patrick Durusau, Chair of the ODF Metadata Subcommittee, which I recorded back in July.

Filed Under: ODF

Cracks in the Foundation

2007/10/07 By Rob 14 Comments

You must admire their tenacity. Gary Edwards, Sam Hiser, and Paul E. Merrell (aka “Marbux”). The mythology of Silicon Valley is filled with stories of three guys and a garage founding great enterprises. And here we have three guys who, through blogs, interviews, and constant attendance at conferences, have become some of the most-heard voices on ODF. Maybe it is partly due to the power of the name? The “OpenDocument Foundation” sounds so official. Although it has no official role in the ODF standard, this name opens doors. The ODF Alliance, the ODF Fellowship, the OASIS ODF TC, the ODF Adoption TC (and many other groups without “ODF” in their name) have done far more to promote and improve ODF, yet the OpenDocument Foundation, Inc. seems to score the panel invites. Not bad for three guys without a garage.

However, in recent months the OpenDocument Foundation has found itself more and more isolated, outside of the mainstream debate. How far they have fallen can be seen in the fact that Microsoft has gone from ridiculing their conspiracy theories to using them to support its arguments. At the same time the Foundation’s membership has dwindled to the point where only a small number remain. Former members have disassociated themselves from the Foundation as it turned increasingly to strident rhetoric. Whereas in the early days the Foundation had a large membership that participated fully in the OASIS TCs, now their “contributions” are mainly heckling and haranguing the other members. Finally, the Foundation has recently announced its intent to abandon constructive work within OASIS, to actively lobby against adoption of ODF 1.2 in ISO, and to push for an alternative format, CDF, based on XHTML, CSS 3.0 and RDF. This is an odd stance for a non-profit whose charter was:

The OpenDocument Foundation, Inc. is a 501c(3) non profit chartered to work in the public interest to support, promote and develop the OASIS OpenDocument File Format affectionately known as “ODf”.

So it is against this backdrop that I read with interest in Linux Today the latest correspondence from the Foundation. You can read it yourself, or take the following 8 points from me as a condensed summary of their main points:

  1. “The commercialization of interoperability remains a key driver in both big vendor deals and big vendor consortia FOSS is left on the outside looking in.”
  2. “The conversion to XML [document formats] must be nondisruptive,” meaning it must fit into existing business processes, which are increasingly dominated by Microsoft middleware. This implies a requirement for high-fidelity, lossless round-trip conversions.
  3. The alternative is “rip and replace” and that is too costly and disruptive.
  4. Microsoft is moving toward a “grand convergence” of their services, desktop, device and servers, with OOXML at the core. “MS-OOXML is the primary transport, the document/data container of interop-integration preference.”
  5. ODF was not designed as a response to these problems.
  6. Microsoft/Sun/Novell are working “to limit ODF interoperability and usefulness” because of some patent deals. (Sorry, I can’t summarize this one better; I just don’t understand it.)
  7. IBM/Oracle/Google are working to “limit ODF interop” because “they want a total ripout and replace of MS Office.”
  8. The OpenDocument Foundation is in “the middle area of trying to perfect the conversion to XML”.

Let me take these points one-by-one:

  1. The OpenDocument Foundation seems to be trying to clothe itself in the mantle of the open source community and pontificate on how the big bad vendors treat interoperability. But is it speaking as a non-profit or as a vendor? Take their DaVinci plugin, for example. Where is the source code? Why isn’t this open source? Are we to accept the Foundation’s claim of 100% interoperability on blind faith, without seeing some proof in the form of working code? I’ve been working on document conversions and document file formats of one kind or another for almost 20 years. I’ve never seen 100% fidelity conversions of anything but trivial formats. Extraordinary claims require extraordinary evidence. But we have nothing here, just white papers.
  2. I would not claim a priori that all customers require lossless, 100% fidelity conversions. Remember, we do not see 100% fidelity even when upgrading from Office 2003 to Office 2007, but this appears to be adequate. What is required is that the total return from changing document formats exceeds any other profitable use of capital available to the enterprise. In other words, to a business this is an investment, and will be judged as an investment. Very few businesses will take a dogmatic, ideologically pure view of this. Ask yourself: would you accept a 1% loss in fidelity if I gave you a billion dollars? Yes, of course you would. There are no purists in business who will remain in business. We’re just haggling over what price/fidelity combination is needed to make a prudent investment. However, there is a notable exception to this rule, and that is where access to open document formats is mandated as a public right, not as a business investment. Think of the last 20 years or so of enabling public buildings with ramps for the disabled, bathrooms to accommodate wheelchairs, braille lettering in elevators. This was done by legislation and regulation, as a matter of public policy, to ensure that all of the public has access to public facilities. There was no requirement that an access ramp post a net profit. Similarly, some of today’s moves to ODF are based on open-access principles.
  3. This is what we call the “fallacy of the excluded middle.” You are either with us, or against us, etc. It is false to suggest that the only two approaches to interoperability are to either blindly follow the OpenDocument Foundation’s mysterious DaVinci plugin, or to ignore interoperability altogether and advocate rip and replace. There are today two other ODF plugins available, one from Microsoft and one from Sun. This is real, running code, open source even in the case of the first plugin. So why should we be taking exclusive direction from the Foundation on how we achieve interoperability? Oh right, they are claiming that their program achieves 100% round-trip fidelity. Extraordinary claims…
  4. Gary is in the ballpark when he suspects that Microsoft is seeking some sort of “grand convergence” around protocols and formats. However, I disagree with his impression that OOXML sits at the center of this. In my opinion, OOXML is a rushed, transitional format, intended purely to disrupt ODF adoption. Just as the Office 2000, Office XP, and Office 2003 markup formats were abandoned by Microsoft, I predict that OOXML will soon be cast aside. The problem is that OOXML is such a poorly-engineered format that not even Microsoft wants to build upon it. If I had to divine the future of Microsoft’s file formats, I’d look more in the XAML/XPS/Silverlight space. I believe that future MS Office document formats will look more like that than like OOXML.
  5. I find this observation amusing. ODF, which started its standards track late in 2002, was not designed to be 100% compatible with Office 2007. Mercy me, how did we manage to drop the ball on this one?! Remember, in 2002 there was no publicly available specification for Microsoft document formats. There was no Open Specification Promise or Covenant Not to Sue. So not only was 100% compatibility technically impossible, attempting it via reverse engineering was precarious from a legal standpoint. In my opinion, it still is, even in 2007. In any case, I’m staunchly opposed to evolving any open standard purely for the benefit of a single vendor. Microsoft Internet Explorer is the dominant web browser. Should we then require that HTML only evolve in ways that improve interoperability with Internet Explorer? I don’t think so. Why should document formats be different?
  6. This comment manages to avoid confronting a heap of contrary facts. Microsoft supports the open source ODF Translator project on SourceForge. Sun has made its own ODF Plugin 1.1 for MS Office available for download. And Novell, along with helping the Microsoft effort, has integrated that translator into its version of OpenOffice and has also started work on more powerful, next-generation support for OOXML. So these three companies are seeking to “limit ODF interoperability and usefulness”? If so, they sure have a clever way of disguising their intent. To the ordinary bystander, writing conversion and translation code to allow documents to be shared between OpenOffice and MS Office would be seen as a pro-interoperability statement. But thanks to the OpenDocument Foundation’s in-depth sleuthing, we now know that the opposite is true. Not! Although I have serious doubts as to the long-term technical feasibility of some of these translation endeavors, they do have the advantage of showing real, running code working with real, running applications. They may not claim 100% fidelity, but this is first-generation work and will undoubtedly improve. And they have an important advantage over the Foundation’s DaVinci Plugin: these other efforts demonstrably exist. Given a choice, I’ll take an open source, partial-fidelity converter with a reasonable architecture over one that claims 100% fidelity but that I can’t see or touch.
  7. The claim is that IBM/Google/Oracle also want to “limit ODF interop” because (according to Gary) we want rip & replace. Strange, but just a few weeks ago I led an ODF Interoperability Camp in Barcelona, on behalf of the OASIS ODF Adoption TC, where we had a good selection of ODF vendors, open source projects and customers working to improve interoperability, including Sun, Novell, Google and IBM. The OpenDocument Foundation is a member of the OASIS ODF Adoption TC. So did they help in the organizing of the event? Did they participate? No, nothing, nada. Evidently it is easier to complain about interoperability than to do something about it. And again there is this fallacy of the excluded middle. You must either accept the magical DaVinci Plugin, or you are for rip & replace. There are no other alternatives considered. I’d remind the OpenDocument Foundation that interoperability was not invented yesterday, and that there are many technical approaches that can be applied to foster it. Open standards are one way, but there are others that can be applied as well, including conformance testing, test suites, plug-fests, profiles, shared code, reference implementations, etc. We should apply experience and engineering judgment to select the appropriate solution for the problem, and not fall into the trap of believing that there is only a single path to interoperability, and that this path just happens to be based on the Foundation’s product.
  8. Although it sure would be nice to portray yourself as the little guy, watching out for the customer while the big bad vendors tromp all over the flowers, the fact is that the big vendors are actively working on interoperability, with at least three major solutions available today, as well as a major initiative around interoperability in the ODF Adoption TC. In particular, IBM (with SmartSuite) and Sun (with StarOffice) each have 15 or so years of experience working on document interoperability with MS Office. This isn’t rocket science, but neither is it easy. You can either stand on the sidelines and make pronouncements about how the world is out to prevent interoperability, or you can roll up your sleeves and help get the work done. I know which one I’ll be doing. What about you?

If the Foundation’s approach were technically feasible, they would just go out and do it. You don’t let a breakthrough technical innovation wait on a standards committee to act. You just go out and do it and then standardize it later, once you’ve proven it works. If the Foundation really thinks that they can achieve 100% interoperability with MS Office with just 5 simple changes to ODF, then why the heck don’t they just do it? Don’t wait for the formality of the ODF TC’s approval. They should go ahead, as if the standard already had their 5 fixes, and show the world how they have achieved 100% interoperability with MS Office. If they are right, they would all become multi-millionaires in a very short period of time.

Filed Under: Interoperability, ODF

The biggest media launch of all time?

2007/09/27 By Rob 13 Comments

The news from all directions is that Halo 3 had a big day, with “first day” sales of $170 million, which actually includes advance sales as well. Let’s take the report from the Xbox.com website as the canonical version of the tale:

Microsoft today announced that Halo® 3 has officially become the biggest entertainment launch in history, garnering an estimated $170 million in sales in the United States alone in the first 24 hours. The Xbox 360™ title beat previous records set by blockbuster theatrical releases like Spider-Man 3 and novels such as Harry Potter and the Deathly Hallows.

I’m not sure who determines whether this is true or not “officially,” but before the boys at Guinness update their book, let’s examine the claim.

Halo 3 is a video game. Spider-Man 3 is a film. Harry Potter is a book. These have very different sales models, so it is odd to compare them and declare one of them the “biggest entertainment launch in history”. But if you want to compare different media, then by what objective criterion can you exclude television? Certainly, TV is entertainment, right? Although the sales revenue in broadcast television comes from advertisers, not from the viewers, it is booked as sales nonetheless.

So, let’s take the Super Bowl, television’s annual blockbuster. In 2007, estimates are that CBS took in $162.5 million for in-game advertisements and a further $78.1 million in pre-game and post-game show advertisements. Local network affiliates took in an additional $42.2 million in local spots. This gives a total for Super Bowl XLI advertising sales of $282.8 million. We also need to factor in ticket sales. At $600 per ticket (for legitimate tickets; let’s ignore the inflated secondary market) and with Dolphin Stadium having a capacity of 76,600, this comes out to an additional $46 million. So the total of tickets plus advertising for this one-day media event was $328.8 million, or about 93% more than Halo 3’s first-day sales. Sorry, Master Chief.

So the claim that Halo 3 has “officially become the biggest entertainment launch in history” is unsubstantiated, in my opinion. The sales of Halo 3 are undoubtedly strong, but let’s drop the hype and give the gridiron its due.

Filed Under: Microsoft, Uncategorized

OpenOffice.org Conference 2007

2007/09/24 By Rob 6 Comments

I’m back from Barcelona despite Delta’s best efforts to trap me at JFK airport. No rain, no snow, no sleet, no security alert, no strike. Nothing. But somehow Delta managed to turn a scheduled 40-minute flight to Boston into a 3-hour delay to board plus another 2.5 hours sitting on the runway waiting to take off. So instead of arriving at 18:00, we didn’t arrive in Boston until 23:30.

It is interesting to look at FlightStats.com to see how they rate this particular flight. It says that DL 480 has an on-time percentage of 30%, and is excessively late 52% of the time. The average delay for this flight is 79 minutes.

I just don’t get it. It is one thing to be slow. But why can’t you be slow and still be accurate in your estimates? If you are going to be 79 minutes late on average, then why don’t you adjust your schedules accordingly?

In any case, the conference in Barcelona was great! This was my 2nd year attending OOoCon. Last year, in Lyon, I attended OOoCon as an outsider. I remember then being asked by several attendees why IBM was not contributing code to the community and thinking to myself how much it sucked that we were not doing so. What a difference a year makes! Now the discussion is not if IBM will contribute, but the logistics of exactly when and how we will make our contributions. I was proud to attend the Barcelona conference as a real OpenOffice.org member, and I can tell you that the beer tastes better when you are a member of the community.

I gave a presentation called “ODF Interoperability: The Price of Success” on Wednesday. The slides should be posted up here within a few days. A video of the presentation is here. Your best bet is to wait for the slides and follow along with my audio.

On Thursday I led a full-day workshop on ODF interoperability on behalf of the OASIS ODF Adoption TC. We had participants from a number of ODF vendors/projects: IBM, Sun, Google, Novell, SEPT-Solutions, Haansoft, OpenOffice.org and KOffice. We worked through a few exercises where we tested the exchange of documents that reflected a number of typical real-world business cases. Although they did not attend, we also did some tests with the Clever Age Word Add-in. This event was the first of hopefully several workshops where we will attempt to bring the vendors together in a focused effort to improve ODF interoperability.

There were many good conference sessions that I wanted to attend but missed. That is the downside of having a full day workshop. Of the sessions I did see, the highlights were:

  • Louis Suarez-Potts’s opening keynote “OpenOffice.org 3.0 and Beyond”
  • Hu Cai Yong’s impassioned “Beyond Technology, the Chinese Roadmap” on the subtext of Western cultural imperialism embedded in some “one size fits all” commercial software application suites.
  • Barbara Held’s talk “Toward openness and accessibility” (video available here)

For the ones I missed, I need to go back and watch the taped sessions and read the presentations.

Overall, it was great to see old friends, and to meet so many more for the first time, including some with whom I have corresponded at length but had never before met in person.

I didn’t have much time to play tourist, so I’ll give you only two pictures. The first I’ve taken from the Ars Aperta website, a picture of Charles Schulz and me exchanging funny stories at the Mac Porting party:

And in the “Maybe My Youth Was Not Misspent” Department comes this picture of a decorative “column” outside the building where I gave my presentation on Wednesday. The building hosts the University of Barcelona’s philology department. I immediately recognized the text as Homer and snapped this photo. The next day I was passing by when two students were trying to read it. I stopped, stood with arms dramatically outstretched, and in my best Greek dactylic hexameter recited from memory the Invocation to the Muse that begins the Iliad. So, thank you Professor Higbie, wherever you are, for making us memorize Homer. It actually came in handy!

Filed Under: ODF Tagged With: OpenOffice

Office 2007’s Confusion Mode

2007/09/09 By Rob 24 Comments

Although Microsoft publicly testifies from every available pulpit of their deep longing for multiple document formats, a quick glance at reality shows that this love remains unrequited in their products. For example, what new formats does Office 2007 include out of the box? A new Microsoft XML format (OOXML), an updated Microsoft binary format, and a different new Microsoft binary format for Excel. So Microsoft clearly loves multiple Microsoft document formats! (Discuss among yourselves whether this love is amour de soi, natural self-love, or amour propre, vanity.) But what about other, standard formats? ODF support is available only as a separate download, in their ODF Add-in for Word. However, this tool is very poorly integrated into the Office user interface, making it almost impossible to use for real work.

For document exchange between different versions of MS Office, on the surface it looks a little bit better. Office 2007 provides a “compatibility mode” for users of Office 2007 who wish to create or edit documents that will remain compatible with earlier versions of Office.

That’s the theory at least.

In practice, things are rather messy. I recently received an email from Julie Watson, a project manager who has been doing enterprise deployments & migrations for 15 years. She has spent the last few months working on a plan to migrate 18,000+ workstations, trying to find a way to have a gradual rollout while still maintaining round-trip collaboration between her Office 2003 and Office 2007 users. Julie has put together a nice report showing what works and what doesn’t. Ignore the official documentation and ignore intuition, since neither will serve you well here. Take a gawk at the seedy side of reality in “[Compatibility Mode] Confusion in Office 2007.”

Filed Under: OOXML


Copyright © 2006-2026 Rob Weir · Site Policies