Google has a nice feature that allows you to search for documents that match a given file type. This is done by adding “filetype:NNN” to your query, where NNN corresponds to the file type. This feature has supported the ODF and OOXML document formats for at least the two months since I first noticed it. I’ve been tracking some numbers since then and now have enough data to make some observations.
At last count the totals were:
| Format | Count |
|---|---|
| ODT | 85,200 |
| ODS | 20,700 |
| ODP | 43,400 |
| Total ODF | 149,300 |
| DOCX | 471 |
| XLSX | 63 |
| PPTX | 69 |
| Total OOXML | 603 |
As you can see, there is some rounding in the larger counts. Perhaps at the high end the counts are estimates based on sampling?
In any case, I am rather surprised by the low counts given for OOXML documents, especially considering that this format has been supported since the Office 2007 beta last summer. According to Brian Jones, there have been over 4 million downloads of the OOXML Compatibility Pack for older versions of Office, and there is a new community of “over 300 other companies and partners who care deeply about OpenXML”. We’re also told that Office 2007 sales are above expectations, “two times greater than the purchases of Office 2003” according to one research firm. Recently announced third-quarter results for Microsoft showed “better than expected” Office 2007 sales, $200 million better, according to Microsoft CFO Chris Liddell.
So with all this evident love for Microsoft Office 2007, why is it that six months later there are only 63 OOXML spreadsheet documents on the web, something like 0.3% of the number of ODF spreadsheet documents? How can there be 300 companies supporting OOXML and only 69 OOXML presentations on the web? (This is starting to sound like when I say I support 30 minutes of aerobic exercise a day. I don’t do it, but I sure support it!)
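For anyone who wants to check the percentages, here is a minimal sketch that recomputes them from the table above. The counts are copied from the table; the variable and function names are my own, chosen only for illustration.

```python
# Counts copied from the table above (Google "filetype:" result totals).
odf = {"ODT": 85_200, "ODS": 20_700, "ODP": 43_400}
ooxml = {"DOCX": 471, "XLSX": 63, "PPTX": 69}


def percent_of(part, whole):
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


# Pair each OOXML type with its ODF counterpart (text, spreadsheet, slides).
pairs = [("DOCX", "ODT"), ("XLSX", "ODS"), ("PPTX", "ODP")]
shares = {ox: percent_of(ooxml[ox], odf[od]) for ox, od in pairs}
total_share = percent_of(sum(ooxml.values()), sum(odf.values()))

for ox, od in pairs:
    print(f"{ox} as a share of {od}: {shares[ox]:.2f}%")
print(f"All OOXML as a share of all ODF: {total_share:.2f}%")
```

The spreadsheet line is the 0.3% figure quoted above; the other two formats land in the same neighborhood, well under 1%.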
OK, I know the argument about “dark matter”, that Google indexes only the tip of the iceberg, that there is a lot of data squirreled away on PC hard drives, behind corporate firewalls, etc., stuff that Google will never see. But the same is equally true for ODF documents, right? I have tons of ODF documents on my laptop, but none of them are indexed by Google.
Of course ODF has been around for about a year and a half longer than OOXML. That’s an important fact to acknowledge. We can put that in perspective by plotting the ODF and OOXML document counts against the number of days since adoption of these two standards. So ODF counts are based on a start date of 1 May 2005 and OOXML counts on a start date of 7 December 2006, when OASIS and Ecma respectively approved them. You get this:

As you can see, ODF has a nice upward trend. OOXML is also trending upwards, though it is somewhat lost at this scale. If you do the analysis it comes out to around 300 new ODF documents per day versus 6 for OOXML. So, two years after its approval, ODF is adding documents at 50 times the daily rate of OOXML, at a time that should be OOXML’s high-growth period, considering all the great news that is coming out of Redmond.
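One way to get a documents-per-day figure from a series of tracked counts is a least-squares slope, sketched below. I am not claiming this is exactly the method behind the numbers above, and the sample observations are made-up placeholders, not the tracked data: they were chosen to lie on straight lines with slopes of 300 and 6 so the output is easy to check. The function name `daily_growth` is likewise hypothetical.

```python
def daily_growth(days, counts):
    """Least-squares slope of counts against days, in documents per day."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, counts))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Placeholder observations (NOT real measurements): day offsets and the
# document counts seen on those days, invented purely for illustration.
sample_days = [0, 10, 20, 30]
sample_odf = [1_000, 4_000, 7_000, 10_000]
sample_ooxml = [100, 160, 220, 280]

odf_rate = daily_growth(sample_days, sample_odf)
ooxml_rate = daily_growth(sample_days, sample_ooxml)

print(f"ODF:   {odf_rate:.0f} documents/day")
print(f"OOXML: {ooxml_rate:.0f} documents/day")
print(f"Ratio: {odf_rate / ooxml_rate:.0f}x")
```

With real observations in place of the placeholders, the ratio of the two slopes is the "times greater" figure; 300 against 6 gives 50.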
So I’m somewhat at a loss to appreciate the significance of Novell or Corel adding OOXML support to their editors. With only 63 OOXML spreadsheets out there, wouldn’t it be cheaper just to hire someone to retype the documents in the destination application? The average user is more likely to find a Buffalo Nickel in their lunch change than to find an OOXML document outside of captivity.