The Ecma OOXML web site has been updated. The version of the OOXML specification that was submitted to JTC1 is no longer there. Instead there is a new version, generated on February 1st. I have no idea whether the content of the new version differs in any substantial way from the older one, but the pagination is clearly different. So page-number citations, as used in this blog and elsewhere (such as the Groklaw analysis), are now incorrect.
(Why don’t I cite using section numbers? Good question. The version of OOXML submitted to JTC1 reuses section numbers from part to part, so a reference to “section 3.4.2” could be ambiguous.)
A more significant change is that the annexes are now zipped up with the PDF file to which they pertain. So Part 4 is now a zip file with four electronic annexes enclosed. This is different from what JTC1 received. I don’t believe that the JTC1 NBs (national bodies) received the electronic annexes at all.
Maybe when Ecma finishes deciding how they want to paginate the thing, I’ll go back and update the page references in previous posts. But for now I’ll leave them as-is; they match the version that the JTC1 NBs received, but not the version available to the public.
I’ll close by saying that this is a bit odd for an open standard: the version submitted to JTC1 is not the same as the one available to the public. You would almost think that someone out there did not want public input on OOXML to be easily consumable by the JTC1 NBs.
Anonymous says
Perhaps they’re trying to fix it? I mean, there should be *plenty* of things in there to fix, although that still seems really odd, because the old version should still be somewhere, no?
Steven G. Johnson says
The original version of Ecma-376 is now mirrored. (This is legal because the Ecma web pages explicitly state that their standards documents are not copyrighted.)
hAl says
It would be nice if anyone could actually trace an example of a difference other than repagination.
Should be a piece of cake using MS Office 2007?
Maybe this version has already been prepared for Ecma’s response to the contradictions, elaborating some information and maybe correcting some of the errors that were found. I remember ODF having a 1.0 version and then a 1.0 second edition with ISO comments incorporated. So this might well be a similar second edition.
Weird to put such a version online already, though. Could it be that they already sent it to JTC1 in advance?
Rob says
hAl, the version submitted to JTC1 was in PDF format, not OOXML. So I would need a PDF-based diff program to see whether there were any other differences.
It is certainly legitimate to correct errors in specifications. But they should maintain dated and numbered versions, and keep the previous versions available. Sure we have ODF 1.0 second edition, and ODF 1.1, but the original ODF 1.0 was never taken down.
I’m not an expert in Ecma process, but I would be surprised if they could make any substantive changes to the specification without another vote in Ecma.
In the end this is probably just a reformat/repackaging of their specification. However, if it is to be considered the same standard (OOXML 1.0) then they should not be changing page numbers, since that breaks any citations. If they need to do that, then they should call it OOXML 1.01 or something like that.
Anonymous says
I made a diff with xpdf’s pdftotext and cleaned it up a bit.
http://download.cl1p.net/ECMA-376/
Looks like some text has become garbled in the new version.
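A comparison like the one above can be approximated with a short script. This is a minimal sketch, not the commenter’s actual procedure: it assumes both PDFs have already been converted to text with `pdftotext` (for example `pdftotext old.pdf old.txt`; the file and script names are hypothetical) and uses Python’s standard `difflib` to produce a unified diff. Blank lines and trailing spaces are ignored so that page-break shifts from repagination are not reported as content changes.

```python
import difflib
import sys


def diff_extracted_text(old_text: str, new_text: str,
                        old_label: str = "old", new_label: str = "new") -> list[str]:
    """Return unified-diff lines comparing two pdftotext extractions."""

    def normalize(text: str) -> list[str]:
        # str.splitlines() also treats the form feed that pdftotext writes
        # between pages as a line boundary; dropping blank lines and trailing
        # spaces keeps pure repagination from showing up as a change.
        return [line.rstrip() for line in text.splitlines() if line.strip()]

    return list(difflib.unified_diff(normalize(old_text), normalize(new_text),
                                     fromfile=old_label, tofile=new_label,
                                     lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python pdfdiff.py old.txt new.txt
    old_path, new_path = sys.argv[1], sys.argv[2]
    with open(old_path, encoding="utf-8", errors="replace") as f:
        old = f.read()
    with open(new_path, encoding="utf-8", errors="replace") as f:
        new = f.read()
    for line in diff_extracted_text(old, new, old_path, new_path):
        print(line)
```

Text extraction is lossy (ligatures, hyphenation, column order), so some reported differences may be extraction artifacts rather than real edits to the specification.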
Hasan says
Rob, you mean I’m stuck with the 15 kg, two boxes, 6,039 pages with wrong page numbers and changed contents?!
Rob says
Hasan,
Hopefully you can recycle the paper.
From what I’m hearing there should be no substantive changes in the text of the new specification. It is merely breaking it into multiple PDF files.
But this does show lax controls, when versions of OOXML come and go without a change log or preservation of previous versions. It would be one thing if this were a draft. Drafts can be taken down and replaced at any time. But this was the final Ecma-approved version of OOXML, the version submitted to JTC1. It should not be silently replaced. In fact, it should not be taken down at all.
orcmid says
Rob,
There is no change in the tables of contents or in the page numbers printed on the pages themselves.
Compared with the October 2006 Final Draft that ECMA approved, the only additions seem to be extra cover pages and modified title pages for the parts. Some blank sides that were suppressed before (but counted in the front-matter page numbering) also show up now.
Hasan can keep his printed copy. The body pages (arabic page numbers) and the tables of contents are all accurate for the individual parts.
If you use Acrobat Reader and rely on the page count the reader displays, you will see differences. But the document pagination and the TOC for the body pages are unchanged.
I have no idea why a single PDF was submitted to (or distributed by?) ISO. It loses a great deal of convenience, including the additional schema files and such. The single PDF at Steven Johnson’s site has none of the hyperlinking in the TOC and cross-references that the ECMA editions have.
Rob says
Hi orcmid,
When you are dealing with a giant 6,039-page PDF file, the page numbers printed on the page are irrelevant. They have no significance at all, since navigating by those page numbers is not practical. In fact, these page numbers are ambiguous in what was submitted to JTC1, since there are five page 1s, five page 2s, etc. Ditto for section numbers.
Since they simply appended these separate documents together without a consolidated table of contents or consistent page or section numbering, the only practical form of navigation is via the PDF page numbers.
Your point about the numbers on the page may be true (although I’ve received other reports of some page numbers changing), but it is irrelevant. In a 6,039-page document, no one will navigate via those page numbers.
In any case, JTC1 must review what was submitted to it. I wish the cleaner, five-part version had been submitted. But wishing doesn’t make it so.
Anonymous says
From Captain Europe,
Yes! I discovered that at the beginning of last week. So on Wed 7 Feb 2007 I asked a question on this web site:
http://www.formats-ouverts.org/blog/2007/01/30/1088-guerre-des-formats-quelle-est-la-position-de-la-france#c3303
Analyzing ECMA-376 was a big job, that’s for sure. But now we have to compare the old version and the new version! It’s crazy!
I suppose it’s a strategy on ECMA’s part.
Best regards.
Captain Europe.
Anonymous says
Did anybody else notice that the DOCX files at the ECMA web site are zipped? Don’t they know that DOCX files already are zipped? Their own spec–the one they just zipped–explicitly says so!
orcmid says
@anonymous: Some of the Zip packages containing .docx parts also contain other material (as do the Zip packages containing .pdf files).
I’m actually grateful that they used zip packaging in both cases, so my browser is not tempted to open them or confused by MIME types that may not have been set up properly on the ECMA server.
One curiosity about the largest .docx, that for part 4, is that it is around 1,000 bytes smaller when deflated in the “outer” Zip wrapper. Even with the extra content, the final Zip is slightly smaller than the .docx by itself (and not re-Zipped).
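The nesting described in these two comments can be inspected with Python’s standard `zipfile` module: open the outer download, then reopen any `.docx` entry as a ZIP package in memory. This is a minimal sketch; the archive and script names in the usage comment are hypothetical, and it only reports the sizes recorded in the outer archive, so it does not by itself explain why re-compressing the `.docx` saves space.

```python
import io
import sys
import zipfile


def inspect_nested_docx(outer_path: str) -> dict[str, tuple[int, int, list[str]]]:
    """For each .docx inside an outer ZIP, return
    (uncompressed size, compressed size in the outer ZIP, inner part names)."""
    result = {}
    with zipfile.ZipFile(outer_path) as outer:
        for info in outer.infolist():
            if not info.filename.lower().endswith(".docx"):
                continue  # schemas, PDFs, etc. are not treated as packages here
            data = outer.read(info)
            # A .docx is itself a ZIP package, so it can be reopened
            # directly from the bytes read out of the outer archive.
            with zipfile.ZipFile(io.BytesIO(data)) as inner:
                result[info.filename] = (info.file_size, info.compress_size,
                                         inner.namelist())
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical names): python nested.py part4.zip
    for name, (size, packed, parts) in inspect_nested_docx(sys.argv[1]).items():
        print(f"{name}: {size} bytes, {packed} bytes in outer zip, {len(parts)} parts")
```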
orcmid says
@rob: Concerning your justification of reliance on the PDF page numbering, rather than the printed numbers (which agree with the printed tables-of-content), I can’t help you there.
I’ve learned never to use those numbers, and I always turn in comments (on ODF and on OOX) with identification of document version, part, and section as well as printed page number. That’s such a habit that it didn’t occur to me that people do otherwise when reviewing specifications of this kind.
It is unfortunate that the 5-part official ECMA-376 wasn’t used, considering how much hyperlinking and clickable cross-references are provided in those versions. That’s really helpful. Why the documents were stitched together into a single PDF for the JTC1 conflict review is a mystery.
I encourage people to revert to the official versions and to use printed section and page numbering in any continuing analysis and commentary.
Translation of the PDF number blocks to the corresponding 5-part printed numberings is pretty straightforward, especially since it is the large Part 4 that seems to have the most commentary.
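The translation mentioned here amounts to subtracting cumulative part lengths. This is a minimal sketch under stated assumptions: the single PDF is a plain concatenation of the parts, each PDF page carries exactly one printed page, and each part starts with a known number of front-matter pages before arabic page 1. The part lengths and front-matter counts below are invented placeholders, not the real Ecma-376 figures, and the extra cover and blank pages described earlier in this thread would change those counts for the newer files.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_page(pdf_page: int, part_lengths: list[int],
                front_matter: list[int]) -> tuple[int, str, int]:
    """Map a 1-based page index in the concatenated PDF to
    (part number, "front" or "body", page number printed within that part)."""
    if len(part_lengths) != len(front_matter):
        raise ValueError("need one front-matter count per part")
    if not 1 <= pdf_page <= sum(part_lengths):
        raise ValueError("page index out of range")
    # starts[i] = number of PDF pages that precede part i+1
    starts = [0, *accumulate(part_lengths)]
    part_idx = bisect_right(starts, pdf_page - 1) - 1
    local = pdf_page - starts[part_idx]  # 1-based page within the part
    if local <= front_matter[part_idx]:
        return part_idx + 1, "front", local
    return part_idx + 1, "body", local - front_matter[part_idx]


# Hypothetical layout: three parts of 10, 20 and 5 PDF pages,
# with 2, 3 and 1 front-matter pages respectively.
print(locate_page(14, [10, 20, 5], [2, 3, 1]))  # (2, 'body', 1)
```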