Some pertinent quotes from Microsoft’s Brian Jones, all on one theme and made over a sustained period of time:
- “The Open XML formats were designed to be 100% backward compatible with the existing set of Office binary formats, and that was really a goal that we can’t compromise on.”
- “It needs to be 100% full fidelity”
- “[F]rom our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible”
- “We need to make sure that the format is documented 100% and there are no barrier to interoperability”
- “This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features”
Get the idea?
Now these quotes were all made before OOXML was completed. I understand engineering and deadlines and such, and that things don’t always all get done as planned. But I would like to know, now that we have the 1.5 OOXML “final draft”, and Office 2007 has been released for shipping, whether it is indeed 100% backwards compatible.
Two simple questions. I’m hoping Microsoft or Ecma can give a straightforward and unequivocal answer:
1) Is the Office Open XML specification (1.5 “final draft”) 100% compatible with all legacy Microsoft Office documents, meaning that a 3rd party, using solely information in this specification (and publicly available open standards), can create a utility on a non-Windows platform, say Linux, to convert any legacy Office document into OOXML without loss of data, function or appearance?
2) Does the OOXML specification (1.5 “final draft”) document the format sufficiently for someone to create a 100% compatible editor (spreadsheet, word processor, presentation) implementation on a non-Windows platform, say Linux? By 100% compatible I mean that it can load, interpret and display all OOXML documents without loss of data, function or appearance.
I note that everything we’ve heard up to now merely says that OOXML was designed to be 100% compatible. But I’d like to hear whether it in fact succeeded at doing these things. That’s the important question, right? We can talk intent all we want, but the results are what counts.
I believe that the criterion should be whether a 3rd party can create a conversion tool and editor based on the documented format. The fact that Office itself may do a conversion is not proof of anything. They could submit a specification both incomplete and erroneous but still do a good conversion job in Office based on private information. The proof of sufficiency for the specification only comes with independent 3rd party implementations.
These are simple questions. I’m hoping for a simple answer.
orcmid says
I’m not the one to answer this, but I am puzzled by the first question and the use case that you provide.
a. Let’s say I have a Word 97 .doc document file here somewhere (I probably do). That’s in a binary format.
b. I have the ECMA Office Open XML Specification in front of me.
I don’t see how (b) is going to help me convert (a) to Office Open XML, nor that it pretends to be such a thing. I don’t see how that could possibly be the promise about the ECMA OOX specification. That would be like asking if the ODF 1.0 specification is enough to write a converter from a native StarOffice 1.0 format.
c. So, I’m assuming that your question doesn’t mean that. Is the question, perhaps, whether it is assured that there is a representation of my document (a) in (b) that allows full-fidelity preservation and presentation of its static content and format?
d. There’s a related question, not exactly about OOX itself, that one could ask even though (c) is answered in the affirmative. That question is whether Microsoft Office Word 2007, when importing (a), can save it in OOX such that (c) is accomplished. For this question, the OOX need not retain any alternative content that is provided solely to facilitate downgrade back to the original Word 97 format. Whether it does or not, this version of the question is whether (c) is achieved without any need to process or consult such material. (ODF provides for such alternatives and I presume without looking for it that the OPC packaging and compatibility provisions of OOX allow similar arrangements.)
Which case(s) do you have in mind?
orcmid says
My previous comment is only about clarifying the question, since I am in no position to answer it in any of the forms that might be intended.
I thought the second question was pretty straightforward (and also not one I am equipped to answer). Then, after I posted the first question, I realized that it could be sharpened in a couple of ways.
First, I assume the question is about conformant applications and their treatment of conformant Office Open XML documents. I think that’s the correct terminology in the ECMA TC45 Final Draft (sections 2.3-2.4 of Part 1, Fundamentals).
It then occurred to me to wonder how the second question could be answered for the current OASIS and ISO Open Document Format 1.0 specification. It strikes me that the answer might have to be “maybe” (because of how conformance is defined for ODF 1.0).
I am not equipped to answer the question about ODF, although I can illustrate some challenges in practice. I opened the .odt document of the current ODF 1.1 working draft and examined its content.xml part. It has the following namespace declarations that are not specified by ODF (although that doesn’t make them impermissible):
xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc"
In the document, the ooow: prefix is used in what appear to be material ways. Nevertheless, a conformant ODF processor is permitted to ignore those “foreign” usages in an ODF 1.0-format document.
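The check described above can be approximated in a few lines. This is only a rough sketch: it assumes the .odt file is an ordinary zip package with a content.xml part, and it treats any namespace URI beginning with http://openoffice.org/ as "foreign" to ODF. That prefix test, the function names and the sample fragment are illustrative assumptions, not anything defined by the ODF specification.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumption: vendor extensions are recognised purely by this URI prefix.
VENDOR_PREFIX = "http://openoffice.org/"


def declared_namespaces(xml_bytes):
    """Return {prefix: uri} for every xmlns declaration found in the XML."""
    found = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes),
                                              events=("start-ns",)):
        found[prefix] = uri
    return found


def vendor_namespaces(xml_bytes, vendor_prefix=VENDOR_PREFIX):
    """Keep only the declarations whose URI starts with the vendor prefix."""
    return {p: u for p, u in declared_namespaces(xml_bytes).items()
            if u.startswith(vendor_prefix)}


def vendor_namespaces_in_odt(path):
    """Hypothetical helper: run the same check on content.xml inside an .odt."""
    with zipfile.ZipFile(path) as package:
        return vendor_namespaces(package.read("content.xml"))


# A made-up fragment carrying the three declarations quoted above.
SAMPLE = b'''<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:ooo="http://openoffice.org/2004/office"
    xmlns:ooow="http://openoffice.org/2004/writer"
    xmlns:oooc="http://openoffice.org/2004/calc"/>'''

print(sorted(vendor_namespaces(SAMPLE)))
```

Finding a declaration says nothing about whether the prefix is used in material ways; that would need a further pass over the elements and attributes that carry it.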
If there is a difference in the result when the document is opened by a different ODF-supporting application, a user might think the answer to question 2 is “no” insofar as fidelity is not observed. The real question is around the nature of the original document’s conformance, of course, but that doesn’t help the user.
I’d say that even question 2 is not so simple.
Rob says
My question 1 is as simple as it looks. It isn’t a trick question. Given the Ecma specification, can a 3rd party convert the legacy Office documents with 100% fidelity?
Here’s what motivates the question. The primary claimed benefit of OOXML, and the consistent justification for all of its warts and infelicities, is this purported 100% compatibility with legacy documents. So I am very interested to know whether this has actually been achieved and whether it is in practice performable by any software vendor or open source project. Or is the main benefit of OOXML something that Microsoft and only Microsoft can reap?
Rob says
I don’t think we can answer this question by looking at conformance statements. In the end, the market, not conformance statements in a specification, drives the level of compatibility needed. So, if I create a specification that documents certain areas partially, or incorrectly, and then say, “It doesn’t matter, those parts of the standard are optional according to our conformance clause”, then interoperability clearly suffers.
Since Microsoft has framed the debate as being about 100% compatibility, I ask the question in terms of 100% compatibility. Fair enough?
So question #2 should be read as: can a complete implementation of all of OOXML be created by 3rd parties, and on non-Windows platforms? We can consider that as two different questions, 2a) and 2b) if we want, since they may have different answers.
The fact that some threshold minimally compliant implementation could be done on Linux is not sufficient.
orcmid says
Rob, I still find working directly with legacy formats to be a peculiar requirement.
It seems to me the expressed goal is (c), and that’s all I’ve seen claimed for OOX. By making OOX the default format of Office 2007 (Word-Excel-Powerpoint) they have a specified open format going forward that (I presume) can convey all of the features of the legacy set.
Microsoft is also promising a conversion plug-in (and apparently allowing OOX to be the default) going back to the same applications in Office 2000, as I recall. (I’ve only tried the beta version that works with 2003.)
I don’t think there is any information about cracking the legacy binary formats in the ECMA Office Open XML specification.
So the preservation, interchange, and development of independent applications is based on OOX as the format, not some older binary formats.
There is a batch converter being promised, so you won’t need to have Office 2007 to “rescue” older formats, but I haven’t been following that. I suspect you might need an Office application to rescue some of the very old formats, as is already pretty much the case, if your goal is to end up with OOX for preservation of the documents.
orcmid says
Rob, I don’t read the OOX definition around conformant documents and applications as being some threshold minimum (although I do read ODF’s conformance statement that way).
I want to agree with your 100% compatibility observation, although I don’t think it is obvious what that is as a decontextualized absolute. I’m concerned that we’re overlooking something.
I certainly agree about the market. And if there are places where OOX is underspecified, I would think those are bugs, just as I think so about the ODF specification.
Rob says
Don’t get me wrong, your (c) is important, and if I ever get an answer to my original questions, you can be sure I’ll follow up with that question as well.
I think it is important to distinguish the application from the format, and which benefits are claimed (or delivered) by the application and which by the format.
If the 100% compatibility with legacy document formats is purely a function of the MS Office applications, and is not something that ensues from the OOXML specification (combined with other publicly available information) then one must question why OOXML even exists.
After all, if this is purely an application function, then the same benefit to users would come even if OOXML did not exist, or existed only as a private, unpublished document in Redmond, right? Office has provided version-to-version compatibility for many years without providing a format specification.
So if OOXML is not necessary for Microsoft to achieve 100% compatibility and it is not sufficient to allow others to achieve 100% compatibility (or at least no one has come out and said that it does) then why is Microsoft constantly touting that the primary benefit of OOXML is its 100% compatibility with legacy documents? The logic of this escapes me.
But let’s do a mental experiment that will maybe make this easier. We can call this (d). Suppose Microsoft made public their legacy binary format documentation, made it available for everyone to use freely, with a patent covenant, and assume further that this documentation was complete and correct. None of this is true as you know, but let’s suppose it for sake of argument. In this case, does the OOXML 1.5 specification satisfy my initial first question, namely can a 3rd party convert all legacy documents with no loss of data, function or appearance, using the 1.5 OOXML “final draft” specification in conjunction with the fictional binary format specification? And can they do this on Linux?
Remember, there are no legacy OOXML documents out there, other than a few beta files and those are not compatible with the final OOXML. So saying that OOXML is 100% compatible with itself is not as interesting. The big claim that Microsoft is making is that OOXML is superior because it is 100% compatible with the legacy documents. I just want to know if this is true, and whether this is a benefit that they alone enjoy.
I thank you for bringing up the comment about bugs. Any specification has them. That’s why we issue errata. The bugs in ODF are freely discussed on the ODF mailing lists which anyone can read. The errors found by the public and sent to the comments list are also there for anyone to read. None of these statements are true for the Ecma OOXML specification. The public has no idea what problems in that specification have been noted by the public or the TC itself. Pardon me if I am not comforted by this lack of information.
Since we do not have that information, and there are no 3rd party implementations demonstrating compatibility, I think it is important for Microsoft or Ecma to confirm that their chartered goal for OOXML — 100% compatibility with legacy documents — has in fact been met.
orcmid says
I think the context of this:
“After all, if this is purely an application function, then the same benefit to users would come even if OOXML did not exist, or existed only as a private, unpublished document in Redmond, right? Office has provided version-to-version compatibility for many years without providing a format specification.”
is not what the limiting benefit is.
My recollection is that one reason Microsoft claims OOX is necessary is because ODF does not allow preservation of the legacy (and preservation in OOX will).
The other part, having a public, XML-based specification, is certainly a greater benefit than the previous state of affairs.
So the attention of others and the market will tell us whether the preservation of the legacy in OOX is a major, significant realized benefit.
Likewise, whether having the public OOX specification under ECMA stewardship along with the Microsoft covenants and promises is a significant realized benefit is left for the future to determine.
With regard to the concern for having our digital documents locked-up tightly in an application-linked proprietary format, I can see potential great benefit of the OOX approach over what we have had to this point.
Whether this leads to acceptability of OOX-format documents in civil administration and eGovernment preservation of public records, I have no idea.
Anonymous says
What percentage of compliance with legacy documents would you settle for?
99.99% ?
And it is unlikely the OOXML spec will even get that.
I think in any large scale conversion of legacy documents some features will show up that can’t be converted. However, if the conversion tool is good, those files will be signaled and then they might be dealt with in another way depending on the specific feature.
OOXML should support as many legacy document features as possible whilst still being a useful format for the future.
If in a conversion the number of files that cannot be converted automatically to the new format is very low, then that is acceptable, especially if there is an alternative solution like manually altering the file.
Rob says
Some pertinent quotes from Microsoft’s Brian Jones:
“The Open XML formats were designed to be 100% backward compatible with the existing set of Office binary formats, and that was really a goal that we can’t compromise on.”
“It needs to be 100% full fidelity”
“[F]rom our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible”
“We need to make sure that the format is documented 100% and there are no barrier to interoperability”
“This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features”
Now these quotes were all made before OOXML was completed. I understand engineering and deadlines and such, and that things don’t always all get done in release one. So I’m not going to attack someone just because they didn’t complete everything on time. I just want to know whether now, with the 1.5 OOXML “final draft”, it is indeed 100% backwards compatible. Does anyone know the answer? You and I can speculate, but there are certainly people reading this blog who know for sure.
Microsoft certainly wasn’t shy about claiming it before, (read those quotes) and they didn’t agonize over definitions or plead that the question didn’t make sense. It is a simple question. Has OOXML met their charter and design goals and all those promises of 100% compatibility with legacy Office documents?
Rob says
What % would I settle for, you ask?
Here’s a few ways of looking at it:
First I don’t think legacy documents are really the issue. This is Microsoft’s distraction to talk about “billions” of legacy documents. The fact is that most legacy documents will never be touched again. They are last week’s cafeteria menu, or the memo from two years ago. Sure, you will need to read some of them again, and perhaps even edit some legacy documents. But it isn’t like DOC and XLS and PPT support is being taken out of Office. Office continues to support those legacy binary formats for reading and writing. So if you have a legacy binary document, then open it in Office or OpenOffice, or whatever and do your work. Why convert if you don’t need to?
If someone mails you a document in a legacy format, then what are you going to do? Convert it into OOXML? I don’t think so. You’ll just load it as-is and get on with your work. Why convert it if you don’t need to?
Another scenario would be if you have Office 2007 and a document in OOXML format and need to give it to someone who has an older version of Office. Well, you could first convert your OOXML document into DOC format and mail it. But wouldn’t it be simpler to have your recipient install the OOXML compatibility pack (or whatever it is called) so they can read the OOXML file directly?
In the scenarios involving legacy documents it is the Office application which is 100% compatible with legacy document files. OOXML has little or nothing to do with it.
Is 99.99% sufficient, you ask? The market will decide that. Some documents, as mentioned earlier, will never be converted. Others will be converted once, perhaps touched up manually to fix any conversion problems, and then maintained in the new format. Other documents will be parts of collaborations or workflows that will require them to be converted between formats repeatedly.
I think the most critical documents in any dual-format scenario, whether ODF/OOXML or DOC/OOXML or DOC/ODF will be the one-time conversion of document-applications, i.e., those documents which have scripts or macros or other active code in them. That plus any document which is being collaborated on by multiple parties using different word processors or even different versions of the same word processor.
On the one time conversion of the document-application anything less than 100% will require your Office scripting guru to get involved, so there is not much difference there between 80% and 99.99%. You still need a help desk call to get your macros fixed.
For the document iterated by multiple users you need much higher fidelity conversions. This is due to the iterated nature of the task. If you exchange a document back and forth 10 times and each time you lose 2%, then the end result is not so good. Or maybe some organizational learning takes place, where users naturally stop using features that get dropped during document conversions, just like I’ve learned to never choose the stapler option on our office printer because I know it always jams.
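To put a rough number on the iterated case, here is a small sketch. It assumes, purely for illustration, that every conversion independently loses the same fraction of whatever fidelity remains, so the losses compound multiplicatively. Real conversions will not behave this neatly, and the function name is made up.

```python
def retained_fidelity(loss_per_conversion, conversions):
    """Fraction retained after repeated conversions.

    Assumption: each conversion loses the same fraction of what is left,
    so the retained share is (1 - loss) raised to the number of conversions.
    """
    return (1.0 - loss_per_conversion) ** conversions


# Compare a 2% loss per exchange with a 0.01% loss (99.99% fidelity).
for loss in (0.02, 0.0001):
    kept = retained_fidelity(loss, 10)
    print(f"loss per conversion {loss:.2%}: {kept:.1%} retained after 10 exchanges")
```

Under that assumption, ten exchanges at 2% loss each leave roughly 82% of the original, while ten exchanges at 99.99% fidelity leave about 99.9%.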
But with that said, I go back around to my original question. I think I’m being very generous here. I’m letting Microsoft frame the debate. The 100% compatibility claim is theirs and they have been playing it up for almost a year now. I let them pick the issue and provide the answer. Simply put, was the 100% compatibility goal met and is the provided specification sufficient for this to be programmed by third parties and on Linux?
Anonymous says
However, by you asking this question it is likely you have already found some feature that you think can’t be converted just based on the OOXML specs.
Probably to do with scripting within documents.
Rob says
I’m not making up the question to trap anyone. I’m not making up the question at all. This is Microsoft’s point, 100% compatibility, which they’ve been using to promote OOXML and beat up on ODF for over a year. They are the ones who have made this question an important one. I just want to know, now that the OOXML is complete, how well it measures up by that criterion.
Is scripting an issue? That is something I will need to look into. OOXML is a 6,000 page specification. I haven’t read it all.
Anonymous says
If you say:
“The proof of sufficiency for the specification only comes with independent 3rd party implementations.”
then you should not look to Microsoft for answers, as they are not able to give you an independent proof.
You should ask the guys from OOo, maybe?
C. T. Rambler says
Referring to the much publicized Novell/Microsoft Agreement where Novell says it will work on OOXML for OpenOffice.org, let’s see what Novell, **with** Microsoft’s backing and support, can do with converting from legacy binary format to OOXML.
Unfortunately, if the answer finally comes, it will be 2 years too late for this discussion. Moreover, if OOXML comes up short in OpenOffice.org on its legacy compatibility aim, I can see the OOXML camp claiming that it is not OOXML at fault, but OpenOffice.org coming up short. This will be despite the fact that OpenOffice.org is doing a better job than Microsoft in opening legacy Microsoft formats today.
Politics and/or business decisions of private companies are more likely to influence the success of 100% compatibility of OOXML in OpenOffice.org. It will never be the problem of OOXML, as far as the OOXML camp is concerned.
Anonymous says
Also there is room for improvement in future versions of OOXML.
Often a number of faults will be found in a spec only during implementation by several different parties.
Just as OpenDocument is now working on two new versions, 1.1 and 1.2, it is likely that before the end of 2007 OOXML and any conversion tools will be using OOXML 1.1 and maybe even version 1.2. It would be very unlikely that a complex and extensive spec is without fault.
Rob says
I’m not asking for proof. I just want an answer to my very simple question. Is the “final draft” OOXML sufficient for the interoperability scenarios I outlined? Microsoft should know the answer to this question better than any 3rd party, right?
Rob says
A response, sort of, from Brian Jones.
Brian first notes the accomplishment of Office 2007’s RTM (no mean feat) and then gets around to my question saying, among other things, “The Open XML standard is fully documented and you can implement it on any platform, so in a way the answer to both of his questions is yes!”
I guess “in a way” that is an answer.
I’ll let Brian take his well-deserved vacation. I need one as well. Then I’ll be back to give some thoughts on why the answer should be a resounding “no”.
For one thing of course the standard is fully documented. That is almost a tautology. Unless the table of contents points to missing pages, every standard is complete as-is. Anything not in the specification is, ipso facto, outside the scope of the standard. The real question, and the way I stated it, is whether the text of the specification is sufficient to achieve 100% interoperability with all legacy Office documents.
This is more than idle curiosity. I’d note for example that Microsoft’s patent covenant only covers things necessary to implement the specification. If there are other things, not in the specification, but still needed to achieve 100% compatibility, then I wonder whether anyone else (other than Microsoft and Novell) can use them?
NK Singh says
I really don’t understand why the 100% compatibility claim is so confusing. I read it as saying OOXML supports all of the features included in the old binary formats. It might implement them in an entirely different way, but the point is that there are no features in the old format which can’t be reproduced in OOXML.
This is meant to be in opposition to ODF, which may not support some Word 2003 features.
Rob says
What about scripting or macros, for example? The legacy binary formats supported them, but I do not see them in the new OOXML specification.
If you want to say that OOXML “supports” scripts because it can contain any arbitrary binary blob in addition to the specified format, then I’d argue that this merely means that MS Office supports scripts, but nothing is disclosed in OOXML that would allow interoperable implementations for scripts.