The OASIS ODF Technical Committee voted a couple of weeks ago to create a new subcommittee on “Advanced Document Collaboration”. Robin LaFontaine, from DeltaXML, will chair the subcommittee.
Since the entire ODF TC is quite large now (almost 20 active members attend each meeting) it is impossible to do a technical “deep dive” on every topic in our meetings. So when a particular specification domain requires sustained attention for a period of time, we can create a subcommittee, to allow interested TC members to study and draft specification enhancements. We’ve done this several times before. For example, the Accessibility SC developed the accessibility enhancements for ODF 1.1. And the Formula and Metadata subcommittees drafted those key parts of ODF 1.2. I hope that this new SC will be equally successful in their work.
So what is “Advanced Document Collaboration”? A key part of this will be enhancing change tracking in ODF. I’ve been looking at how existing applications implement change tracking and I’m not 100% satisfied. And I don’t mean only ODF editors. Even Microsoft Office using OOXML lacks full and complete change tracking support. For example, Microsoft Word does not track changes that occur in an OLE object. And change tracking in PowerPoint is entirely absent. And starting in ODF 1.2 we have an additional RDF metadata layer in documents and we need to consider how change tracking deals with this. So there is a good opportunity here for us to advance the state of the art.
We are fortunate that earlier this year the OpenDoc Society, with sponsorship from NLnet Foundation, commissioned a proposal for a feature-complete change tracking specification from DeltaXML. This draft has also been contributed to the ODF TC and has attracted some implementor interest, with prototyping work occurring both in KOffice and AbiWord.
While studying change tracking, I’m hoping the SC will be able to give some thought to how we might canonically represent an “editing change” artifact. By this I mean a high level change which in the general case might be a correlated set of content, style and metadata changes which appears atomic to the user, but which at the implementation level might touch several XML files in the ODF document. This editing change artifact, aside from being necessary to represent change tracking, could also be quite useful in other problems, such as a runtime clipboard format, as a quantum of change in a real-time collaborative editor, or to represent the persistent form of a document selection, which itself is useful in contexts such as fine-grained digital signatures. Not all of this happens overnight, of course. But I’m hoping that the initial work on feature-complete change tracking will give other benefits down the road.
The charter for the new Subcommittee follows. If you are interested in these topics but are not already a member of OASIS, then I’d encourage you to join now, so you can “get in on the ground floor” with these exciting new discussions.
Statement of Purpose
Many ODF documents do not involve collaboration. They are created by a single user, edited by a single user, and then perhaps presented or shared with multiple users, or maybe even just converted to PDF for distribution.
However, collaboration-based document scenarios are also common, including review and comment, change tracking as well as emerging work in real-time collaborative editing, in-context document collaboration, persistence of structured document fragments, and so on.
In order to bring together technical experts in these areas, and for them to evaluate trends, investigate opportunities and draft enhancements to ODF in these areas, we are proposing a dedicated subcommittee for this topic.
The initial and highest priority for the Subcommittee will be change tracking. Reliable and user-friendly revision management is critical for professional document workflows in corporate and public sector environments, and as such an important feature of Open Document Format.
The SC is asked to prepare a draft specification of a markup vocabulary that can accurately describe any incremental change to the content and structure of documents – typically made in multiple editing sessions by different authors.
Deliverables
- A draft specification for change tracking, including Relax NG schema
- A description on how to apply change tracking markup to the various versions of the OpenDocument Format (ODF) as a host format.
- A set of test documents that will allow implementers to validate their change tracking implementations.
- A document that describes in detail how the existing change tracking mechanism in ODF can be converted to the new markup.
- Other proposals, draft specifications and in-scope work related to the subcommittee’s Purpose.
Great idea. Have you also looked at the open source word processor AbiWord (http://www.abisource.com) and their real-time collaboration capability (http://abicollab.net/)? It might not be the perfect solution, but it might be a starting point.
A better idea might be to simply leverage an existing system. XML is just data. Source code if you will. So why not use tools tuned for that job? One day I want to simply embed a .git repository inside the ODF container. That would allow for complete and thorough change tracking. It would also allow for some nice new workflows. E.g., Bob e-mails a document to Alice and Sarah. They both change it independently of each other. Bob can merge both changes back into the original document. This even works if neither Alice’s nor Sarah’s ODF application understands git. Only Bob’s application needs to support it so he can commit the changes.
Alas, at this moment most applications still delete all files inside an ODF container that they don’t understand. Even though that’s discouraged in the ODF spec.
@Sander — One approach is to simply store an entire copy of each document revision. Then rely on a runtime diff function to identify and present the changes to the user. This is a simple and complete solution, but not very efficient. A better variation on this approach would be to use VCS techniques to store the various revisions efficiently, along the lines you describe.
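A minimal sketch of the "store every revision, diff at runtime" approach, using a plain line-based diff from Python's standard library. The two revision strings and the `rev1/` and `rev2/` labels are made up for illustration, and a real implementation would want an XML-aware diff rather than a textual one.

```python
import difflib


def diff_revisions(old_xml, new_xml):
    """Return a unified diff between two stored revisions of one XML part."""
    return list(
        difflib.unified_diff(
            old_xml.splitlines(),
            new_xml.splitlines(),
            fromfile="rev1/content.xml",  # hypothetical labels, not real paths
            tofile="rev2/content.xml",
            lineterm="",
        )
    )


# Two hypothetical revisions of the same fragment, one element per line.
rev1 = "<text:p>Hello world</text:p>\n<text:p>Second paragraph</text:p>"
rev2 = "<text:p>Hello ODF world</text:p>\n<text:p>Second paragraph</text:p>"

changes = diff_revisions(rev1, rev2)
for line in changes:
    print(line)
```

Nothing about the change is stored in the document itself here; the full cost is paid in storage (two complete copies) and in the diff at display time.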
The other approach — and what ODF already does — is to store the base version of the document, and then put inline markup that describes the individual changes.
But aren’t these approaches equivalent from the perspective of the information stored? In other words, aren’t they two different representations that could be converted back and forth without losing information? If so, the 3-way merge is possible with either approach. Of course, the efficiency of operations may vary depending on the representation chosen. A key task for the new SC is to agree on what operations we should optimize for.
@jsmith, I have heard of AbiCollab, but I have not given it a try yet.
@Rob: I see two problems with the current approach of inline change tracking that VCS tools would solve.
The first is that tracking changes will only work on whatever elements and use cases the TC is able to dream up and account for. A git repository (or something similar) tracks everything. XML files, images, OLE objects, binary blobs, possible future extensions, etcetera.
Imagine an ODT document with an embedded XForm. Now imagine that the XML file that supplies the XForm with data is also stuffed inside the ODF container (it’s a logical place for it). Alternatively, think of OLE objects, or the “sheet music notation” that KOffice has. A simple textual or binary diff on everything inside the ODF container would catch changes in all of these.
A second problem comes from the three-way merge. You’d end up with a non-linear editing history. Can inline change elements represent that?
Anyway, my main interest in doing it with git is leveraging all the existing tools and workflows that are currently used for source code. Especially distributed document authoring. I imagine it would be quite useful even for the ODF TCs themselves if you can work on a single document simultaneously with multiple authors and not lose any changes in the process.
Regarding version control systems: I guess it should work reasonably well to track a linear sequence of changes using any version control system (e.g., git or Mercurial). It would help to have an option to store the ODF document as an unpacked directory with uncompressed files, rather than a zip file.
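As a rough sketch of that option: an ODF package is a zip archive, so it can be unpacked into a plain directory for a version control system to track and zipped up again afterwards. The function names `unpack_odf` and `pack_odf` are hypothetical, and the demo package below is a tiny stand-in, not a complete ODF document. The one packaging detail assumed here is that the `mimetype` entry goes first in the archive and is stored uncompressed.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_odf(odf_path, target_dir):
    """Extract every file in the ODF package into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(odf_path) as package:
        package.extractall(target)


def pack_odf(source_dir, odf_path):
    """Zip source_dir back into a package, 'mimetype' first and uncompressed."""
    source = Path(source_dir)
    mimetype = source / "mimetype"
    with zipfile.ZipFile(odf_path, "w", zipfile.ZIP_DEFLATED) as package:
        if mimetype.is_file():
            package.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            if path.is_file() and path != mimetype:
                package.write(path, path.relative_to(source).as_posix())


# Round trip on a tiny stand-in package (not a complete, valid ODF document).
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "unpacked"
    work.mkdir()
    (work / "mimetype").write_text("application/vnd.oasis.opendocument.text")
    (work / "content.xml").write_text("<office:document-content/>")
    pack_odf(work, Path(tmp) / "demo.odt")

    restored_dir = Path(tmp) / "restored"
    unpack_odf(Path(tmp) / "demo.odt", restored_dir)
    with zipfile.ZipFile(Path(tmp) / "demo.odt") as package:
        names = package.namelist()
    restored = (restored_dir / "content.xml").read_text()
```

The unpacked directory is what would be committed; the zip is only produced when the document has to be handed to an application.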
The interesting problems, which a general version control system by itself won’t do:
1. Displaying differences between revisions in a nice way (I imagine this would be mostly an implementation question, not a spec question, but there may well be spec issues on how to make diffs easier to work with. I’m afraid I’m not familiar with XML diff tools).
2. Merging changes from different branches (say, two users commit mostly independent changes to the base version of a document, now try to create a document with both users’ changes, with a minimal amount of manual work).
“Change tracking” in office programs, as far as I’m aware, doesn’t address merging at all.
The problem with merging divergent history (e.g. two people independently working on a document) is finding a good XML-aware merge, which as far as I know doesn’t have a good solution.
@Jakub: I believe that the guys at DeltaXML have a three-way XML merge tool. I think that the most demanding aspect is that it’s not just merging XML but ODF-XML. Changes to an XML tree do not necessarily change the ODF document. For example, the order of style definitions. Or some of the internally generated XML IDs used to link styles to elements.
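To illustrate the style-ordering point, here is a small sketch that compares two style lists while ignoring the order of the definitions, the order of attributes, and whitespace between elements. The helper names `freeze` and `style_set` are made up, the sample XML is a toy fragment, and this says nothing about how DeltaXML's tool works. It also does not handle the renamed-ID case, which would need a mapping between old and new names.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def freeze(element):
    """Turn an element into a nested tuple that ignores attribute order
    and whitespace-only text."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(freeze(child) for child in element),
    )


def style_set(styles_xml):
    """Collect all style:style definitions as an order-independent set."""
    root = ET.fromstring(styles_xml)
    return frozenset(freeze(s) for s in root.iter(f"{{{STYLE_NS}}}style"))


def make_styles(*definitions):
    """Wrap style definitions in a toy office:styles element."""
    body = "\n  ".join(definitions)
    return (
        f'<office:styles xmlns:office="{OFFICE_NS}" xmlns:style="{STYLE_NS}">\n'
        f"  {body}\n</office:styles>"
    )


heading = '<style:style style:name="Heading" style:family="paragraph"/>'
body = '<style:style style:name="Body" style:family="paragraph"/>'
body_changed = '<style:style style:name="Body" style:family="text"/>'

original = make_styles(heading, body)
reordered = make_styles(body, heading)      # same styles, different XML order
edited = make_styles(heading, body_changed)  # a real change to one style

same_document = style_set(original) == style_set(reordered)
real_change = style_set(original) != style_set(edited)
```

A textual diff would flag `reordered` as changed; the set comparison does not, while the edited family attribute is still detected.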
@Niels: I don’t know for sure (Rob can answer this) but I don’t think this TC will say anything about how changes are presented to the user. That’s left to the application developers. It’s just about how to track and store changes (and hopefully about merging too).
@Sander, we’re not going to get into the UI for displaying tracked changes. But I think we do need to consider the user’s view (their mental model) of changes and have a way for applications to represent this view.
An example: I have change tracking enabled and I copy and paste in content from another document. The pasted selection contains text as well as a variety of text styles, and maybe some associated RDF metadata. The user will expect that this paste operation will generate a single change tracking “event” that can be accepted or rejected as a unit. It would be perverse to paste in a complex text selection and have that turn into 20 different atomic change tracking operations.
However, the application may need to know and understand the 20 different things that comprise the paste operation. So I think this suggests some sort of grouping operation via containment, via a change tracking operation ID, or something. Maybe a single level is enough, or maybe we need to support a nested hierarchy of operations: session, bulk operation, atomic operation? That way an application could allow a user to accept or view at any level of the hierarchy. I don’t know the answer here. This is something we need to discuss.
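One way to picture the nested option is a small tree of change nodes, where accepting or rejecting a node applies to everything beneath it. This is only a thought experiment: the `ChangeNode` class, its fields, and the IDs are invented for illustration and are not proposed markup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChangeNode:
    """One level in a hypothetical session / bulk operation / atomic hierarchy."""
    change_id: str
    kind: str  # "session", "bulk" or "atomic" (illustrative labels)
    description: str = ""
    children: List["ChangeNode"] = field(default_factory=list)
    accepted: Optional[bool] = None  # None means still pending

    def resolve(self, accept):
        """Accept or reject this node and everything grouped under it."""
        self.accepted = accept
        for child in self.children:
            child.resolve(accept)

    def leaves(self):
        """Return the lowest-level changes contained in this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# A paste that the user sees as one event but the application sees as three.
paste = ChangeNode("c2", "bulk", "paste from other document", [
    ChangeNode("c2.1", "atomic", "insert text in content.xml"),
    ChangeNode("c2.2", "atomic", "add text style"),
    ChangeNode("c2.3", "atomic", "add RDF metadata"),
])
typing_fix = ChangeNode("c1", "atomic", "fix typo")
session = ChangeNode("s1", "session", "Tuesday edits", [typing_fix, paste])

paste.resolve(False)  # the user rejects the paste as a single unit
```

After the call, all three parts of the paste are rejected while the unrelated typo fix in the same session is still pending, which is the behaviour the paste example asks for.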
And you have an excellent point about how not all markup changes are necessarily end-user changes. Another good one is the editing metadata, like last modified date or number of words in the document.
Rob,
congratulations on behalf of OpenDoc Society for the establishment of this new subcommittee. We think that this is a very important initiative of the ODF TC. I personally can’t wait for track changes to be feature complete, and be available in spreadsheets and presentations. ODF is really raising the quality bar here.
@jsmith: AbiCollab solves a different problem, although I agree it is excellent work. AbiWord is actively developing change tracking for ODF and will release the first code from its development branch any time now, with change tracking support as proposed by DeltaXML in July; its developer Ben Martin attended the last plugfest in Brussels all the way from Australia. He is also part of the new SC.
@Rob: Well, that’s the whole idea behind a commit (or revision) in version control systems: a unit that atomically encompasses all changes (usually with a description of the changes, plus the author and date of the commit/revision).
But it has nothing to do with showing changes (or storing them either as some kind of deltas, or just using embedded version control system).
I’m interested in how such features could be incorporated into a management system; the big failure of change tracking at the moment is that it is associated with the editing application rather than with the more appropriate (but so often lacking) management system. Emailing docs around is not a particularly efficient method of collaboration and storing docs in file systems is a rather bad way of managing them.
So it seems important to me that the work on the ODF specification take into account the need for and likely adoption of a document management system that will manage the version control, the comments and notes and the items changed in some way that is or can be independent of the application used for editing.
Thanks for the efforts in track changes. This is indeed a much needed addition/improvement to ODF. For some information on the shortcomings of change tracking in OpenOffice.org and some comparisons with other office suites, please see here: http://wiki.services.openoffice.org/wiki/Track_changes
Thank you very much for the efforts being made for improved change tracking!