Today I look at Gwennel Doc, an ODF-based text editor for Microsoft Windows. An interesting attribute of Gwennel is its small size and speed. It can load and display the 792 page ODF 1.2, Part 1 specification in around 2 seconds, using an executable that is around 1/4 the size of that document. Something interesting is going on here that needs investigation. I contacted the author of Gwennel Doc, Marc Kerbiquet, who consented to the following email interview. Enjoy!
Could you tell us a little bit about yourself, where you live and what you do for work? Are you a professional programmer? Or a hobbyist?
I live in France, I work as a professional programmer to pay the bills and I write programs as a hobbyist. Gwennel Doc is a hobby program.
What got you interested in writing a text editor? How did you pick the name “Gwennel”?
I first wrote a folding text editor for programmers (Code Browser), then an ODF viewer (Woodrat Reader), so an ODF editor was a natural continuation :-)
The initial goal was to make a folding/outlining editor like Code Browser with rich text. But it would have required a hybrid format to handle folding directives.
“Gwennel” means “swallow” in the Breton language, a small and fast bird. Breton is a language spoken in Brittany, a region in the north-west of France.
You call your tool a “WYSIWYM” (What You See is What you Mean) editor. How is this different than other editors, and how does Gwennel Doc support this style of editing?
It is different from all the lightweight rich editors that edit RTF documents or equivalent formats because it supports styles. Styles let you separate presentation from content: you can tag a word as “menu item” or “keyword” instead of “Bold” or “Color-Red” and change later how it should be displayed.
Word and OpenOffice allow WYSIWYM but they promote the WYSIWYG (What You See is What You Get) paradigm.
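The separation of presentation and content described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Gwennel Doc stores anything: the style names come from the example in the answer, while the property names (`bold`, `italic`, `color`) and the `render` helper are assumptions made up for the sketch.

```python
# Hypothetical style table: the property names are invented for illustration.
styles = {
    "menu item": {"bold": True},
    "keyword": {"color": "red"},
}

# The content tags each run of text with a style name, never with raw formatting.
document = [
    ("Open the ", None),
    ("File", "menu item"),
    (" menu and type ", None),
    ("print", "keyword"),
]

def render(document, styles):
    """Resolve each run's style name to the properties it currently maps to."""
    return [(text, styles.get(name, {})) for text, name in document]

before = render(document, styles)

# Changing how every menu item looks is a single edit to the style table;
# the document content itself is untouched.
styles["menu item"] = {"italic": True}
after = render(document, styles)
```

With direct formatting, the same change would mean finding and editing every bold run that happens to be a menu item.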
As your website describes, your intent was to make a text editor, not a full word processor. How do you define the boundary between these two? What features did you decide to omit?
The goal of a word processor is to produce a printed document. The paper is an important aspect:
- the page format
- the header and footer
- how paragraphs and tables must be split when the end of the page is reached
- footnotes
- the table of contents
- the index
Gwennel Doc is more of a note-taking program intended for on-screen reading, so it does not have to deal with all these features.
A Print command is not implemented yet, but it will be very basic.
What made you choose ODF as a document format?
- I already worked with ODF before (Woodrat Reader)
- I didn’t want to create a new format.
- As far as I know, there is no other open standard designed for editing and supporting styles:
  - RTF: no support of styles
  - HTML + CSS: not intended for editing
  - OOXML: just a political standard, too complicated anyway
- Interoperability, even if limited:
  - it can partially read documents from other word processors (unsupported elements are just ignored)
  - Use OpenOffice for all missing features (print, export as PDF, …)
  - Gwennel Doc documents can be read without Gwennel Doc
How hard was it to support ODF in Gwennel? What was the hardest part?
Gwennel was designed from the start to work with ODF, so the application model, apart from the table styles, fits very well with ODF. Easy to load, easy to save.
A difficult part was understanding the ODF specification, but I had already done that when writing Woodrat Reader.
The hardest part was finding a way to implement table styles in Gwennel, as ODF has no support for table styles. I finally found a solution that stays compatible with ODF and keeps a minimum of interoperability. Unfortunately table styles are lost when a Gwennel document is edited with OpenOffice.
One thing that strikes the user is how small and fast Gwennel is. It is less than 150KB in size and requires no install. Compared to other word processors, this is amazingly fast. How did you accomplish this? Can you give some details on your approach, such as what programming language you used, what ZIP and XML libraries you used, etc.? What is the secret to making a small, fast editor?
I really care a lot about speed. Almost everything should be instantaneous on a computer that can execute billions of instructions per second.
Now, here is the secret :-)
First, the executable is 270K, not 150K.
I cheated: I used UPX to compress the executable.
For curious people, here is the detail of what’s inside:
- 50K – the zip library (zlib)
- 20K – the XML parser (AsmXML)
- 40K – core library (memory management, strings, lists, GUI layer)
- 70K – the rich edit component
- 24K – the (partial) ODT schema
- 64K – the main application code and resources (text, menu, icons)
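As a quick check, the components listed above can be added up. The figures are taken straight from the list; the dictionary keys are just shortened labels chosen for this sketch.

```python
# Component sizes in kilobytes, as listed in the interview.
components_kb = {
    "zip library (zlib)": 50,
    "XML parser (AsmXML)": 20,
    "core library": 40,
    "rich edit component": 70,
    "partial ODT schema": 24,
    "application code and resources": 64,
}

total_kb = sum(components_kb.values())

# Share of the total taken by each component, largest first.
shares = sorted(
    ((name, size / total_kb) for name, size in components_kb.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

The total comes to 268K, which matches the quoted figure of roughly 270K for the uncompressed executable. The rich edit component is the largest single piece, at a little over a quarter of the total.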
The operating system (Windows XP and better) provides all the remaining stuff (GDI for font and image rendering, GDI+ for image manipulation, …).
There is even unused code that I could remove to save a few kilobytes.
On the other hand, I plan to add a lot of pictures to better show the role of properties in styles, which should increase the size of the executable by 100K.
The program is written entirely in assembly (except for the zlib library).
It would take too long to explain why, but I didn’t choose assembly for speed (this may seem crazy, as any programmer would say that the only reason to use assembly is speed). Woodrat Reader is a bit faster than Gwennel Doc at loading a document, and it is written in C++.
The most visible benefit of assembly is the size of the application, not the speed.
I use common optimization techniques:
- choosing the right algorithm and data model in the critical parts (e.g. a hash table instead of a simple list)
- optimizing access to memory
- caching data (to save computation)
I don’t think that Gwennel is fast; it’s the other word processors that are slow.
Some reasons are the same as for the bloat found in most software:
- long history of development
- marketing considerations (more time spent developing new features than optimizing)
Other reasons are more specific to word processing as word processors support a lot more features than Gwennel, for instance:
- font kerning makes the computation of the layout more complicated,
- computing the paging in realtime requires a lot of CPU (Gwennel uses just one infinite-length page)
Gwennel is written for modern machines with a modern OS. The font and image rendering is done entirely by the operating system, so it can take advantage of hardware acceleration. On the other hand, it is limited to the capabilities of the system library and some features cannot be implemented (e.g. no control over spacing between characters and no outline effect).
Loading is fast but it could be faster. The time to load the file, unzip it, parse the XML and build the model is almost immediate even with big documents (1000 pages); most of the time is spent laying out the text by asking Windows for the width and height of each word. Windows is very good at this but it could be optimized: for instance, it shouldn’t be necessary to make a system call for each occurrence of the word “the” in 12pt Times New Roman, because the result will always be the same.
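The optimization suggested here, measuring each distinct word only once per font and size, is a plain memoization. Below is a minimal Python sketch of the idea. It is not Gwennel's code: `measure_with_system` is a hypothetical stand-in for the operating system's text-measurement call and returns fake metrics, and the call counter exists only to make the saving visible.

```python
from functools import lru_cache

# Counts how often the (pretend) expensive system measurement runs.
calls = {"count": 0}

def measure_with_system(word, font, size_pt):
    """Hypothetical stand-in for an operating system text-measurement call."""
    calls["count"] += 1
    # Fake metrics: a fixed width per character, purely for illustration.
    return (len(word) * size_pt * 0.5, size_pt * 1.2)

@lru_cache(maxsize=None)
def measure(word, font, size_pt):
    """Return (width, height), asking the system once per distinct key."""
    return measure_with_system(word, font, size_pt)

words = "the cat saw the dog chase the bird".split()
sizes = [measure(word, "Times New Roman", 12) for word in words]
```

The eight words above contain six distinct ones, so the stand-in is called six times rather than eight. In a long document, where common words repeat thousands of times, the proportion of avoided calls would be much higher. The cache key has to include the font and size, since the same word measures differently in each.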
Do you have any future plans for Gwennel?
The future plan for Gwennel Doc is to make it a ‘finished’ application:
- a Print command,
- a Find command,
- and minor goodies one can expect such as opening recent documents.
There is no plan to support more elements of ODF, but compliance still has to be improved (online ODF validators are not very happy with documents created by Gwennel).
orcmid says
Hmm, looks like we should take a look into table styles for ODF-next.
I definitely like the way this fellow thinks about rolling code. Looks like it should be backup into C/C++ (very carefully) for cross-platform purposes though.
orcmid says
Oh, freeware but not open source. Ah well.
I like the separate styles that can be used, so to have it look like default OO.o pages or Wikimedia style instead.
Rob says
I forgot to mention, there is no installation needed. It is a single file. No registry gunk or anything like that. One file. Stick it on your USB key.
This is an art I think we’ve pretty much lost in desktop software applications. It wasn’t too long ago (say 1990) when every KB counted. If your app grew too much, then your package spilled over from 1 floppy disc to 2 discs and this was money, because your manufacturing expenses just went up. And you had 640 KB memory limitations. Oh, don’t get me started…
Jakub Narębski says
So Gwennel Doc is to ODF like LyX is to LaTeX, isn’t it? Both are WYSIWYM editors.
twitter says
It is very nice that ODF can be implemented so easily and WYSIWYM is nice, but I’m not terribly impressed by Windows free beer. An application that depends on Windows inherits all the bloat and restrictions of Windows, even if W95 API calls are made from assembly language routines by a talented programmer with a penchant for self torture. Vista/Windows7 is even slower, more bloated and more restricted than XP and no one runs 98, so the phrase “Windows XP or better” is both useless and misleading. If you want tiny, get a copy of DSL. AbiWord is four or five times the size of this program but it ports to everything gnu/linux and BSD run on, from cell phones to super computers.
Rob says
AbiWord on Windows is an 8MB download. That is a size factor of 50, not 5. And the install decompresses to 22 MB on disk.
Of course, this is a lot smaller than OpenOffice, but it is also a lot larger than Gwennel.
twitter says
The package size for Debian Squeeze is between 1.4 and 2.4 MB, which I mistook for expanded size via apt-cache search, but the bloat to worry about is Windows itself. What’s the point of assembly code optimizations when the target OS takes up 20 to 30 GB and is filled with all sorts of performance robbing DRM checks, encrypted bus signaling, AV checks, file indexing and other user hostile madness? I’m glad the programmer had fun with ODF and assembly but let’s not get so carried away that we forget the entire performance and size picture.
Alan Horkan says
It was possible to cram Abiword onto a floppy disk in 2002. At the time Abiword was being built for Windows using Microsoft Visual Studio and could be compressed with UPX to be small enough to fit on a single floppy disk (if I recall correctly I did actually dust off an actual 3.5 inch floppy disk). That the build process generated a single standalone “portable” binary and only a few other files made the process relatively easy.
This did not include the dictionary hash files needed for spellchecking, so the experiment did not go beyond testing that it was possible and playing around a bit. I did not need to look any deeper or try to find further features that I might sacrifice to save space (language string files are always an easy target, either for removal or compression).
The Abiword build process for Windows was later changed to use MinGW, as a free and open toolchain was more practical and convenient for the developers. Instead of a single executable, this resulted in a slightly smaller binary but also a handful of larger library files, taking up more space overall. There was no longer a single binary EXE that could obviously be compressed with UPX. If someone really wanted to, they could probably revive the Microsoft Visual Studio based build process, or possibly modify the MinGW based build process to generate a single binary that could more easily be compressed using UPX. There are so many possibilities, and I just wanted to let your readers know that if a developer were interested in making a much smaller, more portable version of Abiword, it is probably an easier task than you’d expect.
For most people Abiword Portable gets the job done
http://portableapps.com/apps/office/abiword_portable