(This post represents my personal opinion only. The standard disclaimer applies.)
In a previous post I looked at how LibreOffice inflates its user and download stats, claiming to have far more users than it actually has. Several journalists took these claims at face value and repeated them in their articles, never questioning whether LibreOffice representatives were peddling anything other than the plain, honest truth. No one seemed to notice that the claims did not pass the “sniff test”. No one investigated more deeply. Until now. I hope that after reading these posts you, gentle reader, will exercise your brain the next time you read a press release or blog post from LibreOffice, and try harder to separate fact from fiction. It will not be easy.
In this post I’m taking a look at another set of claims, those concerning the size of the LibreOffice community. I’ll lay out the plain facts and the analysis and invite contradictions or confirmations. In return I’ll probably get more personal attacks, but that comes with the territory. The LibreOffice marketing lead has already declared me personally to be their number one enemy. I’m sure Microsoft is comforted by this thought.
The Claims
LibreOffice, from the start, has made incredible claims as to the size of its volunteer base. The claims read like something from ancient battle accounts, with men 10-feet tall and armies of 500,000.
Specifically, in a recent blog post, LibreOffice makes the following claims:
- “We are now a family of thousands of contributors around the globe”
- They have “…an even larger number of active volunteers taking care of localizations, quality assurance, community development and marketing at global and local levels”
- And that these additional volunteers are “a community of over 3,000 volunteers from the five continents”
As is common with LibreOffice announcements, these claims are made without definitions, without a stated methodology, without context. So the innocent reader might read terms like “family of contributors” or “active volunteers” or “community” and think these terms are used in their ordinary sense. But they are not, as we shall see.
The Mythical Wiki Army of 3000
The key to finding the 3000 contributors claimed by LibreOffice is to note the fine print of their blog post, where they say of the additional volunteers: “Overall, the number of these people is over 3,000, if we take as a measure those who have contributed to the project wiki.”
Ah, so to the wiki we go now to seek out this mighty army of 3000.
Let’s take a look at their wiki stats then. I’ll give a screen shot in case this page becomes unavailable:
So as you see, they do indeed have 3,510 “registered users”. So their blog post was correct, end of story. They indeed have “a community of over 3,000 volunteers from the five continents”. Right?
Not so quick. There is less here than meets the eye. Far less. Let’s look at some problems with this figure:
First, the sniff test. If you had a community of 3500 wiki contributors, would your wiki, after two years, have only 2160 content pages? Is this the output one would expect from a community that size? Less than one half-page per contributor per year, from this “larger number of active volunteers”? This disconnect between claims and reality should be enough to warrant a closer examination. This just doesn’t sound credible.
Fortunately the wiki stats allow us to see exactly how many edits each registered user made to the wiki. Curiously, of this “community of over 3,000 volunteers from the five continents”, 1777 of them (over half) have made zero edits. Zero, zip, gar nichts, nada, niente, zilch. This is quite remarkable. A community of contributors where half have made no contributions?! Is that what you commonly think of when you read the phrase “active contributor” in a press release? Evidently, in LibreLand you do.
Further, there are many users with a single edit, accounts like Cashloans121, Fastloans1, Fastloans2, etc. Interesting names, yes? Of course, these are the spam accounts, created so that advertising could be added to User or other pages. It is also common for users to register and to make no other “contribution” than to put their C.V. on their User page. I won’t embarrass the individual users who have done this, but I see many examples of this on the LibreOffice wiki, where the only “contribution” from a user is self-promotion. In total, 583 of the registered users made only a single edit in the past two years. 449 have made only 2 edits. Sadly, our army of 3000 “active volunteers” is shrinking at a distressing rate.
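As a rough check on the arithmetic so far, here is a minimal Python sketch that uses only the figures quoted above from the wiki statistics. The variable names are made up for this illustration, and the result depends entirely on those quoted numbers being read correctly.

```python
# Figures quoted in the post from the wiki statistics page.
registered_users = 3510
content_pages = 2160
years = 2

zero_edits = 1777   # accounts with no edits at all
one_edit = 583      # accounts with exactly one edit
two_edits = 449     # accounts with exactly two edits

# Content pages produced per registered account per year.
pages_per_user_per_year = content_pages / registered_users / years

# Share of registered accounts that never edited anything.
share_with_zero_edits = zero_edits / registered_users

# Accounts left once the zero-, one- and two-edit accounts are set aside.
users_with_three_or_more_edits = (
    registered_users - zero_edits - one_edit - two_edits
)

print(f"{pages_per_user_per_year:.2f} pages per registered user per year")
print(f"{share_with_zero_edits:.1%} of accounts have no edits")
print(f"{users_with_three_or_more_edits} accounts have three or more edits")
```

On these numbers, only 701 of the 3,510 registered accounts made three or more edits, and that is before any further threshold is applied.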
Spam and other issues are well-known to organizations that use wikis. You don’t claim that your community consists of all registered users of the wiki. To make a claim like that in a press release is deceptive. It is a statistic that has no relation to reality. If the Apache OpenOffice project did exactly what LibreOffice did, and claimed its community size based on the number of registered wiki users, it could claim it had over 75 thousand contributors!
So what is one to do, with such messy data? Certainly claiming 3,000 contributors without any caution about the above concerns is not recommended. Instead, if the phrase “active volunteer” is more than empty syllables you need to apply some reasonable threshold to separate an active member of the community from spam, empty registrations or volunteers that showed up for one day and were never seen again.
One criterion is suggested by MediaWiki itself, in its stats report. It shows that LibreOffice has 112 “active users”, which it defines as users who have made a contribution in the last month. Another technique might be to look at users who have made some non-trivial contribution, say 10 page edits in the past two years, in which case you get 343 active contributors. Another way is to ask how many contributors combined account for 90% of all of the edits. I prefer that metric, and the answer in this case is 342 active contributors. But the only way you can claim “a community of over 3,000 volunteers from the five continents” is to have a disregard for facts, and also a disrespect for your reader. You burn credibility, one of the most important assets an open source project can have.
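The last two thresholds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method actually used to get the figures above: the function names and the sample edit counts are hypothetical, and the real per-user counts would have to be taken from the wiki's own statistics.

```python
from typing import Sequence


def users_with_min_edits(edit_counts: Sequence[int], min_edits: int = 10) -> int:
    """Count accounts with at least `min_edits` edits."""
    return sum(1 for count in edit_counts if count >= min_edits)


def users_covering_share(edit_counts: Sequence[int], share: float = 0.90) -> int:
    """Smallest number of top editors whose edits reach `share` of all edits."""
    total = sum(edit_counts)
    if total == 0:
        return 0
    running = 0
    # Walk from the most active account downwards, accumulating edits.
    for rank, count in enumerate(sorted(edit_counts, reverse=True), start=1):
        running += count
        if running >= share * total:
            return rank
    return len(edit_counts)


# Hypothetical per-account edit counts, NOT the real LibreOffice wiki data.
sample_counts = [500, 300, 120, 40, 25, 10, 5, 2, 1, 1, 0, 0, 0, 0]

print(users_with_min_edits(sample_counts))   # accounts with 10 or more edits
print(users_covering_share(sample_counts))   # accounts covering 90% of edits
```

Both measures ignore empty registrations by construction, which is why they land so far below the raw count of registered users.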
However you slice it, LibreOffice is overstating the size of their active contributor community by a factor of 10.
Rob, I feel the size of the “community” is not what we should be focusing on.
I urge you to spend your energy on making AOO the best Office suite & ODF the most popular format in the world.
Diverting our resources to other things & treating them as very important will distract us, & the monopolist will be comforted by the thought that its rivals can be divided & conquered.
I personally wish AOO & LO become the best Office suite & not just the best *free* Office suite. I do not want both projects to merge because these are too important to become one entity that may then easily be hijacked / ruined by some vested interest. It’s like putting all the eggs in one basket.
In addition to my other post here, I want to add that I read (visiting your link) what Italo had to state.
I get the feeling that he didn’t *literally* mean that you were the “main enemy” but if he did then I think I agree with you: “I’m sure Microsoft is comforted by this thought.”
@Paul, certainly a lot to do and I have a lot of energy to go around. I drafted most of these posts while the power was out and I had no internet during hurricane Sandy. Just a passing notice. Back to work on Monday.
-Rob
@Rob: you did an extraordinary amount of analysis of our wiki while you had no internet – I’m impressed.
As for the size of the LibreOffice community – there certainly are lots of metrics – it is interesting that (so far) you don’t appear eager to present any comparative statistics of your preferred project, and invest quite so much energy into attacking our efforts to give a big-picture view of our pleasing achievements across the board. For anyone uncertain about the size of each project, doing a quick git log -u, and seeing how fun it is to get involved in LibreOffice is strongly recommended.
@Michael, I live in the woods, so storms inevitably bring down trees on wires. But fortunately there is a hotel not too far that has WiFi but no trees.
As for project-to-project comparisons, I did point out that using the same technique LibreOffice used — counting wiki accounts to estimate community size — Apache OpenOffice would be able to claim 77,000 contributors. How you like them apples-to-apples?
Of course, the point of this post is not to compare projects, but a much simpler task, to compare one project’s claims with reality. Until we have a proper set of claims, comparing projects would only compound errors. I hope you see that.
@Rob – Great post! It is good to know what is factual and what is inflated in any community. I would say that they were probably trying to highlight the growth within the community, and thus show it as a staunch competitor to Microsoft.
Separately, I am including a link to a blog post in 2011, where we compared LibreOffice to OpenOffice, through tree-mapping the two products. The blog title is: “LibreOffice vs OpenOffice: Anatomical study of a fork”, take a look and let me know your thoughts.
http://www.antelink.com/blog/libreoffice-vs-openoffice-anatomical-study-fork.html
@Lamar, Perhaps, but when competing against Microsoft, building your argument on false data is just asking for a world of pain. If you think I’m tough, imagine what Microsoft could do. In some countries, like Germany, where TDF is based, advertising claims are strictly regulated.
To your blog post: I like it. Counting the number of different files, or even the number of different lines is very coarse and magnifies the effect of trivial changes. Line-oriented file differencing is a performance compromise, based on hardware constraints circa 1985. But today, although computationally expensive, we should be able to look at “edit distance” between files, for example Levenshtein distance. I think that would be a truer metric. But even then you need to preprocess to account for global bulk actions, like insertion of Apache License headers.
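For illustration, here is a minimal Python sketch of Levenshtein distance using the standard dynamic-programming recurrence. The function name is an assumption made for this sketch, it compares two strings held in memory, and it does none of the preprocessing for bulk changes such as license headers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

The running time grows with the product of the two lengths, which is the computational expense referred to above when the inputs are whole source files.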