(This post represents my personal opinion only. The standard disclaimer applies.)
In previous posts I looked at claims made by LibreOffice, in project blog posts and press releases, related to the number of LibreOffice users and the number of active LibreOffice contributors. I showed that in both cases the claims from LibreOffice were greatly inflated due to various flaws. For example, they double-counted users who upgraded from earlier releases of LibreOffice, often several times over. And they counted as “active contributors” those who registered for a wiki account but never actually contributed anything. In this blog post we’ll look at the even more egregious ways in which the LibreOffice project is overstating the number of developers who are active in the project.
A Quick Quiz
To prepare your frame of mind for what you are about to learn, I encourage you to first take the following quiz.
When asked to report on the population of your home town, what would you report?
A. The number of people with primary residences in the town.
B. The number of people who have ever lived in the town, even if they no longer live there.
C. The number of people who drive through the town on their way to somewhere else.
D. All of the above.
If you picked D, you would be an excellent candidate for the LibreOffice marketing department.
With that mental preparation out of the way, let’s continue.
The Claims
- From a recent LibreOffice announcement: “growing developer base, which has just reached the number of 550 since the launch of the project, making LibreOffice one of the fastest growing free software projects of the decade.”
- Or a couple of weeks ago: “LibreOffice is the result of the combined activity of 540 contributors”
- Quoted on Linux.com: “our large developer base — over 540 people at the end of September 2012 — is an incredibly efficient self-governing machine”
You can find many variations on this same claim.
All Your Developer Are Belong to Me
With a number this large, it should not be hard to find these 550 developers. So let’s see if we can track this down. One place to start is to look at the LibreOffice credits page. We see there a large table of “Developers committing code since 2010-09-28”. If we count the names in this table we get 469. Not quite 550, but pretty close, yes?
But if you look closer at the names in the list, you begin to scratch your head. There are names here of former Sun/Oracle developers who lost their jobs when Oracle stopped developing the project. Some commentators, like Mark Shuttleworth, put much of the blame for Oracle divesting from OpenOffice on the “radical faction” that forked to create LibreOffice. So, aside from costing them their jobs, LibreOffice now insults them by using their names for propaganda purposes to puff up LibreOffice’s developer claims?!
Looking further, I see the names of IBM colleagues who have never participated in the LibreOffice project. They are active developers on Apache OpenOffice, and former OpenOffice.org developers, but here they are listed in a table of “Developers committing code”. How curious the ways of LibreLand!
If you scroll down to the bottom of the table you get a clue in the fine print: “We can not distinguish between commits that were imported from the OOo/AOO code base and those who went directly into the LibreOffice code base.”
Hmmm… so let me get this right. If you take my code, you say that I committed it to the LibreOffice project. And if I contributed code to OpenOffice.org or Apache OpenOffice, and you take it, you’ll list me in your LibreOffice developers table for a contribution I never made to LibreOffice and put a “joined” date next to my name for an organization I never joined. Really?
This is an odd way of accounting for developers. I’m pretty sure that 100% of readers of LibreOffice press releases and 100% of journalists who write articles based on LibreOffice claims would feel somewhat abused by such idiosyncratic definitions. It is certainly not the most honest and forthright way of stating how many developers LibreOffice has. One does not expect that OpenOffice.org developers, who were never involved with the LibreOffice project, and may not even know that their code is being used, will be included in the count.
Monotonically Increasing
Aside from counting people who are not actually involved in LibreOffice and never were, the LibreOffice claims are peculiar because of the low threshold for inclusion and the perpetuity of inclusion once added. When you hear claims of a “developer base” you are led to think of a body of actual developers actually working at present on the code. That would be the normal usage. But in LibreLand it is not done that way. If you made a single contribution ever (or, as we now know from the above, even never) then you are in the “developer base” and will be listed as a LibreOffice developer for all time.
From the perspective of gratitude and acknowledgement, giving credit is fair and generous. Apache OpenOffice also has a long list of names on its Credits page. But we don’t tally this retrospective list of past contributors and claim that number as an active community size. From the perspective of claiming a community size, this would be deceptive. It is like calculating the population of a town by listing everyone who ever lived there.
Because of this odd practice, the LibreOffice developer count will never decrease. It can only go up. Even in the worst case, if an asteroid hits their next hackfest, the numbers would merely be flat. (So would the developers present.) In any case, if you’ve designed a metric that can never decrease, then it should not be newsworthy for you to report that it is increasing. This is not an accomplishment. It is just mathematics.
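To make the difference concrete, here is a minimal sketch that contrasts an "ever contributed" count with a count of developers active in a recent window. The function name, the toy commits, and the 90-day window are all assumptions chosen for illustration; they are not taken from any LibreOffice or Apache OpenOffice tooling.

```python
from datetime import date, timedelta


def developer_counts(commits, as_of, window_days=90):
    """Return (ever, recent) developer counts as of a given date.

    commits: iterable of (author, commit_date) pairs.
    ever:    distinct authors with any commit up to as_of.
    recent:  distinct authors with a commit in the trailing window.
    The 90-day default is an arbitrary, illustrative choice.
    """
    commits = list(commits)
    cutoff = as_of - timedelta(days=window_days)
    ever = {author for author, day in commits if day <= as_of}
    recent = {author for author, day in commits if cutoff < day <= as_of}
    return len(ever), len(recent)


# Hypothetical toy history: three people show up early, one stays.
toy_commits = [
    ("ann", date(2011, 1, 10)),
    ("bob", date(2011, 2, 5)),
    ("cat", date(2011, 2, 20)),
    ("ann", date(2012, 6, 1)),
]

early = developer_counts(toy_commits, date(2011, 3, 1))
late = developer_counts(toy_commits, date(2012, 6, 30))
```

In this toy history the "ever" count stays at 3 on both dates, while the "recent" count falls from 3 to 1. Only the second number can tell you whether anyone is still around.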
How to Juice the Developer Count
An easy way to increase, for reporting purposes, the number of “developers” a project can claim is to encourage trivial churning of the code base. For example, translating comments from German to English, removing dead code and other similar tasks can be done without even really knowing C++, or at least not knowing it well. But it can prompt the temporary or even one-time participation of many “developers”, and in the process increase your developer count. LibreOffice made a tremendous effort to enable a low threshold for contributions and this effort paid off, at least in developer counts.
As an example of the impact such practices can have, I took a look at the “core” git repository for LibreOffice, and all of the commits since 2010-09-28. After identifying and collapsing multiple email addresses used by some persons, I ended up with 518 names. Of those names, 166, or about 1/3 of them, made only a single commit and were never heard from again. So it is curious to count them as part of LibreOffice’s vaunted “developer base”. A community is not made up of those who contribute once and then leave.
In fact, once you take out those who never participated in LibreOffice but had their code taken from OpenOffice, you find that almost no one in this “developer base” actually does anything. For example, 261 of the “developers” (over 1/2 of all of the claimed developers) together made only 1% of total commits. So there is a long tail of inactive “developers” who are puffing up the LibreOffice claims.
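Tallies like these are easy to reproduce for any repository. What follows is a minimal sketch, not the script behind the numbers above. It assumes you already have one author name per commit, with duplicate identities collapsed, for instance from something like `git log --since=2010-09-28 --format=%aN`. The function name and the toy data are hypothetical.

```python
from collections import Counter


def long_tail_stats(commit_authors):
    """Summarise how commits are spread across authors.

    commit_authors: iterable with one author name per commit.
    """
    counts = Counter(commit_authors)
    total_commits = sum(counts.values())
    n_authors = len(counts)

    # Authors who appear exactly once in the history.
    single = sum(1 for c in counts.values() if c == 1)

    # Share of all commits made by the least active half of authors.
    ascending = sorted(counts.values())
    bottom_half = ascending[: n_authors // 2]
    bottom_share = sum(bottom_half) / total_commits

    return {
        "authors": n_authors,
        "single_commit_authors": single,
        "single_commit_fraction": single / n_authors,
        "bottom_half_commit_share": bottom_share,
    }


# Hypothetical toy data: one heavy committer and a long tail.
sample = ["ann"] * 90 + ["bob"] * 7 + ["cat", "dan", "eve"]
stats = long_tail_stats(sample)
```

With this toy input, 3 of the 5 authors are single-commit authors, and the least active half of the authors accounts for 2% of the commits.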
This is a little easier to see with reference to the following chart, which shows the cumulative number of code commits (y-axis) against the cumulative number of developers (x-axis). It shows, for example, that 10% of the developers, mainly Novell/SUSE and Red Hat employees, were responsible for nearly 90% of all of the work. It also backs up my observation that the vast majority of the claimed “large developer base” of “over 540 people”, the “incredibly efficient self-governing machine”, makes a minuscule overall contribution. There is nothing wrong with this graph per se. Many projects will show some form of this. But if your primary claim to project success is an independent developer community of 550 people, it is a bit embarrassing that most of them are not actually active, and that many of them never were.
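A curve of this kind can be computed from nothing more than a list of per-developer commit counts. The sketch below is one way to do it, with developers ordered from most to least active; the function names and the toy counts are assumptions for illustration, not the data behind the chart.

```python
def cumulative_share(commit_counts):
    """Return (developer_fraction, commit_fraction) points, most active first."""
    ordered = sorted(commit_counts, reverse=True)
    total = sum(ordered)
    points = []
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / len(ordered), running / total))
    return points


def share_of_top(commit_counts, fraction):
    """Commit share of the most active `fraction` of developers (at least one)."""
    ordered = sorted(commit_counts, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical toy counts: ten developers, 1000 commits in total.
toy_counts = [900, 40, 20, 10, 10, 8, 5, 4, 2, 1]
curve = cumulative_share(toy_counts)
top10 = share_of_top(toy_counts, 0.10)
```

For these toy counts the top 10% of developers (one person) account for 90% of the commits. Plotting `curve` gives the same kind of steep rise followed by a long flat tail.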
Toward a Better Metric
Part of the confusion here seems to stem from the desire to illustrate two things with one metric: project capabilities and project diversity. That is asking too much of one metric. If you want to look at the capabilities of the project, and do it from the input side, then you need to deal with normalizing differences in skills, experience, time on task, motivation, etc. This is very difficult in a project where you have full-time Novell/SUSE employees mixed in with part-time and occasional developers. But of all available options, a raw count of developers is the worst possible metric to pick. It is meaningless. Better would be to look at commits, or better, line counts, or even better, function points, or hours on task, or features, or some measure of output. No one cares what your input is. A feature developed by 3000 is not necessarily better than a feature developed by 3. Results are what count.
From the diversity standpoint, adding hundreds of names who do nothing is not a way to increase diversity. There are standard metrics for measuring diversity, inspired by Shannon’s definition of information entropy and commonly used in ecological species surveys. The Shannon Equitability Index is a scaled value, from 0 to 1, that measures diversity. A value of 0 would indicate no diversity: one person did all the work and the other names had zero contribution. A value of 1 would indicate that the work was evenly shared. In the chart above, a value of 1 would appear as a line at 45 degrees from lower left to upper right. If you calculate the Shannon Equitability Index for LibreOffice for all commits since 2010-09-28 you get a value of 0.6413. It would be interesting to see how this value evolves over time. Oh, and if you calculate this index for Apache OpenOffice, the value is 0.7268, which is even better: more diverse.
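For reference, the index is the Shannon entropy of the commit shares, H = -sum(p_i * ln p_i), divided by its maximum possible value, ln S, where S is the number of contributors. The following is a minimal sketch of that textbook formula, not the exact calculation behind the figures above. It assumes that names with zero commits still count toward S but add nothing to the entropy, and that the index is 0 when there are fewer than two names.

```python
import math


def shannon_equitability(commit_counts):
    """Shannon Equitability Index for a list of per-contributor commit counts.

    Assumption: zero-commit names count toward S (the number of
    contributors) but contribute nothing to the entropy sum.
    """
    s = len(commit_counts)
    total = sum(commit_counts)
    if s < 2 or total == 0:
        return 0.0  # Index is undefined here; 0.0 is a chosen convention.

    entropy = 0.0
    for count in commit_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)

    # Normalise by the maximum entropy, reached when all shares are equal.
    return entropy / math.log(s)


even = shannon_equitability([10, 10, 10, 10])
lone = shannon_equitability([100, 0, 0])
skewed = shannon_equitability([900, 40, 20, 10, 10, 8, 5, 4, 2, 1])
```

Here `even` comes out at approximately 1, `lone` is 0, and `skewed` falls somewhere in between. Because the index is normalised by ln S, padding the list with inactive names raises the denominator without raising the entropy, so it pushes the value down rather than up.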