(This post represents my personal opinion only. The standard disclaimer applies.)
In previous posts I looked at claims made by LibreOffice, in project blog posts and press releases, related to the number of LibreOffice users and the number of active LibreOffice contributors. I showed that in both cases the claims from LibreOffice were greatly inflated due to various flaws. For example, they double-counted users who upgraded from an earlier release of LibreOffice, often several times over. And they counted as “active contributors” those who registered for a wiki account but never actually contributed anything. In this blog post we’ll look at the even more egregious ways in which the LibreOffice project is overstating the number of developers that are active with the project.
A Quick Quiz
To prepare your frame of mind for what you are about to learn, I encourage you to first take the following quiz.
When asked to report on the population of your home town, what would you report?
A. The number of people with primary residences in the town.
B. The number of people who have ever lived in the town, even if they no longer live there.
C. The number of people who drive through the town on their way to somewhere else.
D. All of the above.
If you picked D, you would be an excellent candidate for the LibreOffice marketing department.
With that mental preparation out of the way, let’s continue.
The Claims
- From a recent LibreOffice announcement: “growing developer base, which has just reached the number of 550 since the launch of the project, making LibreOffice one of the fastest growing free software projects of the decade.”
- Or a couple of weeks ago: “LibreOffice is the result of the combined activity of 540 contributors”
- Quoted on Linux.com: “our large developer base — over 540 people at the end of September 2012 — is an incredibly efficient self-governing machine”
You can find many variations on this same claim.
All Your Developer Are Belong to Me
With a number this large, it should not be hard to find these 550 developers. So let’s see if we can track this down. One place to start is to look at the LibreOffice credits page. We see there a large table of “Developers committing code since 2010-09-28”. If we count the names in this table we get 469. Not quite 550, but pretty close, yes?
But if you look closer at the names in the list, you begin to scratch your head. There are names here of former Sun/Oracle developers who lost their jobs when Oracle stopped developing the project. Some commentators, like Mark Shuttleworth, put much of the blame for Oracle divesting from OpenOffice on the “radical faction” that forked to create LibreOffice. Now aside from costing them their jobs, LibreOffice now insults them by using their names for propaganda purposes to puff up LibreOffice’s developer claims?!
Looking further, I see the names of IBM colleagues who have never participated in the LibreOffice project. They are active developers on Apache OpenOffice, and former OpenOffice.org developers, but here they are listed in a table of “Developers committing code”. How curious the ways of LibreLand!
If you scroll down to the bottom of the table you get a clue in the fine print: “We can not distinguish between commits that were imported from the OOo/AOO code base and those who went directly into the LibreOffice code base.”
Hmmm… so let me get this right. If you take my code, you say that I committed it to the LibreOffice project. And if I contributed the code to OpenOffice.org or Apache OpenOffice, and you take it, you’ll list me in your LibreOffice developers table for a contribution I never made to LibreOffice and put a “joined” date next to my name for an organization I never joined. Really?
This is an odd way of accounting for developers. I’m pretty sure that 100% of readers of LibreOffice press releases and 100% of journalists who write articles based on LibreOffice claims would feel somewhat abused by such idiosyncratic definitions. It is certainly not the most honest and forthright way of stating how many developers LibreOffice has. One does not expect that OpenOffice.org developers, who were never involved with the LibreOffice project, and may not even know that their code is being used, will be included in the count.
Monotonically Increasing
Aside from counting people who are not actually involved in LibreOffice and never were, the LibreOffice claims are peculiar because of the low threshold for inclusion and the perpetuity of inclusion once added. When you hear claims of a “developer base” you are led to think of a body of actual developers actually working at present on the code. That would be the normal usage. But in LibreLand it is not done that way. If you made a single contribution ever (or, as we now know from the above, even never) then you are in the “developer base” and will be listed as a LibreOffice developer for all time.
From the perspective of gratitude and acknowledgement, giving credit is fair and generous. Apache OpenOffice also has a long list of names on its Credits page. But we don’t tally this retrospective list of past contributors and claim that number as an active community size. From the perspective of claiming a community size, this would be deceptive. That is like calculating the population of a town by listing everyone who ever lived there.
Because of this odd practice, the LibreOffice developer count will never decrease. It can only go up. Even in the worst case, if an asteroid hit their next hackfest, the numbers would merely be flat. (So would the developers present.) In any case, if you’ve designed a metric that can never decrease, then it should not be newsworthy for you to report that it is increasing. This is not an accomplishment. That is just mathematics.
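To make the “can only go up” point concrete, here is a minimal sketch contrasting a cumulative count of everyone who ever committed with a count of those active in a trailing window. The commit log, the author names, and the 365-day window are all hypothetical illustrations, not LibreOffice data or any project’s official definition of “active”.

```python
from datetime import date, timedelta

# Hypothetical commit log of (author, commit date) pairs; illustrative only.
commits = [
    ("alice", date(2011, 1, 10)),
    ("bob", date(2011, 2, 3)),
    ("carol", date(2011, 3, 15)),
    ("alice", date(2012, 6, 1)),
]

def cumulative_contributors(log, as_of):
    """Everyone who has ever committed on or before `as_of`.
    Adding days to `as_of` can only add authors, so this never shrinks."""
    return len({author for author, day in log if day <= as_of})

def active_contributors(log, as_of, window_days=365):
    """Authors with at least one commit in the trailing window ending at `as_of`.
    People who stop committing drop out, so this can go down."""
    start = as_of - timedelta(days=window_days)
    return len({author for author, day in log if start < day <= as_of})
```

With this toy log, both counts are 3 at the end of 2011, but at the end of 2012 the cumulative count is still 3 while only one author is active in the trailing year. Only the second number says anything about the project’s current developer base.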
How to Juice the Developer Count
An easy way to increase, for reporting purposes, the number of “developers” a project can claim is to encourage trivial churning of the code base. For example, translating comments from German to English, removing dead code and other similar tasks can be done without even really knowing C++, or at least not knowing it well. But it can prompt the temporary or even one-time participation of many “developers”, and in the process increase your developer count. LibreOffice made a tremendous effort to enable a low threshold for contributions and this effort paid off, at least in developer counts.
As an example of the impact such practices can have, I took a look at the “core” git repository for LibreOffice, and all of the commits since 2010-09-28. After identifying and collapsing multiple email addresses used by some persons, I ended up with 518 names. Of those names, 166, or about a third of them, made only a single commit and were never heard from again. So it is curious to count them as part of LibreOffice’s vaunted “developer base”. A community is not made up of those who contribute once and then leave.
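The counting step described above can be sketched roughly as follows. This is not the exact script used for the numbers in this post; the alias table and email addresses are hypothetical, and the input is assumed to be one author email per commit (for example, the output of `git log --format=%ae`). Git’s own `.mailmap` file serves a similar identity-folding purpose.

```python
from collections import Counter

# Hypothetical alias table: secondary email -> canonical person.
ALIASES = {
    "jdoe@old-employer.example": "Jane Doe",
    "jane@home.example": "Jane Doe",
}

def commits_per_author(author_emails, aliases=ALIASES):
    """Count commits per person after folding known aliases together.
    Emails with no alias entry are treated as their own person."""
    return Counter(aliases.get(email, email) for email in author_emails)

def single_commit_authors(counts):
    """Names that appear in exactly one commit."""
    return sorted(name for name, n in counts.items() if n == 1)
```

For example, four commits from `jdoe@old-employer.example`, `jane@home.example` (twice), and `drive.by@example.org` collapse to two people: Jane Doe with three commits and one drive-by contributor with a single commit. Without the alias step, Jane Doe would be counted as two “developers”.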
In fact, once you take out those who never participated in LibreOffice but had their code taken from OpenOffice, you find that almost no one in this “developer base” actually does anything. For example, 261 of the “developers” (over half of all the claimed developers) together made only 1% of all commits. So there is a long tail of inactive “developers” who are puffing up the LibreOffice claims.
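One way to measure that long tail, sketched under the assumption that you already have one commit count per person: sort from least to most active and keep adding developers until their combined commits would exceed 1% of the total. The function name and the sample counts are illustrative, not taken from the actual repository data.

```python
def long_tail_size(commit_counts, share=0.01):
    """Largest number of least-active developers whose combined commits
    stay within `share` (e.g. 0.01 = 1%) of all commits."""
    total = sum(commit_counts)
    limit = share * total
    running = 0
    size = 0
    for n in sorted(commit_counts):  # least active first
        if running + n > limit:
            break
        running += n
        size += 1
    return size
```

With made-up counts of `[900, 50, 40]` plus ten developers with one commit each (1,000 commits total), `long_tail_size` reports that 10 of the 13 “developers” together account for just 1% of the work.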
This is a little easier to see with reference to the following chart, which shows the cumulative number of code commits (y-axis) against the cumulative number of developers (x-axis). It shows, for example, that 10% of the developers, mainly Novell/SUSE and RedHat employees, were responsible for nearly 90% of all of the work. It also backs up my observation that the vast majority of the claimed “large developer base — over 540 people” and the “incredibly efficient self-governing machine” make an overall minuscule contribution. There is nothing wrong with this graph per se. Many projects will show some form of this. But if you make a primary claim on your project’s success as having an independent developer community of 550 people, it is a bit embarrassing that most of them are not actually active, and that many of them never were.
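A curve like the one described can be computed from per-developer commit counts. This minimal sketch assumes developers are ranked from most to least active, which is what the “10% of developers did nearly 90% of the work” reading implies. The helper names and the sample data are hypothetical.

```python
import math

def top_share(commit_counts, fraction=0.10):
    """Share of all commits made by the most active `fraction` of developers."""
    ranked = sorted(commit_counts, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def cumulative_curve(commit_counts):
    """Points (fraction of developers, fraction of commits), most active first.
    A perfectly even split would give a straight 45-degree line."""
    ranked = sorted(commit_counts, reverse=True)
    total = sum(ranked)
    points, running = [(0.0, 0.0)], 0
    for i, n in enumerate(ranked, start=1):
        running += n
        points.append((i / len(ranked), running / total))
    return points
```

For ten hypothetical developers with counts `[91, 1, 1, 1, 1, 1, 1, 1, 1, 1]`, the top 10% (one person) accounts for 91% of commits, and the curve rises almost vertically before going flat, the same shape the chart describes.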
Toward a Better Metric
Part of the confusion here seems to stem from the desire to illustrate two things with one metric: project capabilities and project diversity. That is asking too much of one metric. If you want to look at the capabilities of the project, and do it from the input side, then you need to deal with normalizing differences in skills, experience, time on task, motivation, etc. This is very difficult in a project where full-time Novell/SUSE employees work alongside part-time and occasional developers. But of all available options, a raw count of developers is the worst possible metric to pick. It is meaningless. Better would be to look at commits, or better still line counts, or even better function points, or hours on task, or features, or some measure of output. No one cares what your input is. A feature developed by 3000 is not necessarily better than a feature developed by 3. Results are what count.
From the diversity standpoint, adding hundreds of names who do nothing is not a way to increase diversity. There are standard metrics for measuring diversity, inspired by Shannon’s definition of information entropy and commonly used in ecological species surveys. The Shannon Equitability Index is a scaled value, 0-1, that measures diversity. A value of 0 would indicate no diversity: one person did all the work and the other names had zero contribution. A value of 1 would indicate that the work was evenly divided. In the chart above, a value of 1 would appear as a straight line at 45 degrees from lower left to upper right. If you calculate the Shannon Equitability Index for LibreOffice for all commits since 2010-09-28 you get a value of 0.6413. It would be interesting to see how this value evolves over time. Oh, and if you calculate this index for Apache OpenOffice, the value is 0.7268, which is even better, i.e., more diverse.
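For readers who want to try this on their own data, here is a minimal sketch of the standard formula: with p_i the share of commits by developer i, entropy H = -Σ p_i ln p_i, and equitability E = H / ln S, where S is the number of names. Counting names with zero commits in S (so that “one person did everything” gives 0) is my reading of the description above, and this sketch is not guaranteed to reproduce the exact figures quoted for either project.

```python
import math

def shannon_equitability(commit_counts):
    """Shannon equitability E = H / ln(S) over per-developer commit shares.
    S counts every name, including any with zero commits; zero-commit names
    add nothing to H because p * ln(p) -> 0 as p -> 0."""
    s = len(commit_counts)
    total = sum(commit_counts)
    if s < 2 or total == 0:
        raise ValueError("need at least two names and at least one commit")
    h = -sum((n / total) * math.log(n / total) for n in commit_counts if n > 0)
    return h / math.log(s)
```

An even split such as `[10, 10, 10, 10]` gives 1.0 (up to floating-point rounding), while `[5, 0, 0]` gives 0. Because the log base cancels in the ratio, using natural logs rather than base 2 does not change the result.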
Interesting. This is the same blog that stated OpenOffice was not going to treat this as a zero-sum game, yet it seems insistent on downplaying LibreOffice’s statistics.
@Bob, IMHO, LibreOffice should get full credit for everything they’ve accomplished, but no more than that. Do you disagree?
You might want to link to the official TDF statistics on LibreOffice: http://documentfoundation.files.wordpress.com/2012/10/tdf-infographics2years.pdf Which lists a lot of statistics that you also mention here and splits out contributors in different groups of active/less active hackers and shows how much code is coming from which individuals and companies. It even splits out those contributions flowing into libreoffice from ooo.
Hi :)
What does OpenSource mean?
Why shouldn’t the original authors of the code be given some sort of acknowledgement of the work they did that is being used in all different projects where their work is being used? People seldom arrive in the world with their programming skills fully formed. Some people need to learn and practice their skills on simple things.
Streamlining and cleaning code and making it more accessible to more people may not seem like much, but the code is around 30% lighter even with increased functionality. LibreOffice seems to start up and run faster. Sometimes it is the little things that make a big difference, imo.
Regards from
Tom :)
@Mark, I am aware of that “infographic” and its earlier incarnations. It focuses primarily on counting and categorizing developers. I explain in detail in this post why developer counts are not particularly meaningful. Short version: if over half of the developers account for only 1% of the commits, then you can double or halve the number of developers while having a negligible effect on the product. If you can do that with a metric, then the metric does not have much of a bearing on reality, does it?
@Tom, credit where credit is due, certainly. We have a long list of volunteers who we credit for Apache OpenOffice: http://www.openoffice.org/welcome/credits.html But we don’t promote the length of this list as our current developer community. That would not be honest.
I think it is nice of the Document Foundation to split out the various developers/contributions into all these different categories, precisely because just mentioning one number in some marketing material is not going to satisfy everybody. It is not hard to get at the underlying numbers for anyone interested, so they can make up their own mind whether the numbers are significant or not. I think you underestimate the importance of the long tail. Even having 500 hackers fix only one little bug each does have an impact on the overall project, even if each individual fix is just a tiny drop. Yes, not all hackers will do lots of fixes, but they all count IMHO. They provide a more capable project and should be credited for it. Thanks hackers!
@Mark, the fact is they do mention one number in most of the press releases, and the number they mention — 550 — is the most deceptive of all the numbers they could come up with.
As for the long tail, look at the data and the chart. The least active 50% of the developers combined account for only 1% of the code commits. That is not a significant impact. There is not a lot going on in this long tail. The tail, as far as I can tell, consists primarily of those who poked around and left the project. The long tail represents the lost opportunity, those who did not stick around.
Is the impact zero? No, it might actually be negative. Studies have shown that programmer productivity, even with the same number of years of experience, varies by an order of magnitude, and that in any programming team you probably have developers who are net producers of bugs, i.e., their net contribution towards product quality is negative. So if you do open up the codebase to marginal contributions from “550 developers”, most of whom lack experience with the code, and you do not have a corresponding plan to deal with the QA implications of this, then you are asking for trouble. The bugs, performance issues and instability in LibreOffice came from somewhere, yes? This was entirely foreseeable.
It seems to me it is just not the number you would use, but it is a nice number, it shows how much activity is going on. They aren’t hiding any of the other numbers, you could easily find them. You are very dismissive about all the people contributing. Poking around and fixing some small issue is still a net positive that should be credited and celebrated IMHO. And even if only 10% is very active and does 90% of the work, that is still 50+ hackers who supervise all the work done by the “little hackers”. They do seem to have a good QA plan to deal with increased activity: https://www.libreoffice.org/get-involved/qa-testers/ and lots of people doing QA https://skyfromme.wordpress.com/2012/11/01/libreoffice-quality-heros/ so why not invite more hackers to contribute? One of these 90% might rise to the 10% one day.
@Mark, Do you think that LibreOffice is honoring their contributors? Look again at the announcements. They aren’t linking to the page with the volunteer names, are they? They are not giving credit. They are focused on pumping up their fictitious numbers, mixing in legacy OOo contributors and Apache OpenOffice developers with former and current LibreOffice developers.
Look again at the claims of 3000 volunteers based on claims that the wiki had 3000 registrations. But most of the wiki “contributors” made *zero contributions*. I am not being dismissive of contributors. I am being dismissive of the deceptive LibreOffice claims.
Yes, let’s give credit where credit is due. LibreOffice should get credit for 100% of its accomplishments, but not more.
Rob, great post on this topic. I’ve been wondering what’s going on with Libre/Open Office. I think you make some great points (maybe LibreOffice makes fewer), but I think it’s worthwhile for the hate on both sides to stop. We really need one strong open-source competitor to MS Office; this infighting between the two camps doesn’t benefit anyone except MS.
Mate, Just leave the project alone, you obviously dont like it so why bash it and cause trouble for people who genuinely like libreoffice and use it daily. i dont care how many developers there are, i dont care if you dont like it, i only care that it works super well and it does what i need. So stop bashing and go somewhere else. i think you really need to look in the mirror.
Also now i see you moderate comment which means you dont post genuine negative comments about your baseless assertions, and now i see your rob weir the most stupid dumbest (so called it person) that ever lived. i think the whole of the it sector actually hates you and wants you to go away. as for journalismn, well you dont do journalism you do stupid lies. noone believes anything you say cause your stupid. GET IT, STUPID. so now go fuck off.
@Maisley, you appear to have some anger-management issues. But aside from that, I don’t think any of us do LibreOffice favors by ignoring their false marketing claims. Even if you support LibreOffice, which apparently you do, I hope you agree that it is better for them to clean up their act now than deal with a much messier set of circumstances later. If I can poke holes in their claims that one could drive a tank through, imagine how much more a determined opponent, say Microsoft, could do to discredit (or worse) LibreOffice. For example, Germany (where LibreOffice is based) strictly regulates claims in advertising. Untruthful statements can get one in trouble.
Rob, This is perhaps one of the most cynical posts I have seen from you. I think you may be onto something with the train of thought that says ‘LibreOffice has a counting methodology that is drastically different from the rest of the open source community’. But to say that translations are not enough to consider them a contribution is going too far. If you’re going to start a competition (aka flame war) with The Document Foundation, let it be one like Chrome vs. Firefox, where you compete on features and users benefit. Right now it seems like you want to detract from one ODF project to increase interest in your own.
Also, regarding your previous post about Download number…
Can you explain what makes it so difficult to implement an incremental upgrade process in ODF applications? Cause that would be awesome.
@Stevo, Don’t put false words into my mouth. I *never* said that translations should not be counted as contributions. But translators are typically not counted as developers. They are counted as, well, translators.
As for incremental updates, it can be done. We’re working on it for possible inclusion in AOO 4.0. There is the technical side of it — how to create and apply an incremental update. And then there is the question of what upgrades would have the incremental option. For example, we might allow incremental update from 4.0.0 to 4.0.1 and 4.0.2, but not to 4.1.0. We need to balance convenience for the user with preserving a reasonable set of paths we can test.
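The example path policy above (4.0.0 to 4.0.1 or 4.0.2, but not to 4.1.0) can be expressed as a small rule. This is a hypothetical sketch of that one example only; the function names are invented for illustration, and it is not the actual Apache OpenOffice updater logic.

```python
def parse_version(text):
    """Split a dotted version string like '4.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def incremental_allowed(installed, target):
    """Hypothetical policy: offer an incremental update only within the same
    major.minor line (4.0.x -> 4.0.y) and only when moving forward."""
    old, new = parse_version(installed), parse_version(target)
    return old[:2] == new[:2] and new > old
```

Restricting incremental updates to one major.minor line keeps the number of upgrade paths small enough to test; anything outside that line would fall back to a full install.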
“Some commentators, like Mark Shuttleworth, put much of the blame for Oracle divesting from OpenOffice on the “radical faction” that forked to create LibreOffice. Now aside from costing them their jobs, LibreOffice now insults them by using their names for propaganda purposes to puff up LibreOffice’s developer claims?!”
Wow! Please more alternate history, that’s fun.
I think you should reconsider claims that you are trying to disprove.
If someone says “X is the result of the combined activity of Y contributors” (second claim), then he can count in everyone who has ever done anything for X. These can be core developers, occasional hackers and even people who made one small commit and never returned. These are translators, designers, infrastructure administrators. These probably include bug reporters, most likely donors and perhaps even user-support members. And, in fact, if you count them all in (although counting all active user-supporters around the world will be rather hard), then you will see that “550” is a considerably understated number.
I don’t really see a problem in counting IBM and Sun OOo developers among LO contributors. LO started as a fork of OOo, so all contributions made to OOo were, by definition, also made, although unconsciously, to LO. This is how Open Source works. You release your code under a liberal license and everyone can do anything he likes with it. By releasing your code, you contribute to applications you have never heard of.
If you release your code under a liberal license, then you are aware that this may happen. If you did not want your code to be used by others, then why did you choose such a license in the first place? You were free to create a closed-source application.
But the real question is: how many of the IBM and Sun/Oracle employees listed on the LO contributors page contacted TDF and did anything to remove their names from the list? None? Then perhaps most of them don’t feel insulted, after all.
Anyway, if anyone did, I would be interested in knowing the outcome of this action.
I think it is pretty hard to build walls in open communities, but you are trying your best.
@Miroslaw, I think you are confusing “code” and “community”. Go back to the claims that TDF actually made. They were not making a claim about the code. They did not say (as you did) “X is the result of the combined activity of Y contributors”. They made a claim about the community. Their claims, as I have shown and no one has refuted, were exaggerated beyond any reasonable relationship to reality. They do not “have” 540 developers working on the project. This is as deceptive as counting a town’s population by stating how many people ever lived there.