I did a quick study of the 2013 mailing list traffic for the Apache OpenOffice project. I looked at all project mailing lists, including native language lists. I omitted the purely transactional mailing lists, the ones that merely echo code check-ins and bug reports. Altogether 14 mailing lists were included in this study.
In 2013 the OpenOffice community mailing lists saw 24,423 posts from 2,211 unique posters, in 4,819 threads.
A word cloud of the most frequent words in post titles (thanks to Jonathan Feinberg’s Wordle app) follows. As you can see, the terms used in the Propose/Approve/Code/Test/Release workflow rise to the top, a sign of how much of the list traffic is devoted to that workflow.
I thought it would also be interesting to look at this from a social network perspective, looking at the atomic unit of collaboration on a mailing list: responding to a post. Of course, not all posts draw a response. It is common for someone to post information without requiring or expecting a reply. But there are many responses. As mentioned above, there were 24,423 posts in 4,819 threads, so an average of about 5 posts per thread, or roughly 4 responses per thread. We can represent this as a directed graph, with each poster treated as a node, and a directed arc to each responder node from the node of the original post author. (This might seem backwards, and you could argue for reversing the arcs, but in general in mailing lists the responder is providing value to the original poster, so the centrality of the responder will be more relevant. Consider, for example, the questions coming from random users, and the experienced project members who answer them.)
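To make the construction concrete, here is a minimal Python sketch of the arithmetic and the graph described above. It is not the script used for this study. The poster names are made up, the helper names `build_reply_graph` and `in_degree` are hypothetical, and two choices are my own assumptions: repeated replies between the same pair collapse into one arc, and self-replies are ignored.

```python
# Thread arithmetic from the totals quoted above.
posts, threads = 24423, 4819
posts_per_thread = posts / threads
# Assumption: every thread starts with exactly one non-response post.
responses_per_thread = (posts - threads) / threads


def build_reply_graph(replies):
    """Build a directed graph from (original_author, responder) pairs.

    The result maps each poster to the set of people who responded to
    them, so arcs run from the original author to the responder.
    """
    graph = {}
    for author, responder in replies:
        graph.setdefault(author, set())
        graph.setdefault(responder, set())
        if author != responder:  # assumption: ignore self-replies
            graph[author].add(responder)
    return graph


def in_degree(graph):
    """Count, for each poster, how many distinct people they responded to."""
    counts = {node: 0 for node in graph}
    for responders in graph.values():
        for responder in responders:
            counts[responder] += 1
    return counts


# Hypothetical sample data: two users asking, two project members answering.
replies = [
    ("user_a", "dev_x"),
    ("user_b", "dev_x"),
    ("user_a", "dev_y"),
    ("dev_x", "dev_y"),
    ("user_a", "dev_x"),  # a repeat, collapses into the existing arc
]
graph = build_reply_graph(replies)
responded_to = in_degree(graph)
```

With the arcs pointing at the responder, the people who answer the most distinct posters end up with the highest in-degree, which is the property the parenthetical above argues for.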
Forming a graph in this way gives us a giant component (representing 98.84% of the whole graph) with 1,955 nodes and 7,069 arcs. Average degree (number of collaboration partners for each person) is 3.6. Forty-six people responded to more than 50 other people. Maximum degree is 714 (Apache OpenOffice V.P. Andrea Pescetti). A visualization of this graph, made with the open source tool Gephi, follows. You can click on the image for a larger version. Nodes have been scaled to reflect betweenness centrality (a measure of the degree to which a node helps connect others into the graph) and colored via a modularity algorithm, which finds sets of nodes that have a high degree of interconnection.
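For readers who want to check figures like these without Gephi, the giant component and average degree can be sketched in plain Python. This is an illustration on a toy graph, not the analysis behind the numbers above. I am assuming that "giant component" means the largest weakly connected component (arc direction ignored) and that average degree is arcs divided by nodes, which is consistent with the reported 7,069 arcs and 1,955 nodes. The function name `weak_components` and the toy data are hypothetical. Betweenness centrality and modularity are not reproduced here.

```python
from collections import deque


def weak_components(graph):
    """Return the weakly connected components of a directed graph.

    `graph` maps each node to the set of nodes it has an arc to.
    """
    # Ignore arc direction when deciding who is connected to whom.
    neighbors = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            neighbors[src].add(dst)
            neighbors.setdefault(dst, set()).add(src)

    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


# Hypothetical toy graph: one main cluster plus an isolated pair.
toy = {
    "user_a": {"dev_x", "dev_y"},
    "user_b": {"dev_x"},
    "dev_x": {"dev_y"},
    "dev_y": set(),
    "user_c": {"user_d"},
    "user_d": set(),
}

giant = max(weak_components(toy), key=len)
arc_count = sum(len(toy[node]) for node in giant)
average_degree = arc_count / len(giant)
share_of_nodes = len(giant) / len(toy)

# The same ratio applied to the totals reported above.
reported_average_degree = 7069 / 1955
```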
What a marvelous, large and complex project we have in Apache OpenOffice!