So what do we have here? This is a simple social network visualization of OASIS Technical Committees. Each circle in this graph represents a single Technical Committee (TC). The size of the circle is proportional to the number of members on the committee. The lines between the committees have a weight proportional to the overlap in membership between the TCs. In this case I used Dice’s coefficient as the metric, although any of the several set similarity metrics (Jaccard, etc.) would work here. The color of each node represents its modularity class, that is, the community or sub-network within the graph to which the node was assigned. The resulting graph was then run through Gephi and its Force Atlas layout algorithm, which pulls together the TCs that are more closely related by overlapping membership. Click the image for a larger version.
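To make the edge weights concrete, here is a minimal sketch of Dice's coefficient, with Jaccard alongside it for comparison. It is an illustration only, not the code behind the chart; the committee names and member names in `tc_members` are made up.

```python
def dice(a: set, b: set) -> float:
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical membership lists (not real OASIS data).
tc_members = {
    "dita": {"alice", "bob", "carol", "dave"},
    "dita-adoption": {"bob", "carol", "erin"},
    "sca-assembly": {"frank", "grace"},
}

overlap = dice(tc_members["dita"], tc_members["dita-adoption"])
print(f"dice = {overlap:.4f}")  # two shared members out of 4 + 3
```

Both metrics are 0 for committees with no common members and 1 for identical membership. For the same pair Dice is never smaller than Jaccard, so swapping one for the other changes the edge weights but not which pairs rank as most similar.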
(For those who are interested, the raw data for this is all publicly available on the OASIS website. Scraping the web pages for the data, calculating the graph, and outputting a GEXF-format file for Gephi took 133 lines of Python.)
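The original script is not reproduced here, but the last two steps (calculating the graph and writing GEXF) might look roughly like the sketch below. It uses only the Python standard library and assumes the scraping step has already produced a dictionary mapping each TC's short name to its set of members. The sample data, the `build_gexf` helper, and the `members` node attribute are all hypothetical; the scraping itself is left out because it depends on the layout of the OASIS pages.

```python
import itertools
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def dice(a: set, b: set) -> float:
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def build_gexf(tc_members: dict) -> ET.ElementTree:
    """Build a GEXF 1.2 document: one node per TC, one weighted edge per overlap."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "undirected"})

    # Declare a node attribute holding the member count (usable for node size).
    attrs = ET.SubElement(graph, "attributes", {"class": "node"})
    ET.SubElement(attrs, "attribute",
                  {"id": "0", "title": "members", "type": "integer"})

    nodes = ET.SubElement(graph, "nodes")
    for name in sorted(tc_members):
        node = ET.SubElement(nodes, "node", {"id": name, "label": name})
        values = ET.SubElement(node, "attvalues")
        ET.SubElement(values, "attvalue",
                      {"for": "0", "value": str(len(tc_members[name]))})

    edges = ET.SubElement(graph, "edges")
    edge_id = 0
    for a, b in itertools.combinations(sorted(tc_members), 2):
        weight = dice(tc_members[a], tc_members[b])
        if weight > 0:  # skip pairs with no shared members
            ET.SubElement(edges, "edge", {
                "id": str(edge_id), "source": a, "target": b,
                "weight": f"{weight:.4f}",
            })
            edge_id += 1
    return ET.ElementTree(root)


# Hypothetical membership lists standing in for the scraped data.
tc_members = {
    "dita": {"alice", "bob", "carol", "dave"},
    "dita-adoption": {"bob", "carol", "erin"},
    "sca-assembly": {"frank", "grace"},
}
tree = build_gexf(tc_members)
```

Calling `tree.write("oasis_tcs.gexf", encoding="utf-8", xml_declaration=True)` would then save a file that can be opened in Gephi, where the layout and modularity calculations are run. The file name is just an example.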
Note one important fact: the graph is built entirely from abstract properties, namely the size of each committee and the overlaps in membership. It has no knowledge of what the underlying technologies are, the companies and individuals involved, or of other items of semantic value that could describe the work of the committee. The structure is essentially based on the interests and affiliations of individual committee members. Where there is common interest, it is assumed that there is commonality in the work of the TCs.
So how well does this match reality? The image that follows (click for an enlarged version) is the same chart, but with each node labeled by the short name of the TC. As you can see, the above approach does a fine job of bringing together related TCs. This holds both at the fine-grained level, where the DITA TC and the DITA Adoption TC, or the SCA and SCA Assembly TCs, are adjacent, and at the broader level, where we see communities for content-related standards, for privacy/identity standards, legal/emergency, etc.