Communities of Links

Ahn, Bagrow, and Lehmann (Nature 466, 761-764 (2010), and arXiv, hereafter ABL) very recently introduced a new concept in community identification: instead of clustering the nodes, cluster the links — or in familiar terms, don't classify people, classify their conversations. The concept is intrinsically attractive because it matches common observation: your conversations with your family, with your college buddies, and with your co-workers represent different communities that do not overlap. The method also has no resolution restriction and works naturally with bipartite data, which is not true of most alternatives.

I refactored the ABL method to separate completely the clustering of edges from the decision about what is the "right" number of communities. This has little time penalty; in fact having a "pure" clustering routine makes it easier to accelerate (and later swap in a totally different clustering method, like k-medoids). This also makes it straightforward and clean to calculate partition density (the criterion by which ABL count communities) de novo for each community, and has the added benefit that it's now easy to include an interactive option for the user to dial-in their own preference for how finely to slice the data (each dataset defaults to the number of communities that maximizes overall partition density).

The simplest dataset is from ABL Figure 1d. Many thanks to Sune Lehmann and Jim Bagrow for their feedback and bug-finding advice. This is an active experiment in complex data visualization; all comments welcome.

The good news is that the ABL method is powerful and flexible. The challenge is that the communities it reveals are of links, not nodes, and therefore not as obvious to portray and interpret. So far the literature method is to use a traditional force-based network diagram and color the lines between the dots, rather than color the dots. Not bad, but this has the limitations of force-directed network diagrams have always had: a big "wow factor" but of limited practical interpretive use because of the spaghetti of crossing lines. So here you'll find outright experiments, and that means that some will be different!

In the upper circular graph the dots are the nodes and the polygons show community membership of those nodes (the colors match the table and dendrogram); line crossing is minimized by working around in cluster-joining order (same as the ROYGBIV color order). Communities are equally distributed around the circle with anchor points shown as black-centered dots; each node is placed as the weighted sum of its coordinates of each anchor to which it belongs, plus some random jitter to separate nodes with single community membership. The community ordering and coloring has an interesting result: the diagram gets simpler to see as the number of communities is increased, even far above the partition density "optimum".

The method is fast because it's completely deterministic and drawn in one pass, i.e. it's not an iterative force-relaxation method. This version also includes a very nice convex hull routine, with attribution in the source code.

The small graph shows partition density (y) vs. cluster joining distance (x), and the red line shows the community threshold distance (which you can change with the community-adjust buttons). Choosing fewer communities moves the theshold distance toward the root (you're choosing fewer, bigger branches in the dendrogram). As long as the partition density is reasonably high you'll get useful communities. In fact I've tweaked the total partition density: multiplying it by distance-squared biases toward the "last big peak" of density. It's a tradeoff between true density maximization and parsimony of output.

The table to the right shows node membership: each node's presence in each color-coded community is shown by a colored tile. It's sorted with the most-connected nodes at the top (e.g. Valjean, Dorothy), and color intensity shows "dilution" across multiple memberships. The columns are shown and color-coded in their clustering order, so you'll find similar communities in adjacent columns and colors.

Finally at the bottom is a circular dendrogram. The edge labels make it clear that we've clustered links, because every label shows the pair of nodes that form that link. Colors of leaves (= edges) and branches match the community colors. Branches above the community density cutoff are in light gray. As you change the number of communities you're simply changing the radius at which the black branches change to gray. Orphan leaves (those which join only above the cutoff) are shown in light gray.

Rob Spencer, August 2010

Finding Communities of Links

About the Algorithm

About the Graphics

Demo

Finding Communities of Links

About the Algorithmshow / hide

About the Graphicsshow / hide

Demo

About the Algorithm

About the Graphics