Community Detection in Large Social Networks Using PCCA+


Communities in the American Football Club dataset

Community detection can reflect overall structure of a network and thus it can help for many real case scenarios like product marketing. To extract such communities from network one uses the objective function which captures the intuition of communities with high intra-community edges and fewer inter-community edges. Since optimizing such objective function is NP-hard, many researchers have tried many heuristics to find approximate communities. Most of the community detection algorithms based on greedy algorithms perform poorly on large complex networks. Moreover, many algorithms for community detection also require some prior knowledge of the community structure, e.g., the number of the communities, which is very difficult to be obtained in real-world networks. Algorithms are evaluated based on score called modularity. The algorithm which maximizes such modularity score is considered to be the best algorithm. In this project, we propose a new community detection algorithm which is based on Perron Cluster Analysis.

Approach :

  1. Given a graph, the adjacency matrix(A) and diagonal degree matrix (D) is computed.
  2. Using A and D the graph laplacian L is computed. L=inv(D)*A.
  3. Let K be the number of communities, K largest eigenvalues and corresponding eigenvectors of L is calculated.
  4. Each node(sample) is represented by the rows of K eigenvectors in a K-dimensional space. The sample that is at a maximum distance from the origin is found out and a simplex structure is built.
  5. From the simplex, a presence matrix is calculated which gives the probability of a node belonging to any community.
  6. A threshold is set and the nodes that have a higher presence in a community than the threshold are classified to be in that community.
  7. The optimal number of communities is that value of K for which there is a significant jump in eigenvalues.

We parallelized PCCA+ for community detection using Jacket by Accelereyes.

Illustration of Some Results :

Quick Observations :
  1. Modularity values given by PCCA+ is nearly equal to the best algorithm, Louvain’s algorithm.
  2. For larger data sets, even the serial implementation of PCCA+ is approximately two times faster than Louvain’s algorithm.
  3. For smaller data sets, PCCA+ and Louvain’s algorithm take approximately the same time to run. Whereas Girvan-Newman’s algorithm is 1 order of magnitude slower.

Leave a Reply