NetworkX vs. IGraph
Implementation
 NetworkX: implemented in Python
 IGraph: implemented in C
Performance
IGraph wins.
IGraph  NetworkX  

Singlesource shortest path  0.012 s  0.152 s 
PageRank  0.093 s  3.949 s 
Kcore  0.022 s  0.714 s 
Minimum spanning tree  0.044 s  2.045 s 
Betweenness  946.8 s (edge) + 353.9 s (vertex) (~ 21.6 mins)  32676.4 s (edge) 22650.4 s (vertex) (~15.4 hours) 
See ref
No. of Community Detection Algorithms
IGraph wins.
 NetworkX: only optimal modularity.
 IGraph: nine algorithms including optimal modularity; edge betweenness etc.
Ease of Programming
NetworkX wins.
Load a graph
 NetworkX can simply load a graph from a list of edge tuples.


 IGraph needs to first load vertices and then a list of edge tuples


Get information from existing graph
 NetworkX


 IGraph


Community Detection Algorithms
A list of algorithms available in IGraph include:
 Optimal Modularity
 Edge Betweenness (2001)
 Fast Greedy (2004)
 Walktrap (2005)
 Eigenvectors (2006)
 Spinglass (2006)
 Label Propagation (2007)
 Multilevel (2008)
 Info Map (2008)
Summary
For directed graph: go with Info Map. Else, pls continue to read.
If compuational resources is not a big problem, and the graph is < 700 vertices & 3500 edges, go with Edge Betweenness; it yields the best result.
If cares about modularity, any of the remaining algorithms will apply;
 If the graph is particularly small: < 100 vertices, then go with optimal modularity;
 If you want a first tryon algorithm, go with fast greedy or walktrap
 If the graph is bigger than 100 vertices and not a degenerated graph, and you want something more accurate than fast greedy or walktrap, go with leading eigenvectors
 If you are looking for a solution that is similar to Kmeans clustering, then go for Spinglass
Optimal Modularity
Definition of modularity:
Modularity compares the number of edges inside a cluster with the expected number of edges that one would find in the cluster if the network were a random network with the same number of nodes and where each node keeps its degree, but edges are otherwise randomly attached.
Modularity is a measure of the segmentation of a network into partitions. The higher the modularity, the denser ingroup connections are and the sparser the intergroup connections are.
Methodology:
GNU linear programming kit.
Evaluation:
 Better for smaller communities with less than 100 vertices for the reasons of implementation choice.
 Resolution limit: when the network is large enough, small communities tend to be combined even if they are wellshaped.
Fast Greedy
Methodology:
Bottom up hierarchical decomposition process. It will merge two current communities iteratively, with the goal to achieve the maximum modularity gain at local optimal.
Evaluation:
 Pretty fast, can merge a sparse graph at linear time.
 Resolution limit: when the network is large enough, small communities tend to be combined even if they are wellshaped.
Walktrap
Methodology:
Similar to fast greedy. It is believed that when we walk some random steps, it is large likely that we are still in the same community as where we were before. This method firstly performs a random walk 345, and merge using modularity with methods similar to fast greedy.
Evaluation:
 A bit slower than fast greedy;
 A bit more accurate than fast greedy.
Multilevel
Methodology:
Similar to fast greedy, just that nodes are not combined, they move around communities to make dicision if they will contribute to the modularity score if they stay.
Eigenvectors
Methodology:
A top down approach that seeks to maximize modularity. It concerns decomposing a modularity matrix.
Evaluation:
 More accurate than fast greedy
 Slower than fast greedy
 Limitation: not stable on degenerated graphs (might not work!)
Label Propogation
Methodology:
A bit like kclustering, with initialization k different points. It uses an iterative method (again just like kmeans): the target label will be assigned with the most “vote” of the lables from its neighbors; until the current label is the most frequent label.
 Evaluation:
 Very fast
 Like KMeans, random initialization yields different results. Therefore have to run multiple times (suggested 1000+) to achieve a consensus clustering.
Edge Betweenness
Definition of edge betweenness:
Number of shortest path that passes the edge.
It’s not difficult to imagin that, if there is an edge that connects two different groups, then that edge will has to be passed through multiple times when we count the shortest path. Therefore, by removing the edge that contains with the highest number of shortest path, we are disconnecting two groups.
Methodology:
Top down hierarchical decomposition process.
Evalution:
 Generally this approach gives the most satisfying results from my experience.
 Pretty slow method. The computation for edge betweenness is pretty complex, and it will have to be computed again after removing each edge. Suitable for graph with less than 700 vertices and 3500 edges.
 It produces a dendrogram with no reminder to choose the appropriate number of communities. (But for IGraph it does a function that output the optimal count for a dendrogram).
Spinglass
Methodology:
Complicated enough for me to ignore..
Evaluation:
 Not fast
 has a bunch of hyperparameters to tune from
Infomap
Methodology:
It is based on information theoretic principles; it tries to build a grouping which provides the shortest description length for a random walk on the graph, where the description length is measured by the expected number of bits per vertex required to encode the path of a random walk.
Evaluation:
 Used for directed graph analytics