GLay is a cytoscape plugin that offers an assorted collection of community analysis algorithms and layout functions, dynamically linked from igraph.
To install, simply extract, and put glay.dll and glay.jar into cytoscape/plugins/ folder. Currently we only provide C++ binaries for windows vista/7 64bit platform. Binaries for other platforms will be released in the future.
The latest build of GLay can be downloaded here:
Here’s a rush release of the JAVA and C++ source files for GLay. The cleaned versions will be released in the near future.
Instructions for compiling on Linux
Samad Lotia from Agilent Technologies offered some help on compiling the source code on Linux OS. In order to do so, you will need the following jar libs, in addition to Cytoscape source jars:
And he offered a build file to compile the java code:
Here’s his procedure to compile the C++ code of GLay on Linux. You will need igraph library:
Finally, you will need this java source to compile GLay on Linux (The Netbeans workspace file detects whether platform is windows. If not, the plugin will set all GLay C++ functions not accessible)
- To load community analysis, go to Cytoscape->Plugins->GLay
- To use igraph layout, go to Cytoscape->Layout->GLay-igraph
- FastGreedy(G): A fast algorithm with greedy optimization of the modularity score. The algorithm is described in detail by Clauset et al. in the paper here. The algorithm begins by setting each node into a separate community and progressively merge those with the maximum increase to the modularity score. The hierarchical merging tree is cut at the point where maximum modularity is achieved. This (G) version is a Java implementation and can run on all platforms. It works on both connected and disconnected networks.
- FastGreedy(HE,HN,HEN): It has been proposed by Wakita et. al in this paper that sometimes FastGreedy algorithm tend to produce unbalanced communities, with one or a few very big community and many other very small communities. They add a cluster size or edge density penalty at each merging step to penalize those merges which result in giant communities. The result is a slightly compromised modularity score but with much balanced community size distrition. Please refer to the paper for further details. These are also Java implementation and are therefore platform independent.
- Cluster Connected Components(I): This is an igraph function calculates connected components of a network using Breadth-first search. The resultant clusters are connected subnetworks from the original network. The user may select a component, the choose File->New->Network->Selected Nodes, all Edges(Ctrl+N) to create a new connected subnetwork. This function can be used to isolate the large connected components in a network from the small connected components and singletons.
- Fast Greedy(I): This is the igraph version of the fast greedy algorithm, more efficient than our Java implementation.
- Walk Trap(I): This is an igraph function using random walks to detect community structure. The basic idea is that short random walks tend to stay in the same community (from igraph R function description). The original paper by Pons et. al is here. Note that it only works on connected networks, which can be obtained using Cluster Connected Components.
- Label Propagation(I): This is a very efficient igraph function using label propagation algorithm. Each node is in a unique community to begin wih, then the communities are assigned progressively via majority vote in the neighborhood. This algorithm works on both connected and disconnected networks. For details about this algorithm, please refer to the paper by Raghavan et. al here.
- Edge Betweenness(I): This is one of the earliest community detection algorithms proposed my Newman et. al. Edge betweenness is a score which measures the number of shortest path flows through a certain edge. Edges with a high betweenness is thus considered as a bridge connecting communities with densely connected nodes. By gradually remove the edges with the highest betweenness score we will get a dendrogram, in which the root is the entire network and the leafs are individual vertices. The original paper by Newman et. al can be found here. This algorithm works only on connected networks.
- Spin Glass(I): This is an igraph function tries to find communities via spin-glass model and simulated annealing. The original paper by Reichardt can be found here. This algorithm also works only on connected networks.
- Spin Glass Single(I): Similar algorithm as the previous one, this method allows identification of community surrounding one or a few nodes, without computing the global community structure. To use this algorithm the user needs to select one or a few nodes as seeds. This algorithm also works only on connected networks.
- Leading Eigenvector(I): This is an igraph function finding communities by calculating the leading non-negative eigenvector of the modularity matrix. From igraph R package reference manual, “The heart of the method is the definition of the modularity matrix, B, which is B=A-P, A being the adjacency matrix of the (undirected) network, and P contains the probability that certain edges are present according to the ‘configuration model’. In other words, a P[i,j] element of P is the probability that there is an edge between vertices i and j in a random network in which the degrees of all vertices are the same as in the input graph. The leading eigenvector method works by calculating the eigenvector of the modularity matrix for the largest positive eigenvalue and then separating vertices into two community based on the sign of the corresponding element in the eigenvector. If all elements in the eigenvector are of the same sign that means that the network has no underlying comuunity structure.” For more details, please refer to the original paper by Newman et. al here.
Legends in community algorithms:
- This algorithm works only on connected network
- This algorithm works both on connected and disconnected network
- This algorithm works on connected network, and requires one of more selected nodes as seed
- GraphOPT: a port of graphopt layout by Michael Schmuhl. The original library can be found here. It optimizes network layouts.
- DRL: Force directed layout algorithm tailored for very large networks. (Currently got some problems.)
- Fruchterman Reingold: Force based layout algorithm proposed by Fruchterman and Reingold.
- Kamada Kawai: Force based layout algorithm proposed by Kamada Kawai. It works better on a connected network.
- Reingold Tilford Tree/Circular: Generates hierarchical layout, either circular or tree-like. Need to define a node as root.
- Fruchterman Reingold Grid: similar to Fruchterman algorithm with repelling force calculated only between vertices that are closer within a limit. It’s very efficient on large networks.
- LGL: For large connected graphs, similar to Large Graph Layout software at here. The algorithm needs to define a node as root. If not specified, the algorithm picks one at random.
- MDS: 2D multi-dimensional scaling based on the distance matrix computed from shortest paths. Not suitable for large networks.
Legends in layout algorithms:
- This algorithm requires a seed
- This algorithm works only or better on connected network
- This algorithm works on connected network, and requires one node as seed
The above figure explains the functions can be executed from the user interface. Parameters for layout algorithms can be tuned in the layout config dialogue. Note that the default settings are adequate for majority of operations.
A Brief comparison with other graph clustering methods
This plugin is designed for detecting clusters with strongly connected nodes from a network. Compared with commmunity algorithms, MCODE tend to produce smaller clusters; also it doesn’t necessarily assign membership to all the nodes. In our test on a network of 5781 nodes and 83316 edges, the computation time for MCODE is 148s and 15s for igraph fast-greedy community algorithm.
This plugin is similar to MCODE. It scores a pair of nodes based on the likelihood of them sharing the same neighbour. NEMO halted after ~ 5 minutes of execution on the same network mentioned before.
This package provides several nice methods for network clustering, including Hierarchical Clustering, K-means and MCL. The major difference between these methods and community algorithms is that they require at least one numerical attribute, such as microarray gene expression, for computing distance matrices.
Screenshots of GLay, MCODE and NEMO are given below. Note that MCODE and NEMO tend to produce small clusters.
Some Screenshots from GLay examples
Bind-human dataset, largest connnected component, GLay dark style w/ community algorithm, Fruchterman-Reingold Grid layout
GalFiltered, Reingold-tilford circular, GLay light style
List of external links and references