Generalized Topological Overlap Matrix and its Applications

in Gene Co-expression Networks

Supporting On-line Materials

Andy M. Yip and Steve Horvath


Content


Source Codes for an R Implementation of GTOM

Download: gtom.R

Usage: gtom = GTOMmdist1(adjmat1, m)

<adjmat1> is a symmetric binary adjacency matrix encoding an undirected unweighted network

<m> is a positive integer indicating the order of the GTOM

¡@

To obtain R (a free statistical computing software), click here.


Data Sets

We use a yeast cell cycle data set published in:

Spellman et al., (1998).  Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization.  Molecular Biology of the Cell 9, 3273-3297.

The original data set by Spellman et al. (1998) : combine.txt (tab-delimited)

The data set we preprocessed1 (4000 genes, 44 conditions): YEASTCellCycle4000.csv (comma-separated)

The corresponding gene information file: YEASTCellCycle4000_geneinfo.csv (comma-separated)

¡@

1 First, arrays with more than 5% missing values are removed. Then the 4000 most varying genes (measured by standard deviation) are selected.


Example (in R)

# Read in the expression levels
datExpr = read.csv("YEASTCellCycle4000.csv", header=T, row.names=1)
datExpr = t(datExpr)

# Read in the gene information
datInfo = read.csv("YEASTCellCycle4000_geneinfo.csv", header=T, row.names=1)

# Correlation matrix
cormat = cor(datExpr,use="p")

# Adjacency matrix
tau = 0.7
AdjMat1 <- as.matrix(abs(cormat)>=tau)
diag(AdjMat1)=0

# Restrict our attention to 1000 highly connected genes
Degree = apply(AdjMat1,1,sum)
DegreeRank = rank(-Degree)
keep.node = DegreeRank <= 1000
AdjMat1.1000 = AdjMat1[keep.node, keep.node]
datExpr.1000 = datExpr[, keep.node]
datInfo.1000 = datInfo[keep.node, ]

# Compute the GTOMm dissimilarity (m=2 in this example)
dissGTOM2 = GTOMmdist1(
AdjMat1.1000, m=2)

# Hierarchical Clustering
hierGTOM2 = hclust(as.dist(dissGTOM2),method="average")
plot(hierGTOM2, main="GTOM2 Dissimilarity Measure", labels=F, xlab="", sub="")

# To define modules, we choose a height cutoff for the dendrogram
module1 = cutree(hierGTOM2, h=0.5)
image(as.matrix(module1[hierGTOM2$order]), col=1:max(module1))

# Create a Topological Overlap Matrix plot
options(expressions = 10000)
module1_reverse = module1
module1_reverse[hierGTOM2$order[length(module1):1]] = module1[hierGTOM2$order]
heatmap(as.matrix(dissGTOM2), Rowv=as.dendrogram(hierGTOM2), Colv= as.dendrogram(hierGTOM2), scale="none", revC=T, ColSideColors=as.character(module1), RowSideColors=as.character(module1_reverse), labRow=F, labCol=F)

# Create a MDS plot (color by module assignment)
cmdGTOM2 = cmdscale(dissGTOM2,k=2)
plot(cmdGTOM2, col=as.character(module1), xlab="Dimension 1", ylab="Dimension 2")

# Color the MDS plot by gene essentiality (red = essential)
attach(datInfo)
essentiality.1000 = essentiality[keep.node]
plot(cmdGTOM2, col=as.character(essentiality.1000+1), xlab="Dimension 1", ylab="Dimension 2")

# Define the GTOM-based connectivity measure
GTOM2.Conn = apply(1-dissGTOM2, 2, sum)

# Relate GTOM connectivity with gene essentiality using Spearman correlation coefficient
cor(GTOM2.Conn, essentiality.1000, use="p", method="s")
0.1962606


2006-03-07

Please send your suggestions and comments to: shorvath@mednet.ucla.edu