Genome-wide integration on transcription factors, histone acetylation and gene expression reveals genes co-regulated by histone modification patterns
by Yayoi Natsume-Kitatani, Motoki Shiga and Hiroshi Mamitsuka
This support page provides the MATLAB source code files and data resources that are necessary to reproduce the results shown in the paper.
To reproduce the results, follow the instructions below.
1. Load the resources listed below into MATLAB.
Source code
Input datasets
genelist_GP.xls: genelist in GSE9217 (dataset GP)
genelist_ES.xls: genelist in GSE9840 (dataset ES)
genelist_TFHM.xls: genelist shared between two ChIP-chip datasets (dataset TR and AH+)
matrix_GP.txt: gene expression profile in GSE9217 (dataset GP)
matrix_ES.txt: gene expression profile in GSE9840 (dataset ES)
matrix_TR.txt: binding t-CDFs for 1756 genes in dataset TR
matrix_AHplus.txt: binding t-CDFs for 1756 genes in dataset AH+
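One possible way to do the loading in step 1 is sketched below. It rests on assumptions that this page does not state: the files sit in the current folder, the matrix_*.txt files are delimited numeric text that readmatrix can parse, each matrix has one row per gene, and readmatrix and readcell are available (MATLAB R2019a or later). The ES files can be loaded in the same way.

```matlab
% Sketch only: file layout and delimiters are assumptions, not documented facts.
required = {'matrix_GP.txt', 'matrix_TR.txt', 'matrix_AHplus.txt', ...
            'genelist_GP.xls', 'genelist_TFHM.xls'};

% Check which of the required files can be found in the current folder
present = cellfun(@(f) exist(f, 'file') == 2, required);

if all(present)
    % Numeric matrices (assumed: one row per gene)
    matrix_GP     = readmatrix('matrix_GP.txt');
    matrix_TR     = readmatrix('matrix_TR.txt');
    matrix_AHplus = readmatrix('matrix_AHplus.txt');

    % Gene lists, read as cell arrays
    genelist_GP   = readcell('genelist_GP.xls');
    genelist_TFHM = readcell('genelist_TFHM.xls');

    % Both ChIP-chip matrices are described as covering the same genes
    assert(size(matrix_TR, 1) == size(matrix_AHplus, 1), ...
           'matrix_TR and matrix_AHplus have different numbers of rows');
else
    % Report every file that could not be found
    fprintf('Missing input file: %s\n', required{~present});
end
```

If a file is laid out differently (for example with a header row or with genes in columns), the read calls and the row check would need to be adapted.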
2. Cluster the genes in the ChIP-chip data.
% according to TF-binding
W_h=matrix_TR*matrix_TR';
[normvector_h]=normvector(W_h,clsn);
[bestclust_TR] = movmf_m(W_h,normvector_h,clsn,iter,kappa);
where the following parameters must be set beforehand:
clsn: the number of clusters (e.g., 10)
iter: the number of iterations (e.g., 1000)
kappa: concentration parameter of the von Mises-Fisher (vMF) distribution (e.g., 10)
% according to histone acetylation
W_k=matrix_AHplus*matrix_AHplus';
[normvector_k]=normvector(W_k,clsn);
[bestclust_AHplus] = movmf_m(W_k,normvector_k,clsn,iter,kappa);
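As a minimal sketch, the three parameters can be set to the example values quoted in this step before the clustering commands are run. The second half uses a small random matrix X, a hypothetical stand-in for matrix_TR with rows assumed to be genes, to illustrate what W_h=matrix_TR*matrix_TR' computes: a symmetric genes-by-genes matrix of inner products between rows. normvector and movmf_m are the functions supplied with the source code on this page and are not reproduced here.

```matlab
% Example values quoted on this page
clsn  = 10;    % number of clusters
iter  = 1000;  % number of iterations
kappa = 10;    % concentration parameter of the vMF distribution

% Hypothetical toy data: 6 genes (rows) x 4 columns
rng(0);
X = rand(6, 4);

% Same construction as W_h = matrix_TR*matrix_TR'
W = X * X';

% W has one row and one column per gene, and W(i,j) is the inner
% product of rows i and j of X
disp(size(W));
disp(abs(W(2, 3) - X(2, :) * X(3, :)'));
```

With the real data, W_h and W_k therefore have as many rows and columns as there are genes in matrix_TR and matrix_AHplus.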
For your reference, the expected results are included in the following files: "bestclust_TR.txt" and "bestclust_AHplus.txt".
3. Cluster the genes in the microarray expression data (GSE9217: matrix_GP) by running "SelectCell".
[Bestclust, GeneGroup, GeneGroupList]=SelectCell(matrix_GP, genelist_GP, clsn, iter, kappa, genelist_TFHM, bestclust_TR, bestclust_AHplus);
OUTPUT
Bestclust: cluster IDs of genes in microarray data
GeneGroup: cluster IDs of pattern-cells with t-values of more than 0.99
GeneGroupList: genelists of pattern-cells
Other outputs are 1) the number of genes in each cell of TF-HM, 2) t-CDFs of genes in each cell and 3) heatmaps of 1).
NOTE:
Cluster IDs are assigned randomly, so the resulting IDs may differ from those in the paper.
If you run this software on your own gene expression data, replace the following arguments:
matrix_GP -> gene expression profile of the microarray dataset
genelist_GP -> genelist of the dataset
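Because cluster IDs are assigned randomly, comparing a result with a reference labeling element by element is not meaningful. The sketch below shows one way to compare two labelings up to a renaming of the IDs, using a contingency table. The vectors a and b are hypothetical toy data; this check is not part of the distributed source code.

```matlab
% Hypothetical toy labelings of the same six genes:
% b is a with cluster IDs 1,2,3 renamed to 3,1,2
a = [1 1 2 2 3 3]';
b = [3 3 1 1 2 2]';

% Map whatever IDs are used onto 1..K so they can index a table
[~, ~, ia] = unique(a);
[~, ~, ib] = unique(b);

% Contingency table: C(i,j) = number of genes in cluster i of a
% and cluster j of b
C = accumarray([ia ib], 1);

% Same partition up to relabeling: every row and every column of C
% has exactly one nonzero entry
samePartition = all(sum(C > 0, 2) == 1) && all(sum(C > 0, 1) == 1);
disp(samePartition);
```

To apply this to real output, a could be bestclust_TR from your own run and b the contents of "bestclust_TR.txt", assuming both hold one cluster ID per gene in the same gene order. A different random start may also change the partition itself, in which case C shows how the clusters of the two runs overlap.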
The above procedure uses datasets previously reported in the following papers:
Harbison, C.T. et al. (2004) Nature, 431, 99-104.
Kurdistani, S.K. et al. (2004) Cell, 117, 721-733.
Bernstein, B.E. et al. (2004) Genome Biol, 5, R62.
Lee, Y.L. and Lee, C.K. (2008) Mol Cells, 26, 299-307.
Also, the mixture model estimation part uses the source code accompanying the following paper: