|
|
Gene expression array denoising
|
Gene expression arrays are a formidable tool, as they allow
investigation of thousands of genes at the same time. However, in order
to fully exploit their potential, one has to deal successfully with the statistical issues involved in their analysis.
We have suggested a de-noising approach based on thresholding.
Using a Bayesian hierarchical model and an approach to multiple
comparison that is inspired by the False Discovery Rate, we denoise the signal coming from multiple array experiments with the specific goal of identifying the genes that are up-regulated or down-regulated in a given condition.
Our model is flexible and can be used for clustering.
This project is partially funded by NSF and NASA.
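As a rough illustration of how thresholding recovers a sparse signal, the following Python sketch applies generic soft thresholding to a vector of log expression ratios and reports which genes remain non-zero. It is not the Bayesian hierarchical rule of the papers listed below: the function names and the fixed threshold `t` are assumptions made only for illustration.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def denoise(log_ratios, t):
    """Apply soft thresholding to every gene's log expression ratio."""
    return [soft_threshold(x, t) for x in log_ratios]


def regulated_genes(log_ratios, t):
    """Return indices of genes called up-regulated and down-regulated."""
    denoised = denoise(log_ratios, t)
    up = [i for i, v in enumerate(denoised) if v > 0]
    down = [i for i, v in enumerate(denoised) if v < 0]
    return up, down


# Hypothetical log ratios for four genes; only genes 0 and 3 survive t = 1.0.
up, down = regulated_genes([2.5, -0.3, 0.1, -1.8], t=1.0)
```

In practice the threshold would be chosen from the data, for instance by an FDR-type criterion, rather than fixed in advance.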
Sabatti, C., S. Karsten, and D. Geschwind (2002) "Thresholding rules for recovering a sparse signal from microarray
experiments,"
Mathematical Biosciences 176: 17-34. Preprint
Erickson, S. and C. Sabatti (2005) "Empirical Bayes estimation of a sparse vector of gene expression," Statistical Applications in
Genetics and Molecular Biology 4: 22.
|
Genomic scale identification of promoter binding sites
|
One of the best understood mechanisms of transcription regulation is the action of regulatory proteins, which bind to the upstream region of a gene and act as either promoters or suppressors.
We have developed a stochastic dictionary model to identify the position of known binding sites on a genomewide scale. We use this information to improve the clustering of array experiments and to reconstruct the regulatory network.
Our model organism for these investigations has been E. coli.
This project is in cooperation with the laboratories of Professors Lange and Liao.
It is partially funded by NSF and NASA.
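To give a concrete sense of what locating known binding sites involves, the sketch below scans a sequence with a position-specific log-odds score. This is a much simpler stand-in for the stochastic dictionary model, which treats the whole sequence as a concatenation of words. The motif probabilities, the uniform background, and the threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical base probabilities at each position of a length-4 motif ("TATA").
MOTIF = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
# Assumed uniform background base composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(window, motif=MOTIF, background=BACKGROUND):
    """Log-likelihood ratio of motif versus background for one window."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(motif, window))


def scan(sequence, motif=MOTIF, threshold=2.0):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], motif)
        if score >= threshold:
            hits.append((start, score))
    return hits


hits = scan("GCGCTATAGCGC")  # the only window above threshold starts at index 4
```

Unlike this window-by-window scan, a dictionary model scores competing motifs jointly, so overlapping candidate sites are resolved by the model rather than by a fixed cutoff.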
Sabatti, C. and K. Lange (2002) "Genomewide motif identification using a dictionary model," IEEE Proceedings 90: 1803-1810. Preprint
Sabatti, C., L. Rohlin, K. Lange, and J. Liao (2005) "Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites," Bioinformatics 21: 922-931. Preprint
|
Gene regulation networks
|
To recover the dynamic behavior of regulatory proteins and their
pathway of influence on cell behavior, we combine
sequence analysis with results of gene expression array
experiments.
We develop a sparse hidden component model to link transcription
factor activity to gene expression. We use the Vocabulon algorithm to
search for binding sites of regulatory proteins in the genome and
inform our prior distribution on network structure.
This project is in cooperation with the laboratories of Professors Liao
and Roychowdhury and with Gareth James.
It is partially funded by NSF and NASA.
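The link between transcription factor activity and gene expression can be pictured with a small forward model. The sketch assumes a linear relationship in which each gene's (log) expression is a weighted sum of the activities of the factors that bind near it, with zero weights wherever no binding site is found. The matrices and the function name are hypothetical, and the sketch does not perform the estimation step of the cited methods.

```python
def predict_expression(connectivity, activities):
    """Predict expression as connectivity (genes x TFs) times activities (TFs x conditions).

    A zero entry in `connectivity` means the factor has no binding site near
    the gene, so that factor contributes nothing to the gene's expression.
    """
    n_conditions = len(activities[0])
    predicted = []
    for row in connectivity:
        predicted.append([
            sum(weight * activities[k][c]
                for k, weight in enumerate(row) if weight != 0.0)
            for c in range(n_conditions)
        ])
    return predicted


# Hypothetical example: 2 genes, 2 transcription factors, 2 conditions.
connectivity = [[1.0, 0.0],    # gene 0 is regulated by factor 0 only
                [0.5, -2.0]]   # gene 1 is activated by factor 0, repressed by factor 1
activities = [[1.0, 2.0],      # factor 0 activity in each condition
              [0.5, 0.0]]      # factor 1 activity in each condition
expression = predict_expression(connectivity, activities)
```

The sparsity pattern of the connectivity matrix is where sequence information enters: binding-site evidence suggests which entries may be non-zero.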
Sabatti, C., L. Rohlin, M. Oh, and J. Liao. (2002) "Co-expression pattern from DNA microarray experiments as a tool for operon prediction,"
Nucleic Acids Research 30: 2886-2893. Reprint
Liao, J., R. Boscolo, Y. Yang, L. Tran, C. Sabatti, and
V. Roychowdhury (2003) "Network component analysis: reconstruction of
regulatory signals in biological systems," Proceedings of the
National Academy of Sciences 100: 15522-15527. Reprint
Kao, K., Y. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury, and J. Liao (2004) "Determination of multiple transcription regulator activities in Escherichia coli using network component analysis," Proceedings of the National Academy of Sciences 101: 641-646. Reprint
Sabatti, C. and G. James (2006)
"Bayesian sparse hidden components analysis for transcription regulation networks,"
Bioinformatics 22: 739-746.
|
Linkage disequilibrium
|
I have been interested for a long time in how to measure linkage
disequilibrium and in the variations of LD across the genome and
across populations.
One of the most exciting discoveries of recent investigations on the structure of the genome is the limited haplotype diversity that has been described in terms of haplotype blocks. However, many of the issues involved in the definition of what is a block and how to identify it still need to be resolved.
We are interested in developing statistical models that can be used to test a variety of hypotheses on the nature of the blocks and the implications of their existence for gene mapping.
We have used a parsing technique combined with ideas from
the minimum description length principle to identify high-frequency haplotypes in the genome.
(This project, funded by NIH, is in cooperation with the laboratory
of Professor Lange.)
I also have a long-standing interest in the definition of measures of
disequilibrium. I have suggested the use of homozygosity statistics
and proposed standardizations based on the idea of volume
tests. Recently, in collaboration with Yuguo Chen, we have described
Sequential Importance Sampling algorithms that allow fast evaluation
of volume measures.
We analyzed the LD structure of fine
mapping data from different population isolates collected in the
laboratory of Nelson Freimer.
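For readers unfamiliar with these quantities, the sketch below computes two textbook summaries from a sample of phased haplotypes: haplotype homozygosity (the sum of squared haplotype frequencies) and the classical pairwise measures D and r^2 for two biallelic loci. It does not implement the volume-based standardizations or the Sequential Importance Sampling algorithms mentioned above; the function names and the `'0'`/`'1'` allele coding are assumptions.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Probability that two haplotypes drawn with replacement are identical."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def pairwise_ld(haplotypes):
    """Return (D, r^2) for two biallelic loci.

    Each haplotype is a two-character string such as '10', where '1' marks
    the reference allele at the corresponding locus.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n
    p_b = sum(h[1] == "1" for h in haplotypes) / n
    p_ab = sum(h == "11" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical sample in complete disequilibrium: only two haplotypes occur.
sample = ["11", "11", "00", "00"]
homozygosity = haplotype_homozygosity(sample)
d, r2 = pairwise_ld(sample)
```

When all four haplotypes are equally frequent, the same functions return a homozygosity of 0.25 and D = r^2 = 0, the values expected under independence.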
Sabatti, C. and N. Risch (2002) "Homozygosity and linkage
disequilibrium," Genetics 160: 1707-1719. Preprint
Sabatti, C. (2002) "Measuring dependence with volume tests," The American Statistician
50: 191-195. Preprint
Ayers, K., C. Sabatti, and K. Lange (2006)
"Reconstructing ancestral haplotypes with a dictionary model,"
Journal of Computational Biology, 3, 3: 767-785.
Wang, H., C. Lin, S. Service, The international collaborative group on isolated populations, Y. Chen, N. Freimer, C. Sabatti (2006)
"Linkage disequilibrium and haplotype homozygosity in population
samples genotyped at a high marker density," Human Heredity
62: 175-189.
Chen, Y., C. Lin, C. Sabatti (2006) "Volume measures for linkage
disequilibrium," BMC Genetics
7:54
|
Association mapping
|
We are generally interested in association mapping.
I have contributed to the development of
a Bayesian method for haplotype mapping, and have been quite
interested in the problems of multiple comparison in association genome scans.
Furthermore, I am interested in how association mapping can be
combined with multiple phenotype analysis and studies of population structure.
We have NIH funding for these projects, in cooperation with the laboratory of Professor Freimer.
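The multiple comparison problem in a genome scan can be made concrete with the standard Benjamini-Hochberg step-up procedure, sketched below for a list of per-marker p-values. This is the generic procedure, not the specific analysis of the papers listed below; the function name and the target level `q` are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of the tests rejected at false discovery rate q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and reject
    the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values for ten markers; only the two smallest are rejected.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_values, q=0.05)
```

Because the rule is step-up, a p-value that misses its own cutoff can still be rejected when a larger p-value meets a later cutoff.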
Liu, J., C. Sabatti, J. Teng, B. Keats, and N. Risch (2001) "Bayesian analysis of haplotypes for linkage disequilibrium mapping," Genome Research 11: 1716-24. Preprint
Sabatti, C., S. Service, and N. Freimer (2003) "False discovery rates in linkage and association genome screens for complex disorders," Genetics 164: 829-833. Reprint
Freimer, N. and C. Sabatti (2003) "The human phenome project,"
Nature Genetics 34: 15-21.
Reprint
Freimer, N. and C. Sabatti (2004) "Pedigree, sib-pair, and
association studies of common diseases; genetic mapping and
epidemiology," Nature Genetics 36:
1045-1051.
Reprint
Sabatti, C. (2006) "Comment on the `Likelihood-Based
Inference on haplotype effects in genetic association studies' by Lin
and Zeng," Journal of the American Statistical
Association 101: 104-106. (Invited contribution.)
Service, S., The international collaborative group on isolated
populations, C. Sabatti, N. Freimer (2007)
"Tag SNPs chosen from HapMap perform well in several population
isolates," Genetic Epidemiology, Epub ahead of print.
Freimer, N. and C. Sabatti (2007) "Human genetics: variants
in common diseases." Nature 445: 828-30. (Invited contribution.)
Ayers, K., C. Sabatti, and K. Lange (2007) "A dictionary model for
haplotyping, genotype calling, and association mapping,"
Genetic Epidemiology 31: 672-683.
Currently we are investigating genetic epidemiology in the Northern Finland Birth Cohort.
|
High density SNP genotyping
|
We are developing models for intensity values of the Affymetrix and
Illumina genotyping arrays to be used in genotype calls, linkage studies, and loss of heterozygosity studies.
We have NIH funding for these projects, in cooperation with Professors
Ken Lange, Stan Nelson, and Roel Ophoff.
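A minimal sketch of how a mixture model turns an intensity value into a genotype call is given below. It assumes a single one-dimensional intensity summary per SNP and three Gaussian clusters with fixed, known parameters. The cluster means, standard deviations, weights, and the 0.95 posterior cutoff are hypothetical, and the models under development estimate such parameters rather than fixing them.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Hypothetical clusters: genotype -> (mean, sd, prior weight).
CLUSTERS = {
    "AA": (-1.0, 0.2, 0.25),
    "AB": (0.0, 0.2, 0.50),
    "BB": (1.0, 0.2, 0.25),
}


def call_genotype(x, clusters=CLUSTERS, min_posterior=0.95):
    """Return (call, posterior); the call is 'NoCall' when evidence is weak."""
    weighted = {g: w * normal_pdf(x, m, s) for g, (m, s, w) in clusters.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # Intensity is so far from every cluster that all densities underflow.
        return "NoCall", 0.0
    best = max(weighted, key=weighted.get)
    posterior = weighted[best] / total
    return (best if posterior >= min_posterior else "NoCall"), posterior


call, posterior = call_genotype(-0.5)  # midway between AA and AB: no call
```

Reporting the posterior probability alongside the call lets downstream linkage or loss of heterozygosity analyses weight uncertain genotypes instead of discarding them.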
Sabatti, C. and K. Lange (2005) "Bayesian Gaussian mixture models for high density genotyping arrays," UCLA
Stat preprint 421,
to appear in JASA.
Wang, H., Y. Lee, S. Nelson, and C. Sabatti (2005) "Inferring genomic loss and location of tumor suppressor genes from high density genotypes," UCLA
Stat preprint 423,
Journal of the French Statistical Society, 146:
153-171.
Wang, H., C. Lin, S. Service, The international collaborative group on isolated populations, Y. Chen, N. Freimer, C. Sabatti (2006)
"Linkage disequilibrium and haplotype homozygosity in population
samples genotyped at a high marker density," Human Heredity
62: 175-189.
Veldink, J., H. Wang, R. Ophoff, C. Sabatti (2008) "Detecting copy
number variation using Illumina genotyping technology." UCLA
Stat Preprint 533
|
High Throughput Screens
|
In collaboration with Koppany Visnyei and Harley Kornblum we are
developing methods for the analysis of high throughput screen data.
Sabatti, C., K. Visnyei, H. Kornblum (2008) "Statistical
challenges in High-throughput Screens." UCLA
Stat Preprint 532
|
|