Statistical Genomics - Research

 

Gene expression array denoising

Gene expression arrays are a formidable tool, as they allow the investigation of thousands of genes at the same time. However, to exploit their full potential, one has to deal successfully with the statistical issues involved in their analysis. We have proposed a denoising approach based on thresholding. Using a Bayesian hierarchical model and an approach to multiple comparisons inspired by the False Discovery Rate, we denoise the signal coming from multiple array experiments with the specific goal of identifying the genes that are up-regulated or down-regulated in a given condition. Our model is flexible and can also be used for clustering. This project is partially funded by NSF and NASA.

Sabatti, C., S. Karsten, and D. Geschwind (2002) "Thresholding rules for recovering a sparse signal from microarray experiments," Mathematical Biosciences 176: 17-34. Preprint

Erickson, S. and C. Sabatti (2005) "Empirical Bayes estimation of a sparse vector of gene expression," Statistical Applications in Genetics and Molecular Biology 4: 22.
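The thresholding idea can be illustrated with a generic two-group ("spike-and-slab") model. This is a minimal sketch, not the hierarchical model of the papers above. It assumes each gene contributes one standardized log-ratio z with unit noise variance, that a fraction w of genes carry a real effect drawn from N(0, tau2), and that w and tau2 are fixed by hand. An empirical Bayes analysis would instead estimate them from all genes. The function names and default values are hypothetical.

```python
import math
from typing import List, Tuple


def posterior_prob(x: float, w: float, tau2: float) -> float:
    """P(effect != 0 | z = x) when z ~ (1 - w) N(0, 1) + w N(0, 1 + tau2)."""
    # Log posterior odds of "slab" (changed gene) versus "spike" (unchanged gene)
    log_odds = (math.log(w / (1.0 - w)) - 0.5 * math.log(1.0 + tau2)
                + 0.5 * x * x * tau2 / (1.0 + tau2))
    return 1.0 / (1.0 + math.exp(-log_odds))


def spike_slab_denoise(z: List[float], w: float = 0.1, tau2: float = 9.0,
                       cutoff: float = 0.5) -> List[Tuple[float, float]]:
    """For each gene return (posterior P(effect != 0), thresholded effect estimate)."""
    results = []
    for x in z:
        prob = posterior_prob(x, w, tau2)
        shrunk = x * tau2 / (1.0 + tau2)  # posterior mean of the effect if it is nonzero
        # Hard threshold: genes unlikely to be changed are set exactly to zero
        results.append((prob, shrunk if prob > cutoff else 0.0))
    return results


z_scores = [0.3, -4.5, 5.1, 1.2]  # illustrative values, not real data
for prob, est in spike_slab_denoise(z_scores):
    label = "up" if est > 0 else "down" if est < 0 else "unchanged"
    print(f"P(changed)={prob:.3f}  estimate={est:+.2f}  {label}")
```

The sign of a surviving estimate separates up-regulated from down-regulated genes, while the zeros produce the sparse signal that the thresholding approach aims to recover.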

Genome-scale identification of promoter binding sites

One of the best-understood mechanisms of transcription regulation is the action of regulatory proteins that bind to the upstream region of a gene and act either as activators or as repressors. We have developed a stochastic dictionary model to identify the positions of known binding sites on a genome-wide scale. We use this information to improve the clustering of array experiments and to reconstruct the regulatory network. Our model organism for these investigations has been E. coli.
This project is in cooperation with the laboratories of Professors Lange and Liao. It is partially funded by NSF and NASA.

Sabatti, C. and K. Lange (2002) "Genomewide motif identification using a dictionary model," Proceedings of the IEEE 90: 1803-1810. Preprint

Sabatti, C., L. Rohlin, K. Lange, and J. Liao (2005) "Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites," Bioinformatics 21: 922-931. Preprint
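The core of a stochastic dictionary model is that a sequence is read as a concatenation of "words" (single bases plus candidate binding-site motifs), and its likelihood sums over every possible segmentation. The sketch below shows that computation with forward and backward recursions. It also gives the posterior probability that a motif is used as one word at each position. It is a simplification, not Vocabulon. Words here are exact strings with hand-set probabilities, whereas the cited models allow degenerate motifs and estimate word probabilities from the data. All names are hypothetical.

```python
from typing import Dict, List


def forward(seq: str, words: Dict[str, float]) -> List[float]:
    """f[i] = total probability of all parses of seq[:i] into dictionary words."""
    f = [0.0] * (len(seq) + 1)
    f[0] = 1.0
    for i in range(1, len(seq) + 1):
        for word, q in words.items():
            L = len(word)
            if L <= i and seq[i - L:i] == word:
                f[i] += f[i - L] * q
    return f


def backward(seq: str, words: Dict[str, float]) -> List[float]:
    """b[i] = total probability of all parses of seq[i:] into dictionary words."""
    n = len(seq)
    b = [0.0] * (n + 1)
    b[n] = 1.0
    for i in range(n - 1, -1, -1):
        for word, q in words.items():
            L = len(word)
            if i + L <= n and seq[i:i + L] == word:
                b[i] += q * b[i + L]
    return b


def site_posterior(seq: str, words: Dict[str, float], motif: str) -> List[float]:
    """Posterior probability that `motif` is parsed as one word starting at each position."""
    f, b = forward(seq, words), backward(seq, words)
    total, L, q = f[len(seq)], len(motif), words[motif]
    return [f[i] * q * b[i + L] / total if seq[i:i + L] == motif else 0.0
            for i in range(len(seq) - L + 1)]


# Illustrative dictionary: four single bases plus one hypothetical motif
words = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.2}
print(site_posterior("GTATAC", words, "TATA"))
```

In "GTATAC" there are only two parses: six single letters (probability 0.2^6) or G + TATA + C (probability 0.2^3). The motif at position 1 therefore receives posterior probability 0.008 / 0.008064, roughly 0.99.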

Gene regulation networks

To recover the dynamic behavior of regulatory proteins and their pathways of influence on cell behavior, we combine sequence analysis with the results of gene expression array experiments. We have developed a sparse hidden component model that links transcription factor activity to gene expression. We use the Vocabulon algorithm to search the genome for binding sites of regulatory proteins and use these results to inform our prior distribution on the network structure.
This project is in cooperation with the laboratories of Professors Liao and Roychowdhury and with Gareth James. It is partially funded by NSF and NASA.

Sabatti, C., L. Rohlin, M. Oh, and J. Liao (2002) "Co-expression pattern from DNA microarray experiments as a tool for operon prediction," Nucleic Acids Research 30: 2886-2893. Reprint

Liao, J., R. Boscolo, Y. Yang, L. Tran, C. Sabatti, and V. Roychowdhury (2003) "Network component analysis: reconstruction of regulatory signals in biological systems," Proceedings of the National Academy of Sciences 100: 15522-15527. Reprint

Kao, K., Y. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury, and J. Liao (2004) "Determination of multiple transcription regulator activities in Escherichia coli using network component analysis," Proceedings of the National Academy of Sciences 101: 641-646. Reprint

Sabatti, C. and G. James (2006) "Bayesian sparse hidden components analysis for transcription regulation networks," Bioinformatics 22: 739-746.
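The structure shared by network component analysis and sparse hidden component models is a factorization E ≈ A P. E is a genes × samples expression matrix, A is a genes × regulators connectivity matrix whose zero pattern comes from known binding sites, and P holds the hidden regulator activities across samples. The sketch below fits such a factorization by plain coordinate-wise least squares while keeping the forbidden entries of A at zero. This is an illustrative swap-in, not the Bayesian sampler of Sabatti and James or the NCA algorithm, and it ignores their identifiability conditions. The names `sparse_factorize` and `sq_error` and the toy matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sq_error(E: Matrix, A: Matrix, P: Matrix) -> float:
    """Sum of squared residuals of E - A P."""
    K = len(P)
    return sum((E[g][s] - sum(A[g][k] * P[k][s] for k in range(K))) ** 2
               for g in range(len(E)) for s in range(len(E[0])))


def sparse_factorize(E: Matrix, support: List[List[bool]],
                     n_iter: int = 200) -> Tuple[Matrix, Matrix]:
    """Fit E ~ A P by coordinate-wise least squares, keeping A[g][k] = 0 off the support."""
    G, S, K = len(E), len(E[0]), len(support[0])
    A = [[1.0 if support[g][k] else 0.0 for k in range(K)] for g in range(G)]
    P = [[1.0] * S for _ in range(K)]

    def resid(g: int, s: int, skip: int) -> float:
        # E[g][s] minus the contributions of every regulator except `skip`
        return E[g][s] - sum(A[g][j] * P[j][s] for j in range(K) if j != skip)

    for _ in range(n_iter):
        # Connectivity strengths: exact one-dimensional least-squares update, allowed edges only
        for g in range(G):
            for k in range(K):
                if not support[g][k]:
                    continue
                den = sum(P[k][s] ** 2 for s in range(S))
                if den > 0:
                    A[g][k] = sum(resid(g, s, k) * P[k][s] for s in range(S)) / den
        # Hidden regulator activities: exact one-dimensional least-squares update per sample
        for k in range(K):
            den = sum(A[g][k] ** 2 for g in range(G))
            if den == 0:
                continue
            for s in range(S):
                P[k][s] = sum(resid(g, s, k) * A[g][k] for g in range(G)) / den
    return A, P


# Toy network: gene 0 <- TF 0, gene 1 <- TF 1, gene 2 <- both (made-up numbers)
support = [[True, False], [False, True], [True, True]]
E = [[1.0, 2.0, 3.0], [6.0, 2.0, 0.0], [4.0, 3.0, 3.0]]
A, P = sparse_factorize(E, support)
print("residual sum of squares:", sq_error(E, A, P))
```

Each coordinate update minimizes the squared error exactly in one variable, so the fit never gets worse from sweep to sweep. Like any bilinear factorization, however, it can stop at a local optimum, and A and P are determined only up to rescaling of each regulator.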

Linkage disequilibrium

I have long been interested in how to measure linkage disequilibrium (LD) and in the variation of LD across the genome and across populations. One of the most exciting discoveries of recent investigations of genome structure is the limited haplotype diversity that has been described in terms of haplotype blocks. However, many of the issues involved in defining what a block is and how to identify it still need to be resolved. We are interested in developing statistical models that can be used to test a variety of hypotheses on the nature of the blocks and the implications of their existence for gene mapping. We have used a parsing technique combined with ideas from the minimum description length principle to identify high-frequency haplotypes in the genome. (This project, funded by NIH, is in cooperation with the laboratory of Professor Lange.)
I also have a long-standing interest in the definition of measures of disequilibrium. I have suggested the use of homozygosity statistics and proposed standardizations based on the idea of volume tests. Recently, in collaboration with Yuguo Chen, we have described sequential importance sampling algorithms that allow fast evaluation of volume measures. We have analyzed the LD structure of fine-mapping data from different population isolates collected in the laboratory of Nelson Freimer.

Sabatti, C. and N. Risch (2002) "Homozygosity and linkage disequilibrium," Genetics 160: 1707-1719. Preprint

Sabatti, C. (2002) "Measuring dependence with volume tests," The American Statistician 56: 191-195. Preprint

Ayers, K., C. Sabatti, and K. Lange (2006) "Reconstructing ancestral haplotypes with a dictionary model," Journal of Computational Biology 13: 767-785.

Wang, H., C. Lin, S. Service, The International Collaborative Group on Isolated Populations, Y. Chen, N. Freimer, and C. Sabatti (2006) "Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density," Human Heredity 62: 175-189.

Chen, Y., C. Lin, and C. Sabatti (2006) "Volume measures for linkage disequilibrium," BMC Genetics 7: 54.
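A simple homozygosity-based view of LD compares how often two randomly drawn haplotypes are identical with how often that would happen if the loci were independent. Under linkage equilibrium, haplotype frequencies are products of allele frequencies, so the expected haplotype homozygosity is the product of the single-locus homozygosities. The sketch below computes that ratio from a sample of phased haplotypes. It is an illustration under simplifying assumptions: it applies no small-sample correction and none of the standardizations (volume measures, sequential importance sampling) discussed above, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def homozygosity(items: Iterable[Hashable]) -> float:
    """Sum of squared sample frequencies: chance that two draws (with replacement) match."""
    counts = Counter(items)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def homozygosity_ratio(haplotypes: Sequence[Tuple[str, ...]]) -> float:
    """Observed haplotype homozygosity divided by its value under linkage equilibrium."""
    observed = homozygosity(haplotypes)
    expected = 1.0
    for locus in range(len(haplotypes[0])):
        # Product of single-locus homozygosities = expectation if loci were independent
        expected *= homozygosity(h[locus] for h in haplotypes)
    return observed / expected


# Two-locus toy samples (illustrative only)
complete_ld = [("A", "B")] * 5 + [("a", "b")] * 5
equilibrium = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]
print(homozygosity_ratio(complete_ld), homozygosity_ratio(equilibrium))
```

In the population the ratio equals 1 under linkage equilibrium. A ratio well above 1 signals that haplotypes are shared more often than independent loci would allow, as in the first toy sample, where the ratio is 2.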

Association mapping

We are broadly interested in association mapping. I have contributed to the development of a Bayesian method for haplotype mapping and have been particularly interested in the problem of multiple comparisons in association genome scans. Furthermore, I am interested in how association mapping can be combined with multiple-phenotype analysis and with studies of population structure. We have NIH funding for these projects, in cooperation with the laboratory of Professor Freimer.

Liu, J., C. Sabatti, J. Teng, B. Keats, and N. Risch (2001) "Bayesian analysis of haplotypes for linkage disequilibrium mapping," Genome Research 11: 1716-1724. Preprint

Sabatti, C., S. Service, and N. Freimer (2003) "False discovery rates in linkage and association genome screens for complex disorders," Genetics 164: 829-833. Reprint

Freimer, N. and C. Sabatti (2003) "The human phenome project," Nature Genetics 34: 15-21. Reprint

Freimer, N. and C. Sabatti (2004) "Pedigree, sib-pair, and association studies of common diseases; genetic mapping and epidemiology," Nature Genetics 36: 1045-1051. Reprint

Sabatti, C. (2006) "Comment on 'Likelihood-Based Inference on haplotype effects in genetic association studies' by Lin and Zeng," Journal of the American Statistical Association 101: 104-106. (Invited contribution.)

Service, S., The International Collaborative Group on Isolated Populations, C. Sabatti, and N. Freimer (2007) "Tag SNPs chosen from HapMap perform well in several population isolates," Genetic Epidemiology, Epub ahead of print.

Freimer, N. and C. Sabatti (2007) "Human genetics: variants in common diseases," Nature 445: 828-830. (Invited contribution.)

Ayers, K., C. Sabatti, and K. Lange (2007) "A dictionary model for haplotyping, genotype calling, and association mapping," Genetic Epidemiology 31: 672-683.

Currently, we are investigating genetic epidemiology in the Northern Finland Birth Cohort.
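The multiple-comparison problem in association genome scans can be illustrated with the standard Benjamini-Hochberg step-up procedure, which controls the false discovery rate at a chosen level q. The sketch assumes one p-value per marker and independent (or positively dependent) tests. The p-values are made up, and the function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], q: float = 0.05) -> List[int]:
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= k * q / m
        if pvals[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


# Ten hypothetical marker p-values from a genome scan
p = [0.001, 0.2, 0.012, 0.04, 0.5, 0.009, 0.03, 0.8, 0.011, 0.06]
print(benjamini_hochberg(p, q=0.05))
```

Here four markers (indices 0, 2, 5, and 8) are declared significant. A Bonferroni cutoff of 0.05 / 10 = 0.005 would keep only the first. This gain in power is the main appeal of FDR control when many markers are tested.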

High density SNP genotyping

We are developing models for the intensity values of Affymetrix and Illumina genotyping arrays, for use in genotype calling, linkage studies, and loss-of-heterozygosity studies. We have NIH funding for these projects, in cooperation with Professors Ken Lange, Stan Nelson, and Roel Ophoff.

Sabatti, C. and K. Lange (2005) "Bayesian Gaussian mixture models for high density genotyping arrays," UCLA Stat preprint 421, to appear in JASA.

Wang, H., Y. Lee, S. Nelson, and C. Sabatti (2005) "Inferring genomic loss and location of tumor suppressor genes from high density genotypes," UCLA Stat preprint 423, Journal of the French Statistical Society 146: 153-171.

Wang, H., C. Lin, S. Service, The International Collaborative Group on Isolated Populations, Y. Chen, N. Freimer, and C. Sabatti (2006) "Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density," Human Heredity 62: 175-189.

Veldink, J., H. Wang, R. Ophoff, and C. Sabatti (2008) "Detecting copy number variation using Illumina genotyping technology," UCLA Stat Preprint 533.
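Genotype calling from array intensities is often framed as a Gaussian mixture with one component per genotype. The sketch below fits a three-component one-dimensional mixture by EM (maximum likelihood, no priors) and assigns each sample its most probable genotype. It is a simplified stand-in for the Bayesian Gaussian mixture models cited above. It assumes each sample has been summarized by a hypothetical contrast x = (a - b) / (a + b) of its two allele intensities, so that BB, AB, and AA cluster near -1, 0, and +1. The starting values and the standard-deviation floor are arbitrary choices.

```python
import math
from typing import List, Tuple

GENOTYPES = ("BB", "AB", "AA")


def call_genotypes(x: List[float], n_iter: int = 50) -> Tuple[List[str], List[float]]:
    """Fit a 3-component 1-D Gaussian mixture by EM and call each sample's genotype."""
    means = [-1.0, 0.0, 1.0]   # assumed starting centres for BB, AB, AA on the contrast scale
    sds = [0.3, 0.3, 0.3]
    weights = [1.0 / 3] * 3
    n = len(x)

    def responsibilities(v: float) -> List[float]:
        # Log-densities up to a shared constant; subtracting the max avoids underflow
        logs = [math.log(w) - math.log(s) - 0.5 * ((v - m) / s) ** 2
                for w, m, s in zip(weights, means, sds)]
        top = max(logs)
        dens = [math.exp(lg - top) for lg in logs]
        total = sum(dens)
        return [d / total for d in dens]

    for _ in range(n_iter):
        R = [responsibilities(v) for v in x]          # E-step
        for k in range(3):                             # M-step
            nk = sum(r[k] for r in R)
            weights[k] = max(nk / n, 1e-12)
            if nk < 1e-8:
                continue  # (near-)empty cluster, e.g. no BB samples: keep its mean and sd
            means[k] = sum(r[k] * v for r, v in zip(R, x)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(R, x)) / nk
            sds[k] = max(math.sqrt(var), 0.02)  # floor keeps a cluster from collapsing
    R = [responsibilities(v) for v in x]
    calls = [GENOTYPES[max(range(3), key=lambda k: r[k])] for r in R]
    return calls, [max(r) for r in R]


# Made-up contrasts for ten samples at one SNP
x = [-0.95, -1.02, -0.9, 0.05, -0.03, 0.1, 0.92, 1.0, 0.97, 1.05]
calls, confidence = call_genotypes(x)
print(list(zip(calls, [round(c, 3) for c in confidence])))
```

The posterior probability of each call can serve as a quality score. Samples with low confidence could be left uncalled rather than forced into a genotype.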

High Throughput Screens

In collaboration with Koppany Visnyei and Harley Kornblum, we are developing methods for the analysis of high-throughput screen data.

Sabatti, C., K. Visnyei, and H. Kornblum (2008) "Statistical challenges in high-throughput screens," UCLA Stat Preprint 532.
