ABSTRACT
Inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for understanding genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus, methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship. We apply our approach to the analysis of both simulated and real genotype data, and compare it with extant methods.
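To illustrate how a Dirichlet process prior leaves the number of mixture components open, the following is a minimal sketch that assumes the Chinese restaurant process representation of the Dirichlet process. It is not the paper's sampler: it only draws component labels from the prior, with no genotype likelihood. The function name `crp_assignments` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Draw component labels for n items from a Chinese restaurant process.

    Illustrative only: each "item" could stand for one chromosome and each
    "component" for one haplotype in the pool (an assumption, not the
    paper's stated procedure).
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items already in component k
    labels = []   # labels[i] = component chosen by item i
    for i in range(n):
        # Existing component k is chosen with probability counts[k] / (i + alpha);
        # a new component is opened with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new component
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


labels, counts = crp_assignments(40, alpha=1.0)
print(len(counts), "components used by", len(labels), "items")
```

The number of components is not fixed in advance; it grows with the data at a rate governed by `alpha`, which is the property that makes such a prior suited to a haplotype pool of unknown size.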