DOI: 10.1145/1015330.1015423
Article

Bayesian haplotype inference via the Dirichlet process

Published: 04 July 2004

ABSTRACT

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship. We apply our approach to the analysis of both simulated and real genotype data, and compare to extant methods.
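
To make the idea of a mixture with an unknown number of components concrete, here is a minimal sketch of the Chinese restaurant process, a standard sequential description of the clustering induced by a Dirichlet process prior. It is illustrative only and is not the paper's model or sampler: the function name `crp_assignments`, the concentration parameter name `alpha`, and the choice of Python are assumptions made for this example. Each item either joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is not fixed in advance.

```python
import random
from collections import Counter


def crp_assignments(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make new
    clusters more likely. Labels are 0, 1, 2, ... in order of first use.
    This is a hypothetical helper for illustration, not the paper's code.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        # Item i joins cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        choice = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if u < cumulative:
                choice = k
                break
        if choice == len(counts):
            counts.append(0)
        counts[choice] += 1
        labels.append(choice)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n=40, alpha=1.0, seed=0)
    print("clusters used:", len(set(labels)))
    print("cluster sizes:", sorted(Counter(labels).values(), reverse=True))
```

In this sketch the clusters stand in for the pool of distinct haplotypes; a full model would also need a likelihood linking each cluster's haplotype to the observed genotypes, which is omitted here.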

Published in

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
