ABSTRACT
Genome-wide association studies (GWASs) are widely used to detect statistically significant associations between diseases and single nucleotide polymorphisms (SNPs) and thereby identify causal factors of diseases. Recent GWASs have assessed the statistical significance of more than one million SNPs, but in many cases no associations are found because conservative multiple testing corrections, such as the Bonferroni correction, are applied. More sensitive methods, such as the Westfall-Young permutation procedure (WY), could associate more SNPs with diseases, but the extremely long computation time of WY has prevented its application to GWAS. We introduce an algorithm that accelerates WY, named the High-speed Westfall-Young permutation procedure (HWY). HWY uses three techniques to make WY computationally practical. First, P-value calculations are pruned for SNPs that cannot affect the adjusted significance level. Second, a lookup table of P-values avoids frequent duplicate calculations. Finally, computations are parallelized on a GPGPU. HWY was 619 times faster than WY and more than 122 times faster than PLINK, a widely used GWAS software package, and analyzed a dataset containing one million SNPs and one thousand individuals in approximately two hours. Re-analyzing existing GWAS datasets with HWY may uncover additional hidden SNP-trait associations.
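The abstract names the ingredients of HWY without showing how they fit together. The minimal Python sketch below illustrates the underlying min-P Westfall-Young procedure under stated assumptions; it is not the authors' implementation. Genotypes are reduced to 0/1 minor-allele carrier indicators, each SNP is scored with a one-sided Fisher's exact test, the function names (`make_fisher_upper`, `westfall_young_threshold`) are hypothetical, and the adjusted level is taken as a simple empirical quantile of the permutation minima. The sketch mirrors two of the three techniques: a SNP whose smallest attainable P-value cannot lower the current permutation minimum is skipped, and `functools.lru_cache` keyed on the (carriers, case carriers) pair stands in for the P-value lookup table. GPGPU parallelization is not shown.

```python
# Hypothetical sketch: names, the one-sided Fisher test and the quantile rule
# are illustrative assumptions, not HWY's actual code.
import math
import random
from functools import lru_cache


def make_fisher_upper(n_total, n_cases):
    """Build a memoised one-sided Fisher's exact test for a 2x2 table.

    The phenotype margin (n_cases of n_total) is fixed under label permutation,
    so a SNP's P-value depends only on x (minor-allele carriers) and a (carriers
    who are cases). Caching on (x, a) acts as a P-value lookup table.
    """
    denom = math.comb(n_total, n_cases)

    @lru_cache(maxsize=None)
    def pmf(x, a):
        # Hypergeometric probability of exactly `a` carriers among the cases.
        return math.comb(x, a) * math.comb(n_total - x, n_cases - a) / denom

    @lru_cache(maxsize=None)
    def p_upper(x, a):
        # P(A >= a): one-sided test for minor-allele enrichment in cases.
        return sum(pmf(x, k) for k in range(a, min(x, n_cases) + 1))

    def p_min(x):
        # Smallest P-value a SNP with x carriers can reach under any labelling.
        return p_upper(x, min(x, n_cases))

    return p_upper, p_min


def westfall_young_threshold(genotypes, labels, n_perm=1000, alpha=0.05, seed=0):
    """Estimate the FWER-adjusted significance level by min-P permutation.

    genotypes: list of SNP rows, each a 0/1 carrier indicator per individual.
    labels:    0/1 phenotype per individual (1 = case).
    With the observed labels, a SNP is called significant if its P-value is
    strictly below the returned level.
    """
    rng = random.Random(seed)
    p_upper, p_min = make_fisher_upper(len(labels), sum(labels))
    carriers = [sum(row) for row in genotypes]  # unchanged by permuting labels
    # Visit SNPs by increasing best-case P-value so the pruning test can stop early.
    order = sorted(range(len(genotypes)), key=lambda j: p_min(carriers[j]))
    perm = list(labels)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        current = 1.0
        for j in order:
            if p_min(carriers[j]) >= current:
                break  # neither this SNP nor any later one can lower the minimum
            a = sum(g & y for g, y in zip(genotypes[j], perm))
            current = min(current, p_upper(carriers[j], a))
        min_ps.append(current)
    min_ps.sort()
    # At most floor(alpha * n_perm) permutation minima lie strictly below this value.
    return min_ps[int(alpha * n_perm)]


if __name__ == "__main__":
    # Hypothetical random data, used only to exercise the code path.
    rng = random.Random(42)
    n_snps = 300
    labels = [1] * 100 + [0] * 100
    genotypes = [[int(rng.random() < 0.2) for _ in labels] for _ in range(n_snps)]
    delta = westfall_young_threshold(genotypes, labels, n_perm=100)
    print(f"WY level: {delta:.3g}  Bonferroni level: {0.05 / n_snps:.3g}")
```

Because the phenotype margin and each SNP's carrier count are fixed under permutation, both a SNP's P-value and its lower bound depend only on small integers, which is what makes caching and pruning cheap in this sketch. The permutations are mutually independent, so they are a natural unit of parallel work on a GPU, although the sketch runs them sequentially.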
Index Terms
- High-speed Westfall-Young permutation procedure for genome-wide association studies