DOI: 10.1145/2808719.2808767
research-article

The impact of RNA-seq aligners on gene expression estimation

Published: 09 September 2015

ABSTRACT

Although numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis because the accuracy of the estimates profoundly affects all subsequent analysis. In general, gene expression estimation involves sequence alignment followed by quantification, so accurate estimation requires accurate alignment. However, the impact of aligners on gene expression estimation remains unclear. We address this gap by constructing nine pipelines, each pairing one of nine spliced aligners with the same quantifier. We then use simulated data to investigate the impact of aligners on gene expression estimation. To evaluate alignment, we introduce three alignment performance metrics: (1) the percentage of reads aligned, (2) the percentage of reads aligned with zero mismatches (ZeroMismatchPercentage), and (3) the percentage of reads aligned with at most one mismatch (ZeroOneMismatchPercentage). We then evaluate the impact of alignment performance on gene expression estimation using three further metrics: (1) gene detection accuracy, (2) the number of genes falsely quantified (FalseExpNum), and (3) the number of genes with falsely estimated fold changes (FalseFcNum). We found that, across the pipelines, FalseExpNum and FalseFcNum are correlated. Moreover, FalseExpNum is linearly correlated with the percentage of reads aligned and with ZeroMismatchPercentage, and FalseFcNum is linearly correlated with ZeroMismatchPercentage. Because of these correlations, the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
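The three alignment metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state the denominators, so this sketch assumes every percentage is taken over all input reads (aligned or not). The function name `alignment_metrics`, the dictionary key `PercentAligned`, and the input encoding (one mismatch count per read, `None` for an unaligned read) are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence


def alignment_metrics(mismatches: Sequence[Optional[int]]) -> Dict[str, float]:
    """Compute the three alignment metrics for one aligner's output.

    mismatches[i] is the number of mismatches for read i, or None if the
    read was not aligned. ASSUMPTION: all percentages use the total number
    of input reads as the denominator; the abstract does not specify this.
    """
    total = len(mismatches)
    if total == 0:
        raise ValueError("no reads given")

    # Reads for which the aligner reported an alignment.
    aligned = [m for m in mismatches if m is not None]
    # Aligned reads with exactly zero mismatches.
    zero = sum(1 for m in aligned if m == 0)
    # Aligned reads with zero or one mismatch.
    zero_one = sum(1 for m in aligned if m <= 1)

    return {
        "PercentAligned": 100.0 * len(aligned) / total,
        "ZeroMismatchPercentage": 100.0 * zero / total,
        "ZeroOneMismatchPercentage": 100.0 * zero_one / total,
    }


# Toy input: 10 reads, 2 of them unaligned (None).
example = alignment_metrics([0, 0, 1, 2, None, 0, 1, None, 3, 0])
print(example)
```

In practice the per-read mismatch counts would have to be extracted from each aligner's output; how that is done is outside what the abstract describes.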

Published in

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN: 9781450338530
DOI: 10.1145/2808719

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

BCB '15 paper acceptance rate: 48 of 141 submissions, 34%
Overall acceptance rate: 254 of 885 submissions, 29%
