DOI: 10.1145/2808719.2811568
poster

Super deduper, fast PCR duplicate detection in fastq files

Published: 09 September 2015

ABSTRACT

Our goal was to explore the accuracy and utility of identifying and removing PCR duplicates from high-throughput sequencing (HTS) data using Super Deduper. Super Deduper is a pre-alignment, sequence-read-based technique developed at the University of Idaho that examines only a small portion of each read's sequence to identify and remove PCR and/or optical duplicates. Super Deduper's parameters were optimized and its performance assessed through comparisons with well-known pre- and post-alignment techniques. The results indicate that Super Deduper is a viable pre-alignment alternative to post-alignment techniques. Because it is independent of both a reference genome and the choice of alignment application, it can be used in a greater variety of HTS applications. Super Deduper is an open source application and can be found at https://github.com/dstreett/Super-Deduper.
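
The idea of keying each read on a small portion of its sequence can be illustrated with a minimal Python sketch. This is not Super Deduper's implementation. The function names, the `start` and `length` parameters and their default values, and the rule of keeping the copy with the higher mean quality are all assumptions made for illustration. The sketch handles single-end reads only and ignores reverse complements.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score, assuming the common Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0


def dedupe(reads: Iterable[Read], start: int = 5, length: int = 10) -> List[Read]:
    """Keep one read per key, where the key is seq[start:start+length].

    `start` and `length` are hypothetical parameters, not the tool's defaults.
    Reads too short to supply a full key are passed through unchanged.
    """
    best: Dict[str, Read] = {}
    passthrough: List[Read] = []
    for read in reads:
        _, seq, qual = read
        key = seq[start:start + length]
        if len(key) < length:
            passthrough.append(read)
            continue
        kept = best.get(key)
        # Assumption: among reads sharing a key, keep the higher mean quality.
        if kept is None or mean_quality(qual) > mean_quality(kept[2]):
            best[key] = read
    return list(best.values()) + passthrough
```

Because the key is taken from the reads themselves, no reference genome or aligner is involved, which is the property the abstract highlights. Reads that differ outside the key window are still treated as duplicates, so the position and width of the window trade sensitivity against specificity.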

        • Published in

          BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
          September 2015
          683 pages
          ISBN:9781450338530
          DOI:10.1145/2808719

          Copyright © 2015 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          BCB '15 paper acceptance rate: 48 of 141 submissions, 34%. Overall acceptance rate: 254 of 885 submissions, 29%.
