ABSTRACT
Our goal was to explore the accuracy and utility of identifying and removing PCR duplicates from high-throughput sequencing (HTS) data using Super Deduper. Super Deduper is a pre-alignment, sequence-read-based technique developed at the University of Idaho that examines only a small portion of each read's sequence in order to identify and remove PCR and/or optical duplicates. Super Deduper's parameters were optimized, and its performance assessed, through comparisons with well-known pre- and post-alignment techniques. The results indicate that Super Deduper is a viable pre-alignment alternative to post-alignment techniques. Because Super Deduper is independent of both a reference genome and the choice of alignment application, it can be used in a greater variety of HTS applications. Super Deduper is an open source application and can be found at https://github.com/dstreett/Super-Deduper.
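The core idea described above, comparing reads on a short portion of their sequence rather than on an alignment, can be illustrated with a minimal sketch. This is not Super Deduper's implementation: the function name `dedupe_by_key`, the `start` and `length` parameters and their default values, and the policy of keeping the first read seen for each key are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple

# A read is modeled as (name, sequence, quality string); this layout is an
# assumption for the sketch, not Super Deduper's internal representation.
Read = Tuple[str, str, str]


def dedupe_by_key(reads: Iterable[Read], start: int = 5, length: int = 10) -> Iterator[Read]:
    """Yield reads whose key (a short substring of the sequence) is new.

    `start` and `length` are hypothetical parameters that choose which
    portion of each read is used as the duplicate-detection key.
    """
    seen: Set[str] = set()
    for name, seq, qual in reads:
        key = seq[start:start + length]
        if len(key) < length:
            # Read is too short to build a full key: pass it through unchanged.
            yield (name, seq, qual)
            continue
        if key in seen:
            # Same key as an earlier read: treat as a duplicate and drop it.
            continue
        seen.add(key)
        yield (name, seq, qual)


example_reads = [
    ("r1", "ACGTACGTACGTACGTACGT", "I" * 20),
    ("r2", "ACGTACGTACGTACGTACGA", "I" * 20),  # differs from r1 only outside the key region
    ("r3", "TTTTTGGGGGCCCCCAAAAA", "I" * 20),
]
kept_names = [name for name, _, _ in dedupe_by_key(example_reads)]
```

In this toy input, `r2` shares its key region with `r1` and is dropped even though its final base differs, which is why a key-based comparison needs no reference genome or aligner. A real tool would also have to handle paired-end reads and decide which copy of a duplicate to keep; those choices are left out of this sketch.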
Index Terms
- Super deduper, fast PCR duplicate detection in fastq files