Preliminary Studies on de novo Assembly with Short Reads
Nanheng Wu
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2009-172
December 15, 2009
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-172.pdf
Recent developments in next-generation sequencing present new computational challenges for assembly algorithms. Any effective and practical de novo assembly algorithm must confront short read lengths, base-calling errors, and enormous data sizes. In this report we present our efforts to address these challenges in de novo assembly with short reads. Specifically, we show that quality scores contain vital information and that algorithms achieve better results when they use them. We also show that error-correction preprocessing can make de novo assembly algorithms more tolerant of base-calling errors. Finally, we present a novel parallel algorithm that clusters sequence reads based on overlap information, and we show that it has the potential to scale to millions of reads efficiently.
Advisors: Satish Rao
BibTeX citation:
@mastersthesis{Wu:EECS-2009-172,
    Author = {Wu, Nanheng},
    Editor = {Rao, Satish and Song, Yun S.},
    Title = {Preliminary Studies on de novo Assembly with Short Reads},
    School = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-172.html},
    Number = {UCB/EECS-2009-172},
    Abstract = {Recent developments in next-generation sequencing present new computational challenges for assembly algorithms. Any effective and practical de novo assembly algorithm must confront short read lengths, base-calling errors, and enormous data sizes. In this report we present our efforts to address these challenges in de novo assembly with short reads. Specifically, we show that quality scores contain vital information and that algorithms achieve better results when they use them. We also show that error-correction preprocessing can make de novo assembly algorithms more tolerant of base-calling errors. Finally, we present a novel parallel algorithm that clusters sequence reads based on overlap information, and we show that it has the potential to scale to millions of reads efficiently.}
}
EndNote citation:
%0 Thesis
%A Wu, Nanheng
%E Rao, Satish
%E Song, Yun S.
%T Preliminary Studies on de novo Assembly with Short Reads
%I EECS Department, University of California, Berkeley
%D 2009
%8 December 15
%@ UCB/EECS-2009-172
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-172.html
%F Wu:EECS-2009-172