Leveraging Similar Regions to Improve Genome Data Processing
Kristal Curtis
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2015-199
September 15, 2015
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.pdf
Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy that we have identified. We then confirm that the similar regions are characterized by low variant calling accuracy. We show that the structure of the similar regions provides a platform for repairing alignment errors, thus leading to significantly improved variant calling accuracy.
Advisors: David A. Patterson and Armando Fox
BibTeX citation:
@phdthesis{Curtis:EECS-2015-199,
    Author = {Curtis, Kristal},
    Title = {Leveraging Similar Regions to Improve Genome Data Processing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2015},
    Month = {Sep},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.html},
    Number = {UCB/EECS-2015-199},
    Abstract = {Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy that we have identified. We then confirm that the similar regions are characterized by low variant calling accuracy. We show that the structure of the similar regions provides a platform for repairing alignment errors, thus leading to significantly improved variant calling accuracy.},
}
EndNote citation:
%0 Thesis
%A Curtis, Kristal
%T Leveraging Similar Regions to Improve Genome Data Processing
%I EECS Department, University of California, Berkeley
%D 2015
%8 September 15
%@ UCB/EECS-2015-199
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.html
%F Curtis:EECS-2015-199