Leveraging Similar Regions to Improve Genome Data Processing

Kristal Curtis

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2015-199
September 15, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.pdf

Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy that we have identified. We then confirm that the similar regions are characterized by low variant calling accuracy. We show that the structure of the similar regions provides a platform for repairing alignment errors, thus leading to significantly improved variant calling accuracy.

Advisors: David A. Patterson and Armando Fox


BibTeX citation:

@phdthesis{Curtis:EECS-2015-199,
    Author = {Curtis, Kristal},
    Title = {Leveraging Similar Regions to Improve Genome Data Processing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2015},
    Month = {Sep},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.html},
    Number = {UCB/EECS-2015-199},
    Abstract = {Though DNA sequencing has improved dramatically over the past decade, variant calling, which is the process of reconstructing a patient’s genome from the reads that the sequencers produce, remains a difficult problem, largely due to the genome’s redundant structure. In this thesis, we describe SiRen, our algorithm for characterizing the genome’s structure in a way that makes sense from the perspective of the reads themselves. We use the term similar regions to refer to the areas of redundancy that we have identified. We then confirm that the similar regions are characterized by low variant calling accuracy. We show that the structure of the similar regions provides a platform for repairing alignment errors, thus leading to significantly improved variant calling accuracy.}
}

EndNote citation:

%0 Thesis
%A Curtis, Kristal
%T Leveraging Similar Regions to Improve Genome Data Processing
%I EECS Department, University of California, Berkeley
%D 2015
%8 September 15
%@ UCB/EECS-2015-199
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-199.html
%F Curtis:EECS-2015-199