Cristina Teodoropol

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-240

December 22, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.pdf

The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size <i>t<sub>n</sub>×t<sub>m</sub></i>, the running time of the alignment algorithm is reduced to <i>O(nm/(t<sub>n</sub>t<sub>m</sub>))</i>. We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a <i>t<sub>3</sub>×t<sub>3</sub></i> sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.

Advisors: Katherine A. Yelick and Jonathan Ragan-Kelley


BibTeX citation:

@mastersthesis{Teodoropol:EECS-2020-240,
    Author= {Teodoropol, Cristina},
    Title= {Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html},
    Number= {UCB/EECS-2020-240},
    Abstract= {The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size <i>t<sub>n</sub>×t<sub>m</sub></i>, the running time of the alignment algorithm is reduced to <i>O(nm/(t<sub>n</sub>t<sub>m</sub>))</i>. We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians  technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a <i>t<sub>3</sub>×t<sub>3</sub></i> sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.},
}

EndNote citation:

%0 Thesis
%A Teodoropol, Cristina 
%T Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 22
%@ UCB/EECS-2020-240
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html
%F Teodoropol:EECS-2020-240