Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment
Cristina Teodoropol
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-240
December 22, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.pdf
The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size tn×tm, the running time of the alignment algorithm is reduced to O(nm/(tntm)). We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a t3×t3 sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.
Advisors: Katherine A. Yelick and Jonathan Ragan-Kelley
BibTeX citation:
@mastersthesis{Teodoropol:EECS-2020-240,
Author= {Teodoropol, Cristina},
Title= {Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment},
School= {EECS Department, University of California, Berkeley},
Year= {2020},
Month= {Dec},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html},
Number= {UCB/EECS-2020-240},
Abstract= {The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size <i>t<sub>n</sub>×t<sub>m</sub></i>, the running time of the alignment algorithm is reduced to <i>O(nm/(t<sub>n</sub>t<sub>m</sub>))</i>. We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a <i>t<sub>3</sub>×t<sub>3</sub></i> sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.},
}
EndNote citation:
%0 Thesis %A Teodoropol, Cristina %T Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment %I EECS Department, University of California, Berkeley %D 2020 %8 December 22 %@ UCB/EECS-2020-240 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html %F Teodoropol:EECS-2020-240