Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment
Cristina Teodoropol
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-240
December 22, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.pdf
The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size <i>t<sub>n</sub>×t<sub>m</sub></i>, the running time of the alignment algorithm is reduced to <i>O(nm/(t<sub>n</sub>t<sub>m</sub>))</i>. We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a <i>t<sub>3</sub>×t<sub>3</sub></i> sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.
Advisors: Katherine A. Yelick and Jonathan Ragan-Kelley
BibTeX citation:
@mastersthesis{Teodoropol:EECS-2020-240, Author= {Teodoropol, Cristina}, Title= {Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment}, School= {EECS Department, University of California, Berkeley}, Year= {2020}, Month= {Dec}, Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html}, Number= {UCB/EECS-2020-240}, Abstract= {The Four Russians technique is a method that divides a sequence alignment problem into tiles and uses a precomputed lookup table to quickly find the solution to each tile. By dividing the dynamic programming table into tiles of size <i>t<sub>n</sub>×t<sub>m</sub></i>, the running time of the alignment algorithm is reduced to <i>O(nm/(t<sub>n</sub>t<sub>m</sub>))</i>. We explore this technique as it applies to the problem of genomic sequence alignment. Beyond the simple global alignment problem, we implement the Four Russians technique for a banded extension algorithm, which only covers a diagonal band of the dynamic programming table and, similar to local alignment, extends an alignment ending anywhere in the table, and an early dropout algorithm similar to X-Drop. Using a <i>t<sub>3</sub>×t<sub>3</sub></i> sized tile, the average speedup was 5.15 for the Needleman-Wunsch full global alignment algorithm, 2.98 for the banded extension algorithm, and 2.77 for the X-Drop algorithm, running over data produced by the PBSIM synthetic generator.}, }
EndNote citation:
%0 Thesis %A Teodoropol, Cristina %T Applying the Four Russians Technique to Banded Extension and X-Drop Sequence Alignment %I EECS Department, University of California, Berkeley %D 2020 %8 December 22 %@ UCB/EECS-2020-240 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-240.html %F Teodoropol:EECS-2020-240