Sparse-Graph Codes for the Big Data Era: Exploiting Sparsity to Speed Up Computation
Orhan Ocal
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-70
May 27, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-70.pdf
We are witnessing an unprecedented growth in the amount of data being sensed, collected and processed to drive modern applications across many domains. This growth has challenged classical algorithms to scale up to big-data regimes. Because time is critically important in settings such as control and decision-making, interventional medical imaging, and security, speeding up widely-used routines has the potential to open up the application space.
In this thesis, we focus on six problems of fundamental importance, and develop fast algorithms for solving them whenever there is an underlying structure of sparsity, which typically holds in most natural settings. Our algorithms attain their speed by being sample and computationally efficient whenever there is sparsity. In particular, the problems we study and a summary of our results are as follows. (1) We propose a novel architecture for sampling continuous-time signals that have band-limited and sparse spectra at their minimum sampling rate, which can be much lower than the Nyquist rate. (2) We improve the noise robustness of divide-and-conquer-based sparse Discrete Fourier Transform algorithms by developing a novel method for fast and sample-efficient frequency identification. (3) We develop a fast algorithm for recovering sparse wavelet coefficients of a signal from its Fourier domain samples. This setting arises in Magnetic Resonance Imaging, and our approach provides a theoretical framework for fast imaging. (4) We propose a sample and computationally efficient algorithm for recovering sparse pseudo-Boolean functions with application to learning a DNA repair outcomes for CRISPR-Cas9 experiments. (5) We propose a computationally efficient method for coded computation to reduce the effect of slow computational nodes in distributed matrix multiplication. (6) We propose a computationally efficient algorithm for large-scale matrix multiplication whenever the output is sparse with application to database search.
Our approach in tackling these diverse problems is based on a unifying divide-and-conquer strategy, where by leveraging ideas from modern coding theory (in particular, Low Density Parity Check Codes) which have previously been studied in the context of communications, we create structures that enable simple and fast peeling-based recovery.
Advisors: Kannan Ramchandran
BibTeX citation:
@phdthesis{Ocal:EECS-2020-70, Author= {Ocal, Orhan}, Title= {Sparse-Graph Codes for the Big Data Era: Exploiting Sparsity to Speed Up Computation}, School= {EECS Department, University of California, Berkeley}, Year= {2020}, Month= {May}, Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-70.html}, Number= {UCB/EECS-2020-70}, Abstract= {We are witnessing an unprecedented growth in the amount of data being sensed, collected and processed to drive modern applications across many domains. This growth has challenged classical algorithms to scale up to big-data regimes. Because time is critically important in settings such as control and decision-making, interventional medical imaging, and security, speeding up widely-used routines has the potential to open up the application space. In this thesis, we focus on six problems of fundamental importance, and develop fast algorithms for solving them whenever there is an underlying structure of sparsity, which typically holds in most natural settings. Our algorithms attain their speed by being sample and computationally efficient whenever there is sparsity. In particular, the problems we study and a summary of our results are as follows. (1) We propose a novel architecture for sampling continuous-time signals that have band-limited and sparse spectra at their minimum sampling rate, which can be much lower than the Nyquist rate. (2) We improve the noise robustness of divide-and-conquer-based sparse Discrete Fourier Transform algorithms by developing a novel method for fast and sample-efficient frequency identification. (3) We develop a fast algorithm for recovering sparse wavelet coefficients of a signal from its Fourier domain samples. This setting arises in Magnetic Resonance Imaging, and our approach provides a theoretical framework for fast imaging. (4) We propose a sample and computationally efficient algorithm for recovering sparse pseudo-Boolean functions with application to learning a DNA repair outcomes for CRISPR-Cas9 experiments. (5) We propose a computationally efficient method for coded computation to reduce the effect of slow computational nodes in distributed matrix multiplication. (6) We propose a computationally efficient algorithm for large-scale matrix multiplication whenever the output is sparse with application to database search. Our approach in tackling these diverse problems is based on a unifying divide-and-conquer strategy, where by leveraging ideas from modern coding theory (in particular, Low Density Parity Check Codes) which have previously been studied in the context of communications, we create structures that enable simple and fast peeling-based recovery.}, }
EndNote citation:
%0 Thesis %A Ocal, Orhan %T Sparse-Graph Codes for the Big Data Era: Exploiting Sparsity to Speed Up Computation %I EECS Department, University of California, Berkeley %D 2020 %8 May 27 %@ UCB/EECS-2020-70 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-70.html %F Ocal:EECS-2020-70