GPU Accelerated T-Distributed Stochastic Neighbor Embedding

David Chan

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-89

May 28, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-89.pdf

Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples. Existing visualization methods that employ dimensionality reduction to two or three dimensions are often inefficient and/or ineffective for these datasets. For example, T-Distributed Stochastic Neighbor Embedding (T-SNE) is a popular technique for dimensionality reduction and visualization of high-dimensional point structures; however, T-SNE is an inherently slow algorithm, requiring pairwise computations between all points in the high-dimensional space. This thesis explores GP-GPU accelerated algorithms for approximate T-SNE and demonstrates multiple algorithms achieving state-of-the-art performance on, and novel visualizations of, common machine learning datasets.

Advisor: John F. Canny

BibTeX citation:

@mastersthesis{Chan:EECS-2020-89,
    Author= {Chan, David},
    Title= {GPU Accelerated T-Distributed Stochastic Neighbor Embedding},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-89.html},
    Number= {UCB/EECS-2020-89},
    Abstract= {Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples. Existing visualization methods that employ dimensionality reduction to two or three dimensions are often inefficient and/or ineffective for these datasets. For example, T-Distributed Stochastic Neighbor Embedding (T-SNE) is a popular technique for dimensionality reduction and visualization of high-dimensional point structures; however, T-SNE is an inherently slow algorithm, requiring pairwise computations between all points in the high-dimensional space. This thesis explores GP-GPU accelerated algorithms for approximate T-SNE and demonstrates multiple algorithms achieving state-of-the-art performance on, and novel visualizations of, common machine learning datasets.},
}

EndNote citation:

%0 Thesis
%A Chan, David
%T GPU Accelerated T-Distributed Stochastic Neighbor Embedding
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 28
%@ UCB/EECS-2020-89
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-89.html
%F Chan:EECS-2020-89