Optimizing Random Forests on GPU

Derrick Cheng and John F. Canny

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2014-205
December 1, 2014

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-205.pdf

We have designed BIDMachRF, an implementation of Random Forests with high CPU and GPU throughput and full scalability. It relies on parallelism, maximizing the work done per datum, reducing unnecessary data access, sorting, and data compression. BIDMachRF is optimized for large, gigabyte-scale datasets, and our goal is to be 10-100x faster than SciKit-Learn Random Forests and CudaTree on such datasets. BIDMachRF is currently a work in progress. This paper describes the current state of our implementation as well as points for improvement, which we have identified through benchmarks on classical datasets. Our current in-progress version has already been shown to be 5x faster than implementations such as SciKit-Learn on gigabyte-scale data, and we estimate it will be at least 20x faster than those implementations when complete.
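To make the sort-based strategy mentioned above concrete, the sketch below finds the best Gini-impurity split on a single feature by sorting the values once and scanning cumulative class counts, so each datum is touched a constant number of times. This is an illustrative CPU sketch in NumPy under our own assumptions, not BIDMachRF's actual GPU kernels; the function name and interface are hypothetical.

```python
import numpy as np

def best_split_gini(x, y, n_classes):
    """Sort-and-scan search for the best threshold on one feature.

    x: 1-D float array of feature values
    y: 1-D int array of class labels in [0, n_classes)
    Returns (threshold, weighted Gini impurity) for the best split,
    or (None, inf) if no valid split exists.
    """
    order = np.argsort(x)              # one sort per feature
    xs, ys = x[order], y[order]
    n = len(ys)
    total = np.bincount(ys, minlength=n_classes).astype(float)
    left = np.zeros(n_classes)         # running class counts left of the cut
    best_gini, best_thresh = np.inf, None
    for i in range(n - 1):
        left[ys[i]] += 1
        if xs[i] == xs[i + 1]:
            continue                   # cannot split between equal values
        right = total - left
        nl, nr = i + 1, n - i - 1
        gini_l = 1.0 - np.sum((left / nl) ** 2)
        gini_r = 1.0 - np.sum((right / nr) ** 2)
        gini = (nl * gini_l + nr * gini_r) / n
        if gini < best_gini:
            best_gini = gini
            best_thresh = (xs[i] + xs[i + 1]) / 2
    return best_thresh, best_gini
```

Because the scan only updates class counts incrementally, the per-feature cost after sorting is linear in the number of samples, which is what makes a sort-based formulation attractive on throughput-oriented hardware.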

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Cheng:EECS-2014-205,
    Author = {Cheng, Derrick and Canny, John F.},
    Title = {Optimizing Random Forests on GPU},
    School = {EECS Department, University of California, Berkeley},
    Year = {2014},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-205.html},
    Number = {UCB/EECS-2014-205},
}

EndNote citation:

%0 Thesis
%A Cheng, Derrick
%A Canny, John F.
%T Optimizing Random Forests on GPU
%I EECS Department, University of California, Berkeley
%D 2014
%8 December 1
%@ UCB/EECS-2014-205
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-205.html
%F Cheng:EECS-2014-205