DFS-Perf: A Scalable and Unified Benchmarking Framework for Distributed File Systems

Rong Gu, Qianhao Dong, Haoyuan Li, Joseph Gonzalez, Zhao Zhang, Shuai Wang, Yihua Huang, Scott Shenker, Ion Stoica and Patrick P. C. Lee

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2016-133
July 27, 2016

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-133.pdf

A distributed file system (DFS) is a key component of virtually any cluster computing system. The performance of such a system depends heavily on the underlying DFS design and deployment. As a result, it is critical to characterize the performance and design trade-offs of DFSes with respect to cluster configurations and real-world workloads. To this end, we present DFS-Perf, a scalable, extensible, and low-overhead benchmarking framework to evaluate the properties and the performance of various DFS implementations. DFS-Perf uses a highly parallel architecture to cover a large variety of workloads at different scales, and provides an extensible interface to incorporate user-defined workloads and integrate with various DFSes. As a proof of concept, our current DFS-Perf implementation includes several built-in benchmarks and workloads, including machine learning and SQL applications. We present performance comparisons of four state-of-the-art DFS designs, namely Alluxio, CephFS, GlusterFS, and HDFS, on a cluster with 40 nodes (960 cores). We demonstrate that DFS-Perf can provide guidance on existing DFS designs and implementations, while adding 5.7% overhead.


BibTeX citation:

@techreport{Gu:EECS-2016-133,
    Author = {Gu, Rong and Dong, Qianhao and Li, Haoyuan and Gonzalez, Joseph and Zhang, Zhao and Wang, Shuai and Huang, Yihua and Shenker, Scott and Stoica, Ion and Lee, Patrick P. C.},
    Title = {{DFS-Perf}: A Scalable and Unified Benchmarking Framework for Distributed File Systems},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2016},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-133.html},
    Number = {UCB/EECS-2016-133},
    Abstract = {A distributed file system (DFS) is a key component of virtually any cluster computing system. The performance of such a system depends heavily on the underlying DFS design and deployment. As a result, it is critical to characterize the performance and design trade-offs of DFSes with respect to cluster configurations and real-world workloads. To this end, we present DFS-Perf, a scalable, extensible, and low-overhead benchmarking framework to evaluate the properties and the performance of various DFS implementations. DFS-Perf uses a highly parallel architecture to cover a large variety of workloads at different scales, and provides an extensible interface to incorporate user-defined workloads and integrate with various DFSes. As a proof of concept, our current DFS-Perf implementation includes several built-in benchmarks and workloads, including machine learning and SQL applications. We present performance comparisons of four state-of-the-art DFS designs, namely Alluxio, CephFS, GlusterFS, and HDFS, on a cluster with 40 nodes (960 cores). We demonstrate that DFS-Perf can provide guidance on existing DFS designs and implementations, while adding 5.7\% overhead.}
}

EndNote citation:

%0 Report
%A Gu, Rong
%A Dong, Qianhao
%A Li, Haoyuan
%A Gonzalez, Joseph
%A Zhang, Zhao
%A Wang, Shuai
%A Huang, Yihua
%A Shenker, Scott
%A Stoica, Ion
%A Lee, Patrick P. C.
%T DFS-Perf: A Scalable and Unified Benchmarking Framework for Distributed File Systems
%I EECS Department, University of California, Berkeley
%D 2016
%8 July 27
%@ UCB/EECS-2016-133
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-133.html
%F Gu:EECS-2016-133