Post Verification of Integrity of Remote Queries in Opaque

Andrew Law How Hung

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-44
May 11, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-44.pdf

Many companies and individuals in the modern age of big data outsource their data and computation to third parties who specialize in maintaining hardware and cloud services. However, they may want to keep their data and computation secret for business or privacy interests. Opaque is a system that offers secure data analytics in an untrusted cloud by leveraging special hardware called secure enclaves as well as a variety of other novel techniques. Opaque is built on Spark SQL, a powerful Spark module that performs data processing and analytics. Spark SQL, and by extension Opaque, distributes data among nodes to parallelize workloads. While each such node is trusted, the job driver/scheduler, which resides in the cloud and delegates the tasks, is not. This work outlines a design and implementation to preserve query integrity in the face of the untrusted scheduler using logs in the form of HMAC outputs and graph computations.

Advisor: Raluca Ada Popa


BibTeX citation:

@mastersthesis{LawHowHung:EECS-2021-44,
    Author = {Law How Hung, Andrew},
    Title = {Post Verification of Integrity of Remote Queries in Opaque},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-44.html},
    Number = {UCB/EECS-2021-44},
    Abstract = {Many companies and individuals in the modern age of big data outsource their data and computation to third parties who specialize in maintaining hardware and cloud services. However, they may want to keep their data and computation secret for business or privacy interests. Opaque is a system that offers secure data analytics in an untrusted cloud by leveraging special hardware called secure enclaves as well as a variety of other novel techniques. Opaque is built on Spark SQL, a powerful Spark module that performs data processing and analytics. Spark SQL, and by extension Opaque, distributes data among nodes to parallelize workloads. While each such node is trusted, the job driver/scheduler, which resides in the cloud and delegates the tasks, is not. This work outlines a design and implementation to preserve query integrity in the face of the untrusted scheduler using logs in the form of HMAC outputs and graph computations.}
}

EndNote citation:

%0 Thesis
%A Law How Hung, Andrew
%T Post Verification of Integrity of Remote Queries in Opaque
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 11
%@ UCB/EECS-2021-44
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-44.html
%F Law How Hung:EECS-2021-44