HOLMES: Efficient Distribution Testing for Secure Collaborative Learning

Ian Chang and Katerina Sotiraki and Weikeng Chen and Murat Kantarcioglu and Raluca Ada Popa

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-171

May 12, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-171.pdf

Using secure multiparty computation (MPC), organizations that own sensitive data (e.g., in healthcare, finance, or law enforcement) can train machine learning models over their joint dataset without revealing their data to each other. At the same time, secure computation restricts the operations that can be performed on the joint dataset, which makes it difficult to assess the dataset’s quality. Without such an assessment, deploying a jointly trained model is potentially illegal: regulations such as the European Union’s General Data Protection Regulation (GDPR) hold organizations legally responsible for the errors, bias, or discrimination caused by their machine learning models. Hence, testing data quality emerges as an indispensable step in secure collaborative learning. However, performing distribution testing is prohibitively expensive using current techniques, as shown in our experiments.

We present HOLMES, a protocol for performing distribution testing efficiently. In our experiments, compared with three non-trivial baselines, HOLMES achieves a speedup of more than 10 times for classical distribution tests and up to 10^4 times for multidimensional tests. The core of HOLMES is a hybrid protocol that integrates MPC with zero-knowledge (ZK) proofs, together with a new ZK-friendly and naturally oblivious sketching algorithm for multidimensional tests; both have significantly lower computational complexity and concrete execution costs.
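
For readers unfamiliar with the term, the following is a minimal plaintext sketch of one classical distribution test, a Pearson chi-square goodness-of-fit check on a categorical attribute. It is not the HOLMES protocol and involves no MPC or zero-knowledge proofs; it only illustrates the kind of statistic that would have to be computed over a joint dataset. The function name, the sample data, and the choice of a 0.05 significance level are assumptions made for illustration.

```python
from collections import Counter


def chi_square_statistic(samples, expected_probs):
    """Pearson chi-square goodness-of-fit statistic for categorical samples.

    samples: list of category labels (hypothetical pooled data).
    expected_probs: dict mapping each category to its expected probability.
    """
    n = len(samples)
    observed = Counter(samples)
    stat = 0.0
    for category, p in expected_probs.items():
        expected = n * p
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    return stat


# Hypothetical pooled attribute with two categories.
pooled = ["F"] * 48 + ["M"] * 52
stat = chi_square_statistic(pooled, {"F": 0.5, "M": 0.5})

# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level 0.05 (approximately 3.841).
CRITICAL_VALUE_DF1 = 3.841
passes = stat < CRITICAL_VALUE_DF1
```

In the clear this check is a single pass over the data. The cost discussed in the abstract arises when the same computation must run without any party revealing its records.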

BibTeX citation:

@techreport{Chang:EECS-2023-171,
    Author= {Chang, Ian and Sotiraki, Katerina and Chen, Weikeng and Kantarcioglu, Murat and Popa, Raluca Ada},
    Title= {HOLMES: Efficient Distribution Testing for Secure Collaborative Learning},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-171.html},
    Number= {UCB/EECS-2023-171},
    Abstract= {Using secure multiparty computation (MPC), organizations which own sensitive data (e.g., in healthcare, finance or law enforcement) can train machine learning models over their joint dataset without revealing their data to each other. At the same time, secure computation restricts operations on the joint dataset, which impedes computation to assess its quality. Without such an assessment, deploying a jointly trained model is potentially illegal. Regulations, such as the European Union’s General Data Protection Regulation (GDPR), require organizations to be legally responsible for the errors, bias, or discrimination caused by their machine learning models. Hence, testing data quality emerges as an indispensable step in secure collaborative learning. However, performing distribution testing is prohibitively expensive using current techniques, as shown in our experiments.

We present HOLMES, a protocol for performing distribution testing efficiently. In our experiments, compared with three non-trivial baselines, HOLMES achieves a speedup of more than 10 times for classical distribution tests and up to 10^4 times for multidimensional tests. The core of HOLMES is a hybrid protocol that integrates MPC with zero-knowledge proofs and a new ZK-friendly and naturally oblivious sketching algorithm for multidimensional tests, both with significantly lower computational complexity and concrete execution costs.},
}

EndNote citation:

%0 Report
%A Chang, Ian
%A Sotiraki, Katerina
%A Chen, Weikeng
%A Kantarcioglu, Murat
%A Popa, Raluca Ada
%T HOLMES: Efficient Distribution Testing for Secure Collaborative Learning
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 12
%@ UCB/EECS-2023-171
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-171.html
%F Chang:EECS-2023-171