### Yitao Duan

EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2007-165

December 19, 2007

### http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-165.pdf

Privacy is becoming an increasingly important issue in electronic commerce and other online activities that are growing in popularity. This work introduces a framework, called Peers for Privacy (P4P), for implementing many useful algorithms with provable privacy and adequate efficiency in a realistic adversary model at a reasonably large scale. The basic idea is to decompose an algorithm into a series of addition-only steps, which have very efficient private implementations using cryptographic tools. This simple model is surprisingly general and supports many algorithms prevalent in distributed data mining. Examples include linear algorithms such as voting and summation, as well as nonlinear algorithms such as regression, classification, SVD, PCA, $k$-means, ID3, and machine learning algorithms based on Expectation Maximization (EM). In fact, all algorithms in the statistical query model are supported.
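To make the "addition-only step" idea concrete, the following minimal sketch shows one $k$-means iteration arranged so that the only cross-user operation is a vector sum. It is an illustration under stated assumptions, not the thesis's own construction: the function names (`local_contribution`, `aggregate`, `update_centroids`) and the vector layout are hypothetical, and the sum is computed in the clear here, whereas a private implementation would compute it over secret shares.

```python
from typing import List

Vector = List[float]


def local_contribution(point: Vector, centroids: List[Vector]) -> Vector:
    """Computed locally by one user (hypothetical layout).

    Returns a vector of length k * (d + 1): for the nearest centroid j,
    slots [j*(d+1), j*(d+1)+d) hold the point and the next slot holds 1.0;
    every other slot is 0.0.
    """
    k, d = len(centroids), len(point)
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    j = dists.index(min(dists))
    vec = [0.0] * (k * (d + 1))
    base = j * (d + 1)
    vec[base:base + d] = point
    vec[base + d] = 1.0
    return vec


def aggregate(contribs: List[Vector]) -> Vector:
    """The only cross-user step: a component-wise sum."""
    return [sum(col) for col in zip(*contribs)]


def update_centroids(total: Vector, centroids: List[Vector]) -> List[Vector]:
    """Post-processing of the public sum: divide each cluster sum by its count."""
    k, d = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        base = j * (d + 1)
        count = total[base + d]
        if count == 0:
            # Empty cluster: keep the previous centroid.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append([total[base + i] / count for i in range(d)])
    return new_centroids
```

Everything that depends on an individual user's data happens in `local_contribution`; only the aggregate vector is needed to advance the iteration.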

The computation of the sums is based on a highly efficient verifiable secret sharing (VSS) scheme that allows secret-shared arithmetic operations to be done over *small* fields (e.g., 32 or 64 bits), where private arithmetic operations have the same cost as normal arithmetic. This thesis shows that this paradigm admits efficient zero-knowledge tools that can be used to verify properties of user data such as equality and boundedness. These tools provide practical mechanisms for dealing with cheating users. One such tool is an extremely efficient zero-knowledge proof that verifies that the L2-norm of the user data is bounded by a constant, which prevents a malicious user from exerting too much influence on the computation. The verification uses a linear number of inexpensive small-field operations and only a logarithmic number of large-field (1024 bits or more) cryptographic operations, and it can achieve orders-of-magnitude reductions in running time over standard techniques (from hours to seconds) for large-scale problems. Concrete examples are given to demonstrate how the framework supports private computation of popular algorithms such as SVD, link analysis, and association rule mining. The thesis also includes schemes for scalable multicast encryption and bidirectional group communication. They provide secure data transmission support for the type of communication pattern required by the P4P framework and by many other group-oriented applications.
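As a rough illustration of why sums over a small field are cheap, the sketch below implements plain two-party additive secret sharing modulo a 32-bit prime. It is a simplified stand-in and makes several assumptions: the modulus `P`, the two-aggregator arrangement, and all function names are hypothetical, and the verification part of VSS and the zero-knowledge proofs described above are omitted entirely.

```python
import secrets
from typing import List, Tuple

# Assumed modulus for illustration: the largest prime below 2**32.
P = 4294967291


def share(vec: List[int]) -> Tuple[List[int], List[int]]:
    """Split a user's integer vector into two additive shares modulo P.

    Each share on its own is uniformly random; the two sum to vec mod P.
    """
    a = [secrets.randbelow(P) for _ in vec]
    b = [(x - r) % P for x, r in zip(vec, a)]
    return a, b


def add_shares(shares: List[List[int]]) -> List[int]:
    """Run by each aggregator on the shares it holds: ordinary modular addition."""
    return [sum(col) % P for col in zip(*shares)]


def reconstruct(sum_a: List[int], sum_b: List[int]) -> List[int]:
    """Combine the two aggregated shares and map results back to signed integers."""
    out = []
    for x, y in zip(sum_a, sum_b):
        v = (x + y) % P
        out.append(v if v <= P // 2 else v - P)
    return out
```

Each aggregator sees only random-looking field elements and performs nothing more expensive than integer addition modulo `P`. The sum is recovered correctly as long as each true total lies within about ±`P // 2`, which is one reason bounding user inputs matters in practice.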

**Advisor:** John F. Canny

BibTeX citation:

@phdthesis{Duan:EECS-2007-165,
    Author = {Duan, Yitao},
    Title = {P4P: A Practical Framework for Privacy-Preserving Distributed Computation},
    School = {EECS Department, University of California, Berkeley},
    Year = {2007},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-165.html},
    Number = {UCB/EECS-2007-165},
    Abstract = {Privacy is becoming an increasingly important issue in electronic commerce and other online activities that are growing in popularity. This work introduces a framework, called Peers for Privacy (P4P), for implementing many useful algorithms with provable privacy and adequate efficiency in a realistic adversary model at a reasonably large scale. The basic idea is to decompose an algorithm into a series of addition-only steps, which have very efficient private implementation using cryptographic tools. This simple model is surprisingly general and supports many algorithms prevalent in distributed data mining. Examples include linear algorithms like voting and summation, as well as nonlinear algorithms such as regression, classification, SVD, PCA, $k$-means, ID3, machine learning algorithms based on Expectation Maximization (EM), etc. In fact all algorithms in the statistical query model are supported. The computation of the sums is based on a highly efficient verifiable secret sharing (VSS) scheme that allows secret-shared arithmetic operations to be done over \emph{small} fields (e.g. 32 or 64 bits) where private arithmetic operations have the same cost as normal arithmetic. This thesis shows that this paradigm admits efficient zero-knowledge tools that can be used to verify the properties of user data such as equality and boundedness. These tools provide practical mechanisms to deal with cheating users. One such tool is an extremely efficient zero-knowledge proof that verifies the L2-norm of the user data is bounded by a constant. This is to prevent a malicious user from exerting too much influence on the computation. The verification uses a linear number of inexpensive small field operations, and only a logarithmic number of large-field (1024 bits or more) cryptographic operations, and can achieve orders of magnitude reduction in running time over standard techniques (from hours to seconds) for large-scale problems. Concrete examples are given to demonstrate how the framework supports private computation of popular algorithms such as SVD, link analysis and association rule mining. The thesis also includes schemes for scalable multicast encryption and bidirectional group communication. They provide secure data transmission support for the type of communication pattern required by the P4P framework and many other group-oriented applications.}
}

EndNote citation:

%0 Thesis
%A Duan, Yitao
%T P4P: A Practical Framework for Privacy-Preserving Distributed Computation
%I EECS Department, University of California, Berkeley
%D 2007
%8 December 19
%@ UCB/EECS-2007-165
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-165.html
%F Duan:EECS-2007-165