Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

Frank Li, Richard Shin, and Vern Paxson

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2015-177

July 29, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-177.pdf

The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system.

All prior work has assumed that the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models this setting introduces. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this related algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often by extending prior single-data-owner systems.
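As a rough, non-private illustration of why Gaussian kernel density estimation is described as belonging to the same family as k-NN, the sketch below compares the two plain classifiers. It is not the privacy-preserving system proposed in the report, and it models none of the multi-owner threat models; the function names, toy data, and bandwidth value are hypothetical choices made only for illustration. Exact k-NN lets the k closest points vote equally and ignores the rest, while the Gaussian kernel lets every point vote with a weight that decays smoothly with distance.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train, query, k):
    """Exact k-NN: majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, i.e. nearest first here.
    return votes.most_common(1)[0][0]


def gaussian_kde_classify(train, query, bandwidth):
    """Gaussian-kernel density classifier: every training point votes for its
    label with weight exp(-||x - query||^2 / (2 * bandwidth^2))."""
    scores = {}
    for point, label in train:
        weight = math.exp(-sq_dist(point, query) / (2 * bandwidth ** 2))
        scores[label] = scores.get(label, 0.0) + weight
    # Raw per-label kernel sums are equivalent to comparing class-conditional
    # KDEs weighted by empirical class priors: the 1/n_c factors cancel and the
    # Gaussian normalizing constant is shared by all labels.
    return max(scores, key=scores.get)


# Hypothetical toy data: two 2-D clusters labeled "a" and "b".
TRAIN = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((1.2, 0.8), "b"),
]

print(knn_classify(TRAIN, (0.2, 0.1), k=3))  # a
print(knn_classify(TRAIN, (0.2, 0.1), k=5))  # b (all five points vote equally)
print(gaussian_kde_classify(TRAIN, (0.2, 0.1), bandwidth=0.5))  # a
```

With k=5 every point votes equally, so the larger but more distant "b" cluster wins; the Gaussian weights keep the two nearby "a" points dominant. The bandwidth plays a role loosely analogous to k, setting how far from the query a training point can be and still matter.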


BibTeX citation:

@techreport{Li:EECS-2015-177,
    Author= {Li, Frank and Shin, Richard and Paxson, Vern},
    Title= {Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners},
    Year= {2015},
    Month= {Jul},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-177.html},
    Number= {UCB/EECS-2015-177},
}

EndNote citation:

%0 Report
%A Li, Frank 
%A Shin, Richard 
%A Paxson, Vern 
%T Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
%I EECS Department, University of California, Berkeley
%D 2015
%8 July 29
%@ UCB/EECS-2015-177
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-177.html
%F Li:EECS-2015-177