Arun Ganesh

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-53

May 10, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-53.pdf

Differential privacy is now the de facto industry standard for ensuring privacy while publicly releasing statistics about a sensitive database. At a high level, differentially private algorithms add noise to the statistics they compute, so that an adversary cannot guess with high confidence whether any individual is in the database: any single individual's effect on the statistics can be replicated by the noise.
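To make this concrete, here is a minimal sketch (an illustration, not code from the thesis) of the classic Laplace mechanism, which releases a count with epsilon-differential privacy by adding noise scaled to the count's sensitivity:

```python
import numpy as np

def laplace_mechanism(true_count, sensitivity=1.0, epsilon=1.0, rng=None):
    """Release a count with epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Adding or removing one person changes a count by at most 1 (the
# sensitivity), and noise of that scale masks any one person's presence.
print(laplace_mechanism(true_count=1024, epsilon=0.5))
```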

The fundamental paradigm in differential privacy is the privacy-accuracy trade-off: a differentially private algorithm's output can be made more accurate by reducing the amount of noise added, but in doing so the privacy guarantee decays. Current state-of-the-art algorithms often force practitioners to choose between a large drop in accuracy compared to non-private algorithms and very weak privacy guarantees. Improving the trade-off between privacy and accuracy would ideally allow practitioners to get the "best of both worlds." Sometimes efficiency is also traded off against privacy and accuracy: despite differential privacy being an information-theoretic guarantee, an inefficient (and thus impractical to implement) algorithm may still achieve a better privacy-accuracy trade-off than the best known efficient algorithm. Designing efficient algorithms that match the privacy-accuracy trade-off of known inefficient algorithms is thus also of practical importance.
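The trade-off is easy to quantify for the Laplace mechanism sketched above: the expected absolute error of Laplace noise with scale b is exactly b, so with sensitivity 1 the error is 1/epsilon, and strengthening privacy (shrinking epsilon) inflates the error proportionally. A back-of-the-envelope illustration:

```python
# With sensitivity 1, the Laplace mechanism's expected |error| is 1/epsilon:
# a 10x stronger privacy guarantee costs 10x the error.
for epsilon in [0.1, 0.5, 1.0, 4.0]:
    print(f"epsilon={epsilon:>4}: expected |error| = {1.0 / epsilon:.2f}")
```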

In this thesis, we consider counting queries, private log-strongly concave sampling, and private convex optimization, all fundamental problems in sampling and optimization, and for each we give algorithms that improve the privacy-accuracy trade-off or the efficiency of the previous state-of-the-art algorithms. For counting queries, we show that adding noise drawn from a "Generalized Gaussian" gives better worst-case accuracy than Gaussian noise. For private log-strongly concave sampling, we show that discretized Langevin dynamics allows one to efficiently draw approximate samples from a target distribution while preserving privacy, a commonly needed primitive in private optimization. For private convex optimization, we show that in some settings (including, e.g., private linear regression), given a sufficient amount of public data, we can obtain a better dependence on the dimensionality of the problem than differentially private gradient descent.
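To ground two of these ideas, the following sketch (hypothetical parameter choices, not code from the thesis) shows (i) noise drawn from a generalized Gaussian, whose density is proportional to exp(-|x/s|^p) and which interpolates between Laplace noise (p = 1) and Gaussian noise (p = 2), and (ii) one step of discretized Langevin dynamics, the noisy-gradient update behind the sampling result. Calibrating the noise scale and step size to a formal privacy guarantee is the technical content of the thesis and is omitted here.

```python
import numpy as np
from scipy.stats import gennorm  # generalized Gaussian: density ~ exp(-|x|**p)

def generalized_gaussian_counts(counts, p, scale, rng=None):
    """Answer counting queries with additive generalized Gaussian noise.

    p = 1 recovers Laplace noise and p = 2 recovers Gaussian noise; the
    thesis shows other shapes can give better worst-case accuracy. How
    `scale` must grow with the privacy budget is omitted (hypothetical)."""
    noise = gennorm.rvs(p, scale=scale, size=len(counts), random_state=rng)
    return np.asarray(counts, dtype=float) + noise

def langevin_step(theta, grad_f, eta, rng):
    """One step of discretized Langevin dynamics targeting exp(-f):
    a gradient step plus Gaussian noise of scale sqrt(2 * eta)."""
    return theta - eta * grad_f(theta) + np.sqrt(2 * eta) * rng.normal(size=theta.shape)

# Hypothetical usage: five noisy counts, then Langevin steps on f(x) = |x|^2 / 2.
rng = np.random.default_rng(0)
print(generalized_gaussian_counts([10, 42, 7, 99, 3], p=4.0, scale=2.0, rng=rng))
theta = np.zeros(3)
for _ in range(100):
    theta = langevin_step(theta, lambda x: x, eta=0.01, rng=rng)
print(theta)  # approximately a draw from N(0, I)
```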

Advisor: Satish Rao


BibTeX citation:

@phdthesis{Ganesh:EECS-2022-53,
    Author= {Ganesh, Arun},
    Title= {Improved Algorithms and Upper Bounds in Differential Privacy},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-53.html},
    Number= {UCB/EECS-2022-53},
}

EndNote citation:

%0 Thesis
%A Ganesh, Arun 
%T Improved Algorithms and Upper Bounds in Differential Privacy
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 10
%@ UCB/EECS-2022-53
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-53.html
%F Ganesh:EECS-2022-53