Improving Inference Privacy for Large Language Models using Fully Homomorphic Encryption

Rohit Mittal

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2024-225
December 19, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-225.pdf

As large language models (LLMs) become increasingly prevalent in our lives, concerns about the privacy of their input and output data have moved to the forefront of the debate around this technology. For LLMs to realize their full potential, solutions must be designed that protect the sensitive data in user queries. This report explores private inference through fully homomorphic encryption (FHE). It shows how FHE can be used to design a system that offloads intensive query computation to a remote server, as in present-day LLM client-server deployments, while encrypting the query so that the server can compute over it without ever seeing the plaintext, ensuring privacy by design rather than by trust in the server. The report also discusses methods to speed up this computation without directly revealing the inference result, and examines their impact on a working implementation of the Meta Llama 2 LLM in which intensive query computations are offloaded to a server while the privacy of both inputs and outputs is preserved.
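The report's Llama 2 pipeline is not reproduced here, but the encrypt-compute-decrypt flow the abstract describes can be sketched with a general-purpose FHE library. The sketch below uses TenSEAL, a Python wrapper over Microsoft SEAL's CKKS scheme; the single linear layer, weights, and query vector are illustrative assumptions standing in for the report's actual model computation.

    import tenseal as ts

    # --- Client: set up CKKS keys and encrypt the query --------------------
    context = ts.context(
        ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=8192,
        coeff_mod_bit_sizes=[60, 40, 40, 60],
    )
    context.global_scale = 2**40
    context.generate_galois_keys()  # rotation keys, needed for matmul below

    query = [0.1, 0.2, 0.3, 0.4]                # hypothetical embedded query
    enc_query = ts.ckks_vector(context, query)  # ciphertext sent to the server
    # (In a real deployment the server would receive a serialized context
    # without the secret key, e.g. context.serialize(save_secret_key=False).)

    # --- Server: compute over the ciphertext, never the plaintext ----------
    weights = [[0.5, 0.3],   # hypothetical 4x2 plaintext model weights
               [-0.1, 0.8],
               [0.2, -0.4],
               [0.0, 0.1]]
    bias = [0.05, -0.02]
    enc_out = enc_query.matmul(weights) + bias  # homomorphic linear layer

    # --- Client: only the secret-key holder can decrypt the result ---------
    print(enc_out.decrypt())  # ~ query @ weights + bias (CKKS is approximate)

Because the server holds no decryption key, it performs only homomorphic operations on ciphertexts; the plaintext query and result are visible to the client alone, which is the privacy-by-design property the report builds on.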

Advisor: Dawn Song

\"Edit"; ?>


BibTeX citation:

@mastersthesis{Mittal:EECS-2024-225,
    Author = {Mittal, Rohit},
    Title = {Improving Inference Privacy for Large Language Models using Fully Homomorphic Encryption},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-225.html},
    Number = {UCB/EECS-2024-225},
    Abstract = {As large language models (LLMs) become increasingly prevalent in our lives, concerns about the privacy of their input and output data have moved to the forefront of the debate around this technology. For LLMs to realize their full potential, solutions must be designed that protect the sensitive data in user queries. This report explores private inference through fully homomorphic encryption (FHE). It shows how FHE can be used to design a system that offloads intensive query computation to a remote server, as in present-day LLM client-server deployments, while encrypting the query so that the server can compute over it without ever seeing the plaintext, ensuring privacy by design rather than by trust in the server. The report also discusses methods to speed up this computation without directly revealing the inference result, and examines their impact on a working implementation of the Meta Llama 2 LLM in which intensive query computations are offloaded to a server while the privacy of both inputs and outputs is preserved.}
}

EndNote citation:

%0 Thesis
%A Mittal, Rohit
%T Improving Inference Privacy for Large Language Models using Fully Homomorphic Encryption
%I EECS Department, University of California, Berkeley
%D 2024
%8 December 19
%@ UCB/EECS-2024-225
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-225.html
%F Mittal:EECS-2024-225