Orr Paradise
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-92
May 16, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-92.pdf
This dissertation examines proof systems and translation methods, addressing both theoretical foundations and practical considerations in each domain.
The first part, on Proofs, introduces rectangular probabilistically checkable proofs (rectangular PCPs), wherein proofs are thought of as square matrices, and the verifier's randomness can be split into two independent parts, one determining the row of each query and the other determining the column. We construct rectangular PCPs and use them to show that proofs for hard languages are rigid---extending and strengthening recent rigid matrix constructions.
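To make the "rectangular" query structure concrete, here is a toy illustrative sketch (not the dissertation's actual construction): the proof is an n-by-n matrix, the verifier draws two independent random strings, and each query's row index depends only on the row-randomness while its column index depends only on the column-randomness. All names here are hypothetical.

```python
import random

def rectangular_queries(n: int, num_queries: int, seed_row: int, seed_col: int):
    """Toy illustration of the rectangular query structure: the proof is
    viewed as an n-by-n matrix, and each query (i, j) has its row i
    determined solely by the row part of the verifier's randomness and its
    column j solely by the column part."""
    row_rng = random.Random(seed_row)  # "row part" of the verifier's randomness
    col_rng = random.Random(seed_col)  # "column part"
    rows = [row_rng.randrange(n) for _ in range(num_queries)]
    cols = [col_rng.randrange(n) for _ in range(num_queries)]
    return list(zip(rows, cols))
```

Fixing the row seed and varying only the column seed changes the column coordinates of the queries but leaves the row coordinates unchanged, which is exactly the independence property that rectangularity demands.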
We then propose Self-Proving models: learned models that prove the correctness of their output to a verification algorithm via an Interactive Proof. We devise a generic method for learning Self-Proving models, and prove its convergence under certain assumptions. We empirically examine our methods by training a Self-Proving transformer that computes the GCD of two integers and proves the correctness of its output. We also introduce Pseudointelligence, a complexity-theoretic framework of model evaluation cast as an interactive proof between a model and a learned evaluator.
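As an illustration of what a verification algorithm for the GCD task could look like (a natural choice, not necessarily the dissertation's exact protocol), the model can emit the claimed GCD d together with Bezout coefficients (u, v), and the verifier checks a simple arithmetic identity plus two divisibility conditions. A minimal sketch under that assumed certificate format:

```python
def verify_gcd(a: int, b: int, d: int, u: int, v: int) -> bool:
    """Accept iff d == gcd(a, b), certified by Bezout coefficients (u, v).

    Soundness argument: if u*a + v*b == d, then gcd(a, b) divides d;
    if additionally d divides both a and b, then d divides gcd(a, b).
    Together these force d == gcd(a, b).
    """
    if d <= 0:
        return False
    return u * a + v * b == d and a % d == 0 and b % d == 0
```

For example, `verify_gcd(12, 18, 6, -1, 1)` accepts since (-1)*12 + 1*18 = 6 and 6 divides both inputs, while no certificate can make the verifier accept a wrong value such as d = 3.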
The second part, on Translation, explores unsupervised machine translation (UMT) without shared linguistic structure. We develop a theoretical framework for analyzing this setting, and prove sample complexity bounds in stylized yet informative settings. The results show that translation quality improves with language complexity, informing the feasibility of translating animal communication. Finally, we present WhAM, a transformer-based model for generating synthetic sperm whale codas. WhAM is trained on real acoustic data and generates audio that approaches the statistical and perceptual properties of whale communication, as evaluated by domain experts. Its learned representations also perform well on classification tasks, contributing to our understanding of non-human communication systems.
Advisors: Shafi Goldwasser and Avishay Tal
BibTeX citation:
@phdthesis{Paradise:EECS-2025-92,
    Author = {Paradise, Orr},
    Title = {On Proofs and Translation},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-92.html},
    Number = {UCB/EECS-2025-92}
}
EndNote citation:
%0 Thesis
%A Paradise, Orr
%T On Proofs and Translation
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 16
%@ UCB/EECS-2025-92
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-92.html
%F Paradise:EECS-2025-92