Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng and Stuart J. Russell and Jacob Steinhardt
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-141
July 14, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.pdf
Advisors: Stuart J. Russell and Jacob Steinhardt
BibTeX citation:
@mastersthesis{Feng:EECS-2025-141,
    Author = {Feng, Jiahai and Russell, Stuart J. and Steinhardt, Jacob},
    Title = {Monitoring Latent World States in Language Models with Propositional Probes},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {Jul},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.html},
    Number = {UCB/EECS-2025-141},
}
EndNote citation:
%0 Thesis
%A Feng, Jiahai
%A Russell, Stuart J.
%A Steinhardt, Jacob
%T Monitoring Latent World States in Language Models with Propositional Probes
%I EECS Department, University of California, Berkeley
%D 2025
%8 July 14
%@ UCB/EECS-2025-141
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.html
%F Feng:EECS-2025-141