Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng and Stuart J. Russell and Jacob Steinhardt
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-141
July 14, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.pdf
Advisors: Stuart J. Russell and Jacob Steinhardt
BibTeX citation:
@mastersthesis{Feng:EECS-2025-141,
    Author = {Feng, Jiahai and Russell, Stuart J. and Steinhardt, Jacob},
    Title = {Monitoring Latent World States in Language Models with Propositional Probes},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {Jul},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.html},
    Number = {UCB/EECS-2025-141},
}
EndNote citation:
%0 Thesis
%A Feng, Jiahai
%A Russell, Stuart J.
%A Steinhardt, Jacob
%T Monitoring Latent World States in Language Models with Propositional Probes
%I EECS Department, University of California, Berkeley
%D 2025
%8 July 14
%@ UCB/EECS-2025-141
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-141.html
%F Feng:EECS-2025-141