Master's Theses & Technical Reports - Jiantao Jiao
5th Year M.S.
Reward Modeling for Human Preferences
Evan Frick [2025]
First Token Probabilities are Unreliable Indicators for LLM Knowledge
Justin Shao [2024]
Theory and Application of Bonus-based Exploration in Reinforcement Learning
Bryan Chen [2021]
M.S.
Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL
Tianhao Wu [2024]
Comparative Studies on Sample Complexity Bounds in Multi-Agent Reinforcement Learning
Jiaqi Yang [2022]