5th Year M.S.

Reward Modeling for Human Preferences
Evan Frick [2025]

First Token Probabilities are Unreliable Indicators for LLM Knowledge
Justin Shao [2024]

Theory and Application of Bonus-based Exploration in Reinforcement Learning
Bryan Chen [2021]

M.S.

Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL
Tianhao Wu [2024]

Comparative Studies on Sample Complexity Bounds in Multi-Agent Reinforcement Learning
Jiaqi Yang [2022]