M.S.

Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL
Tianhao Wu [2024]

Comparative Studies on Sample Complexity Bounds in Multi-Agent Reinforcement Learning
Jiaqi Yang [2022]

5th Year M.S.

First Token Probabilities are Unreliable Indicators for LLM Knowledge
Justin Shao [2024]

Theory and Application of Bonus-based Exploration in Reinforcement Learning
Bryan Chen [2021]