Kevin Shi
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-74
May 15, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-74.pdf
We present UPSC2M, a large-scale dataset comprising two million multiple-choice question attempts from over 46,000 students, spanning nearly 9,000 questions across seven subject areas. The questions are drawn from the Union Public Service Commission (UPSC) examination, one of India’s most competitive and high-stakes assessments. Each attempt records both response correctness and time taken, enabling fine-grained analysis of learner behavior and question characteristics. On top of this dataset, we define two core benchmark tasks: question difficulty estimation and student performance prediction. The first task involves predicting empirical correctness rates from question text alone. For this, we benchmark several baselines and introduce LLM-Guided Feature Regression (LFR), a content-based regression pipeline that leverages question features extracted by large language models. The second task focuses on predicting the likelihood of a correct response given a student's prior interactions. Here, we evaluate standard approaches and propose Subject Knowledge Tracking (SKT), a lightweight knowledge-tracking algorithm for subject-level proficiency modeling. Together, the dataset and benchmarks offer a strong foundation for building scalable, personalized educational systems. We release the dataset and code to support further research at the intersection of content understanding, learner modeling, and adaptive assessment: https://github.com/kevins-hi/upsc2m.
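To illustrate the shape of subject-level proficiency modeling described above, here is a minimal, hypothetical sketch: each (student, subject) pair keeps a single proficiency estimate, updated after every attempt with an exponential moving average of correctness. This is an illustrative assumption, not the report's actual SKT algorithm; the class name, prior, and learning rate are invented for the example.

```python
# Hypothetical sketch of subject-level knowledge tracking (NOT the
# report's exact SKT algorithm). Each (student, subject) pair holds one
# proficiency estimate, nudged toward 1 on a correct attempt and toward
# 0 on an incorrect one.

from collections import defaultdict


class SubjectTracker:
    """Per-(student, subject) proficiency as an EMA of correctness."""

    def __init__(self, prior=0.5, alpha=0.1):
        self.alpha = alpha  # step size of the moving average
        # Every unseen (student, subject) pair starts at the prior.
        self.skill = defaultdict(lambda: prior)

    def predict(self, student, subject):
        # Predicted probability of a correct response in this subject.
        return self.skill[(student, subject)]

    def update(self, student, subject, correct):
        # Move the estimate a fraction alpha toward the observed outcome.
        key = (student, subject)
        self.skill[key] += self.alpha * (float(correct) - self.skill[key])


tracker = SubjectTracker()
p_before = tracker.predict("s1", "history")  # 0.5 prior, no data yet
tracker.update("s1", "history", correct=True)
p_after = tracker.predict("s1", "history")   # moves toward 1.0
```

A tracker like this is "lightweight" in the sense the abstract suggests: it needs only O(students × subjects) state and a constant-time update per attempt, with no question-level parameters.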
Advisor: Jitendra Malik
BibTeX citation:
@mastersthesis{Shi:EECS-2025-74,
    Author = {Shi, Kevin},
    Title = {UPSC2M: Benchmarking Adaptive Learning from Two Million MCQ Attempts},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-74.html},
    Number = {UCB/EECS-2025-74}
}
EndNote citation:
%0 Thesis
%A Shi, Kevin
%T UPSC2M: Benchmarking Adaptive Learning from Two Million MCQ Attempts
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 15
%@ UCB/EECS-2025-74
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-74.html
%F Shi:EECS-2025-74