Dueling Metrics: Choosing the Appropriate Error Metric for Models of Cognition in the Learning Analytics Field

Phitchaya Phothilimthana, Seung Yeon Lee and Zachary Pardos

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2018-7
April 15, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-7.pdf

Similar to how a machine learning model converges by following the gradient produced by the choice of loss function, a scholarly field converges towards adoption of various model modifications by following a type of gradient produced by the choice of error metrics used to report results in its papers. In this way, a field and its practitioners become a part of a larger human-centric process of design. In this paper we argue for the importance of choosing the right error metric for a popular cognitive model called Bayesian Knowledge Tracing (BKT), used in the context of intelligent tutoring systems. According to our analyses with synthetic data---including correlation analysis, gradient visualization, and parameter estimation---we find that the error metrics Root Mean Squared Error (RMSE) and log-likelihood provide the best correspondence to the true generating model. Area Under the Curve (AUC) and accuracy are significantly behind, while precision and recall have extremely poor performance. Our result validates the standard practices of using RMSE as a metric to evaluate BKT models and using RMSE or log-likelihood for BKT parameter estimation. Our result adds to the mounting wisdom against using AUC and accuracy, which are the other metrics that have been frequently used to evaluate BKT models as depicted in our seven-year literature review of the field. Additionally, we investigate the validity of parameters estimated using the different error metrics on real data from ASSISTments, Cognitive Tutor, and Khan Academy. The real data analysis reinforces our finding that log-likelihood and RMSE appear to be superior to the rest of the metrics and should be the metrics of choice when applying this model.
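
To make the comparison of metrics concrete, the following is a minimal sketch, not the authors' code, of how RMSE, log-likelihood, and accuracy might be computed for BKT predictions on synthetic data. It assumes the standard four-parameter BKT formulation (prior, learn, guess, slip), which the abstract does not spell out. The function names, parameter values, sample sizes, and the 0.5 accuracy threshold are all illustrative assumptions.

```python
import math
import random


def simulate_bkt(prior, learn, guess, slip, n_steps, rng):
    """Sample one student's 0/1 responses from a BKT generating model."""
    known = rng.random() < prior
    responses = []
    for _ in range(n_steps):
        p_correct = (1 - slip) if known else guess
        responses.append(1 if rng.random() < p_correct else 0)
        # An unlearned skill may transition to learned after each opportunity.
        if not known and rng.random() < learn:
            known = True
    return responses


def predict_bkt(prior, learn, guess, slip, responses):
    """Return P(correct) predicted before each response is observed."""
    p_known = prior
    preds = []
    for r in responses:
        preds.append(p_known * (1 - slip) + (1 - p_known) * guess)
        # Bayesian update of P(known) given the observed response.
        if r == 1:
            num = p_known * (1 - slip)
            den = num + (1 - p_known) * guess
        else:
            num = p_known * slip
            den = num + (1 - p_known) * (1 - guess)
        posterior = num / den
        # Apply the learning transition for the next opportunity.
        p_known = posterior + (1 - posterior) * learn
    return preds


def rmse(preds, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs))


def log_likelihood(preds, obs):
    return sum(math.log(p if o == 1 else 1 - p) for p, o in zip(preds, obs))


def accuracy(preds, obs, threshold=0.5):
    return sum((p >= threshold) == (o == 1) for p, o in zip(preds, obs)) / len(obs)


def evaluate(params, students):
    """Pool predictions over all students and score them with each metric."""
    preds, obs = [], []
    for seq in students:
        preds.extend(predict_bkt(*params, seq))
        obs.extend(seq)
    return rmse(preds, obs), log_likelihood(preds, obs), accuracy(preds, obs)


# Illustrative (assumed) parameters: (prior, learn, guess, slip).
rng = random.Random(0)
true_params = (0.3, 0.2, 0.2, 0.1)
other_params = (0.6, 0.05, 0.4, 0.3)
students = [simulate_bkt(*true_params, n_steps=20, rng=rng) for _ in range(200)]

for name, params in [("generating", true_params), ("alternative", other_params)]:
    r, ll, acc = evaluate(params, students)
    print(f"{name}: RMSE={r:.4f}  log-likelihood={ll:.1f}  accuracy={acc:.4f}")
```

Scoring the generating parameters against an alternative set in this way is one simple means of checking whether a given metric ranks the true model first. The report's own analyses (correlation analysis, gradient visualization, and parameter estimation) are more extensive than this sketch.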


BibTeX citation:

@techreport{Phothilimthana:EECS-2018-7,
    Author = {Phothilimthana, Phitchaya and Lee, Seung Yeon and Pardos, Zachary},
    Title = {Dueling Metrics: Choosing the Appropriate Error Metric for Models of Cognition in the Learning Analytics Field},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2018},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-7.html},
    Number = {UCB/EECS-2018-7},
    Abstract = {Similar to how a machine learning model converges by following the gradient produced by the choice of loss function, a scholarly field converges towards adoption of various model modifications by following a type of gradient produced by the choice of error metrics used to report results in its papers. In this way, a field and its practitioners become a part of a larger human-centric process of design. In this paper we argue for the importance of choosing the right error metric for a popular cognitive model called Bayesian Knowledge Tracing (BKT), used in the context of intelligent tutoring systems. According to our analyses with synthetic data---including correlation analysis, gradient visualization, and parameter estimation---we find that the error metrics Root Mean Squared Error (RMSE) and log-likelihood provide the best correspondence to the true generating model. Area Under the Curve (AUC) and accuracy are significantly behind, while precision and recall have extremely poor performance. Our result validates the standard practices of using RMSE as a metric to evaluate BKT models and using RMSE or log-likelihood for BKT parameter estimation. Our result adds to the mounting wisdom against using AUC and accuracy, which are the other metrics that have been frequently used to evaluate BKT models as depicted in our seven-year literature review of the field. Additionally, we investigate the validity of parameters estimated using the different error metrics on real data from ASSISTments, Cognitive Tutor, and Khan Academy. The real data analysis reinforces our finding that log-likelihood and RMSE appear to be superior to the rest of the metrics and should be the metrics of choice when applying this model.}
}

EndNote citation:

%0 Report
%A Phothilimthana, Phitchaya
%A Lee, Seung Yeon
%A Pardos, Zachary
%T Dueling Metrics: Choosing the Appropriate Error Metric for Models of Cognition in the Learning Analytics Field
%I EECS Department, University of California, Berkeley
%D 2018
%8 April 15
%@ UCB/EECS-2018-7
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-7.html
%F Phothilimthana:EECS-2018-7