Incipient Anomaly Detection with Ensemble Learning
Baihong Jin
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-199
December 9, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-199.pdf
Anomaly detection techniques are important in system health monitoring applications (e.g., fault detection and disease diagnosis). By recognizing suspicious patterns in data, anomaly detection models can tell whether a system has degraded from its normal operating condition into a faulty or diseased state. To avoid unnecessary losses, it is desirable to identify incipient anomalies, i.e., to detect potential problems in their early stages of development. In buildings, early detection of incipient faults can help reduce maintenance and repair costs, save energy, and enhance occupant comfort. In healthcare, if incipient diseases are discovered early, effective treatments can be applied to prevent them from progressing to more severe stages.
However, it is difficult to identify incipient anomalies accurately without incurring too many false alarms. Incipient anomalies present milder deviations than severe ones and are difficult to detect and diagnose because they closely resemble normal operating conditions. Anomaly detection approaches based on supervised Machine Learning (ML) rely on high-quality labeled data to build accurate classifiers. A lack of incipient anomaly examples in the training data can therefore pose severe risks to ML-based anomaly detection methods, because these anomalies can easily be mistaken for normal operating conditions.
Ensemble learning is widely applied in ML to improve model performance and to mitigate decision risks. In ensemble approaches, predictions from a diverse set of learners are combined to obtain a joint decision with lower bias and variance. Recently, various methods for estimating prediction uncertainties using ensemble learning have been explored in the literature. To address the challenge of incipient anomalies, I propose in this dissertation to use the uncertainty information available from ensemble learning to identify potentially misclassified incipient anomalies. Through extensive experiments on two real-world applications, detection of chiller faults and diagnosis of diabetic retinopathy, I show that ensemble learning methods can give improved performance on incipient anomalies, and I identify common pitfalls in these models. I also give a theoretical analysis that compares the two popular strategies for extracting uncertainty information, and I discuss how to design more effective ensemble models for detecting incipient anomalies.
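As a rough illustration of the idea in the abstract, the following minimal sketch shows one way uncertainty from an ensemble might be used to flag inputs that could be misclassified incipient anomalies. It is not the dissertation's method: the function names `ensemble_uncertainty` and `triage`, the choice of entropy and variance as uncertainty measures, and the thresholds are all assumptions made for this example.

```python
import math
from statistics import mean, pvariance


def ensemble_uncertainty(member_probs):
    """Summarize an ensemble's predictions for one input.

    member_probs: one probability per ensemble member that the input is
    anomalous (hypothetical interface, assumed for this sketch).
    Returns (mean probability, entropy of the mean in bits, variance).
    """
    p_mean = mean(member_probs)
    # Binary entropy of the averaged prediction; terms with p == 0 add nothing.
    entropy = 0.0
    for p in (p_mean, 1.0 - p_mean):
        if p > 0.0:
            entropy -= p * math.log2(p)
    # Population variance across members measures their disagreement.
    disagreement = pvariance(member_probs)
    return p_mean, entropy, disagreement


def triage(member_probs, threshold=0.5, max_entropy=0.8):
    """Label an input, deferring when the ensemble is too uncertain.

    Both thresholds are illustrative values, not tuned settings.
    """
    p_mean, entropy, _ = ensemble_uncertainty(member_probs)
    if entropy > max_entropy:
        return "uncertain"  # candidate incipient anomaly, needs review
    return "anomalous" if p_mean >= threshold else "normal"


if __name__ == "__main__":
    print(triage([0.02, 0.05, 0.03]))  # members agree on a low probability
    print(triage([0.95, 0.90, 0.97]))  # members agree on a high probability
    print(triage([0.10, 0.80, 0.45]))  # members disagree
```

In this sketch, an input on which the members disagree is routed to an "uncertain" bucket instead of being silently labeled normal, which is the kind of risk mitigation the abstract describes.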
Advisors: Alberto L. Sangiovanni-Vincentelli
BibTeX citation:
@phdthesis{Jin:EECS-2020-199,
    Author = {Jin, Baihong},
    Title = {Incipient Anomaly Detection with Ensemble Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-199.html},
    Number = {UCB/EECS-2020-199}
}
EndNote citation:
%0 Thesis
%A Jin, Baihong
%T Incipient Anomaly Detection with Ensemble Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 9
%@ UCB/EECS-2020-199
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-199.html
%F Jin:EECS-2020-199