Robust Classification with Interval Data

Laurent El Ghaoui, Gert R. G. Lanckriet and Georges Natsoulis

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-03-1279
October 2003

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/CSD-03-1279.pdf

We consider a binary, linear classification problem in which the data points are assumed to be unknown, but bounded within given hyper-rectangles, i.e., the covariates are bounded within intervals explicitly given for each data point separately. We address the problem of designing a robust classifier in this setting by minimizing the worst-case value of a given loss function over all possible choices of the data in these multi-dimensional intervals. We examine in detail the application of this methodology to three specific loss functions, arising in support vector machines, in logistic regression, and in minimax probability machines. We show that in each case, the resulting problem is amenable to efficient interior-point algorithms for convex optimization. The methods tend to produce sparse classifiers, i.e., they induce many zero coefficients in the resulting weight vectors, and we provide some theoretical grounds for this property. After presenting possible extensions of this framework to handle label errors and other uncertainty models, we discuss in some detail our implementation, which exploits the potential sparsity, or a more general property referred to as regularity, of the input matrices.
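The sparsity remark can be made concrete with the standard worst-case reduction for box (interval) uncertainty applied to the hinge loss: the worst perturbation of point i within its hyper-rectangle shifts each covariate j by -y_i rho_ij sign(w_j), so the worst-case hinge loss becomes the nominal loss plus a weighted l1 term sum_j rho_ij |w_j|, which pushes coefficients to zero. The sketch below, in Python with cvxpy, illustrates this kind of robust formulation; the data (X centers, R half-widths, y labels) are hypothetical, and this is an illustration of the general technique rather than the authors' implementation.

    import cvxpy as cp
    import numpy as np

    # Hypothetical interval data (not from the report): row i of X is the center of
    # the i-th hyper-rectangle, R[i, j] >= 0 is the half-width of the interval for
    # covariate j of point i, and y[i] in {-1, +1} is the class label.
    rng = np.random.default_rng(0)
    n, d = 40, 5
    X = rng.normal(size=(n, d))
    R = 0.1 * np.abs(rng.normal(size=(n, d)))
    y = np.where(rng.normal(size=n) > 0, 1.0, -1.0)

    w = cp.Variable(d)
    b = cp.Variable()

    # Worst-case hinge loss over each box {x_i + delta : |delta_j| <= R[i, j]}:
    #   max_delta [1 - y_i ((x_i + delta)^T w + b)]_+
    #     = [1 - y_i (x_i^T w + b) + sum_j R[i, j] |w_j|]_+ ,
    # since the worst perturbation is delta_j = -y_i R[i, j] sign(w_j).
    margins = cp.multiply(y, X @ w + b)
    worst_case_loss = cp.sum(cp.pos(1 - margins + R @ cp.abs(w))) / n

    prob = cp.Problem(cp.Minimize(worst_case_loss))
    prob.solve()  # solved by a standard convex (interior-point-capable) solver
    print("robust weight vector:", np.round(w.value, 3))

The term R @ cp.abs(w) acts as a per-point weighted l1 penalty, consistent with the abstract's observation that the robust formulations tend to produce sparse weight vectors.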


BibTeX citation:

@techreport{ElGhaoui:CSD-03-1279,
    Author = {El Ghaoui, Laurent and Lanckriet, Gert R. G. and Natsoulis, Georges},
    Title = {Robust Classification with Interval Data},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2003},
    Month = {Oct},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/5772.html},
    Number = {UCB/CSD-03-1279},
    Abstract = {We consider a binary, linear classification problem in which the data points are assumed to be unknown, but bounded within given hyper-rectangles, i.e., the covariates are bounded within intervals explicitly given for each data point separately. We address the problem of designing a robust classifier in this setting by minimizing the worst-case value of a given loss function over all possible choices of the data in these multi-dimensional intervals. We examine in detail the application of this methodology to three specific loss functions, arising in support vector machines, in logistic regression, and in minimax probability machines. We show that in each case, the resulting problem is amenable to efficient interior-point algorithms for convex optimization. The methods tend to produce sparse classifiers, i.e., they induce many zero coefficients in the resulting weight vectors, and we provide some theoretical grounds for this property. After presenting possible extensions of this framework to handle label errors and other uncertainty models, we discuss in some detail our implementation, which exploits the potential sparsity, or a more general property referred to as regularity, of the input matrices.}
}

EndNote citation:

%0 Report
%A El Ghaoui, Laurent
%A Lanckriet, Gert R. G.
%A Natsoulis, Georges
%T Robust Classification with Interval Data
%I EECS Department, University of California, Berkeley
%D 2003
%@ UCB/CSD-03-1279
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/5772.html
%F El Ghaoui:CSD-03-1279