Experimental analysis of new algorithms for learning ternary classifiers
Discrete linear classifiers are a very sparse class of decision models
that have proved useful for reducing overfitting in very
high-dimensional learning problems. However, learning a discrete linear
classifier is known to be a difficult problem: it requires finding a
discrete linear model that minimizes the classification error over a
given sample. A ternary classifier is a classifier defined by a pair
(w, r), where w is a vector in {-1, 0, +1}^n and r is a nonnegative real
number capturing the threshold or offset. The goal of the learning
algorithm is to find a vector of weights in {-1, 0, +1}^n that minimizes
the hinge loss of the linear model on the training data. This problem is
NP-hard, and one approach consists in solving the relaxed continuous
problem exactly and then heuristically deriving discrete solutions. A
recent paper by the authors introduced a randomized rounding algorithm
[1]; in this paper we propose more sophisticated algorithms that improve
the generalization error. These algorithms are presented and their
performance is analyzed experimentally. Our results show that this kind
of compact model can address the complex problem of learning predictors
from bioinformatics data, such as metagenomics data, where the number of
samples is much smaller than the number of attributes. The new
algorithms improve on the state-of-the-art algorithm for learning
ternary classifiers. This improvement comes at the expense of time
complexity.
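To make the setting concrete, here is a minimal Python sketch of a ternary classifier, its hinge loss, and a simple randomized rounding step applied to a relaxed solution. It is illustrative only and is not the authors' implementation: the decision rule sign(w.x - r), the rounding scheme (keep the sign of each relaxed weight with probability equal to its absolute value, otherwise set it to 0), the best-of-k selection, and all function names and toy data are assumptions introduced here.

```python
import random

def hinge_loss(w, r, X, y):
    """Average hinge loss of the model x -> w.x - r, labels in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) - r
        total += max(0.0, 1.0 - yi * score)
    return total / len(X)

def predict(w, r, x):
    """Assumed decision rule: +1 if w.x >= r, otherwise -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= r else -1

def randomized_round(w_relaxed, rng):
    """Round each relaxed weight in [-1, 1] to its sign with probability
    |w_j|, else to 0 (one common scheme, assumed for illustration)."""
    rounded = []
    for wj in w_relaxed:
        if rng.random() < abs(wj):
            rounded.append(1 if wj > 0 else -1)
        else:
            rounded.append(0)
    return rounded

def best_of_k_roundings(w_relaxed, r, X, y, k=100, seed=0):
    """Draw k random roundings and keep the one with the lowest hinge loss."""
    rng = random.Random(seed)
    best_w, best_loss = None, float("inf")
    for _ in range(k):
        w = randomized_round(w_relaxed, rng)
        loss = hinge_loss(w, r, X, y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# Toy data (hypothetical): 4 samples, 3 attributes.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
y = [1, -1, 1, -1]
w_relaxed = [0.9, -0.8, 0.1]  # stands in for a solution of the relaxed problem
r = 0.0

w_ternary, loss = best_of_k_roundings(w_relaxed, r, X, y)
print(w_ternary, loss)
```

Here the relaxed weights are simply given; in practice they would come from solving the continuous relaxation with a convex solver. Drawing more roundings (larger k) trades running time for a better chance of finding a low-loss ternary vector, which loosely mirrors the trade-off between accuracy and time complexity noted in the abstract.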
Title: | Experimental analysis of new algorithms for learning ternary classifiers |
Authors: | Zucker, Jean-Daniel; Chevaleyre, Yann; Dao, Van Sang |
Keywords: | Ternary Classifier; Randomized Rounding; Metagenomics data |
Issue Date: | 2015 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Citation: | Scopus |
Description: | 25 February 2015, Article number 7049868, Pages 19-24, 2015 International Conference on Computing and Communication Technologies: Research, Innovation, and Vision for Future, IEEE RIVF 2015; Can Tho University (CTU), Can Tho, Viet Nam; 25 January 2015 through 28 January 2015 |
URI: | http://ieeexplore.ieee.org/document/7049868/ http://repository.vnu.edu.vn/handle/VNU_123/33101 |
ISBN: | 978-147998043-7 |
Appears in Collections: | VNU Hanoi articles in Scopus |