Experimental analysis of new algorithms for learning ternary classifiers

Discrete linear classifiers form a very sparse class of decision models that has proved useful for reducing overfitting in very high-dimensional learning problems. However, learning a discrete linear classifier is known to be a difficult problem: it requires finding a discrete linear model that minimizes the classification error over a given sample. A ternary classifier is defined by a pair (w, r), where w is a vector in {-1, 0, +1}^n and r is a nonnegative real number capturing the threshold or offset. The goal of the learning algorithm is to find a weight vector in {-1, 0, +1}^n that minimizes the hinge loss of the linear model on the training data. This problem is NP-hard, and one approach consists of solving the relaxed continuous problem exactly and then heuristically deriving discrete solutions. A recent paper by the authors introduced a randomized rounding algorithm [1]; in this paper we propose more sophisticated algorithms that improve the generalization error. These algorithms are presented and their performance is analyzed experimentally. Our results show that this kind of compact model can address the complex problem of learning predictors from bioinformatics data, such as metagenomics data, where the number of samples is much smaller than the number of attributes. The new algorithms improve on the state-of-the-art algorithm for learning ternary classifiers. This improvement comes at the expense of time complexity.
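To make the abstract's vocabulary concrete, here is a minimal Python sketch of a ternary classifier, its hinge loss, and a generic randomized rounding step. It is an illustration under stated assumptions, not the algorithm from [1] or from this paper: the decision rule (predict +1 when the score reaches r), the exact form of the hinge loss with the offset, the rounding probabilities (keep the sign of w_i with probability |w_i|, otherwise 0), and all function names are hypothetical choices. The relaxed solution w_relaxed in [-1, 1]^n is assumed to be given.

```python
import random
from typing import List, Sequence


def predict(w: Sequence[int], r: float, x: Sequence[float]) -> int:
    """Assumed decision rule: +1 if <w, x> >= r, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= r else -1


def hinge_loss(w: Sequence[float], r: float,
               X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean of max(0, 1 - y * (<w, x> - r)) over the sample (assumed form)."""
    total = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) - r)
        total += max(0.0, 1.0 - margin)
    return total / len(X)


def randomized_round(w_relaxed: Sequence[float], rng: random.Random) -> List[int]:
    """Round each weight in [-1, 1] to its sign with probability |w_i|, else to 0.

    With this choice the expected value of each rounded weight equals w_i.
    """
    rounded = []
    for wi in w_relaxed:
        if rng.random() < abs(wi):
            rounded.append(1 if wi > 0 else -1)
        else:
            rounded.append(0)
    return rounded


def best_of_k_roundings(w_relaxed: Sequence[float], r: float,
                        X: Sequence[Sequence[float]], y: Sequence[int],
                        k: int = 50, seed: int = 0) -> List[int]:
    """Draw k independent roundings and keep the one with the lowest hinge loss."""
    rng = random.Random(seed)
    candidates = [randomized_round(w_relaxed, rng) for _ in range(k)]
    return min(candidates, key=lambda w: hinge_loss(w, r, X, y))
```

Drawing several roundings and keeping the best one is one simple way to trade running time for a lower training loss, which mirrors the time-versus-accuracy trade-off the abstract mentions; the more sophisticated algorithms of the paper itself are not reproduced here.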

Title: Experimental analysis of new algorithms for learning ternary classifiers
Authors: Zucker, Jean-Daniel; Chevaleyre, Yann; Dao, Van Sang
Keywords: Ternary Classifier; Randomized Rounding; Metagenomics data
Issue Date: 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Scopus
Description: 25 February 2015, Article number 7049868, Pages 19-24, 2015 International Conference on Computing and Communication Technologies: Research, Innovation, and Vision for Future, IEEE RIVF 2015; Can Tho University (CTU), Can Tho; Viet Nam; 25 January 2015 through 28 January 2015
URI: http://ieeexplore.ieee.org/document/7049868/
http://repository.vnu.edu.vn/handle/VNU_123/33101
ISBN: 978-147998043-7
Appears in Collections: VNU Hanoi (ĐHQGHN) articles in Scopus
