HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Seminar
2018 Series

Class Imbalance Learning - Problem, Modelling and Challenges

Prof. Yiu Ming Cheung
Professor
Department of Computer Science
Hong Kong Baptist University

Date: July 26, 2018 (Thursday)
Time: 10:30 - 11:30 am
Venue: FSC901C&D, Fong Shu Chuen Library, Ho Sin Hang Campus

Abstract
In many practical problems, the number of data points from different classes can be highly imbalanced, which can degrade the performance of most machine learning methods. To our knowledge, learning from imbalanced data remains one of the open challenges in data engineering and machine learning, and it has attracted growing attention in recent years. In this talk, we will first formally describe the class imbalance problem and its significance with examples from real-world applications, and review the existing solutions. We then study three research problems: sampling strategy, classifier weights in boosting, and imbalanced streaming data with concept drift. We have proposed a solution for each problem. The first, Hybrid Sampling with Bagging (HSBagging), uses a new hybrid scheme that combines undersampling and oversampling, with the sampling rate selected specifically for each data set. The second, G-mean Optimized Boosting (GOBoost), is a boosting framework in which the classifier weights are optimized with respect to the geometric mean (G-mean) measure. GOBoost can be applied to any boosting-based method for class imbalance learning by simply replacing the module that updates the classifier weights. The last, Dynamic Weighted Majority for Imbalance Learning (DWMIL), creates a base classifier for each data chunk and weights the classifiers according to their performance on the current chunk. Thus, a classifier trained recently, or on a concept similar to that of the current chunk, receives a high weight in the ensemble and contributes more to the prediction. Finally, some challenging open problems in this topic are explored.
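
For readers unfamiliar with the geometric mean measure mentioned above, the following is a minimal sketch, assuming the common definition of G-mean as the geometric mean of the per-class recalls. The exact formulation used in GOBoost is not given in this abstract, and the function names and toy labels below are hypothetical, chosen only for illustration. The sketch shows why plain accuracy can mislead on imbalanced data while G-mean does not.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition, see lead-in)."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical toy data: 95 majority samples (class 0), 5 minority samples (class 1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
pred_a = [0] * 100

# Classifier B raises 10 false alarms but finds 4 of the 5 minority samples.
pred_b = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

print("A: accuracy", accuracy(y_true, pred_a), "G-mean", g_mean(y_true, pred_a))
print("B: accuracy", accuracy(y_true, pred_b), "G-mean", g_mean(y_true, pred_b))
```

In this toy case, classifier A has the higher accuracy although it never detects the minority class, so its G-mean is zero; classifier B has lower accuracy but a much higher G-mean.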

********* ALL INTERESTED ARE WELCOME ***********
(For enquiry, please contact Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=474&lang=tc