HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Seminar
2008 Series

Towards Accurate and Efficient Classification: A Discriminative and Frequent Pattern-based Approach

Dr. Hong Cheng
University of Illinois at Urbana-Champaign

Date: February 26, 2008 (Tuesday)
Time: 9:30 - 10:30 am
Venue: RRS628, Sir Run Run Shaw Building, Ho Sin Hang Campus

Abstract
Classification is an essential theme widely studied in machine learning, statistics, and data mining. Many classification methods have been proposed in the literature, most of which assume that the input data is in a feature vector representation. However, in many applications, it is desirable to construct accurate classification models on complex structural data that has no initial feature vector representation, including transactions, sequences, graphs, semi-structured data, and texts. A primary question is how to construct a discriminative and compact feature set on the basis of which classification can be performed directly. A concrete example is classifying chemical compounds into various classes (e.g., toxic vs. nontoxic, active vs. inactive). While features such as atoms and links are too simple to preserve the structural information, graph kernels make the classifiers hard to interpret. My goal is to use discriminative frequent patterns to characterize complex structural data and thus enhance the classification power. Theoretical analysis is provided to justify the discriminative power of frequent patterns. Two efficient search strategies have also been designed to directly mine the most discriminative patterns. Based on these results, I developed a framework of discriminative frequent pattern-based classification that can lead to a highly accurate, efficient, and interpretable classifier on complex data. The proposed pattern-based classification has proven useful in applications such as chemical compound classification, text categorization, and software engineering.

Biography
Hong Cheng is currently a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. She received her M.Phil. degree from Hong Kong University of Science and Technology in 2003 and her B.S. degree from Zhejiang University in 2001, both in Computer Science. Her research interests include data mining, machine learning, and database systems. She has published over 20 research papers in international conferences, journals, and book chapters, including SIGKDD, SDM, VLDB, ICDE, ICDM, ACM Transactions on KDD, and Data Mining and Knowledge Discovery, and received research paper awards at ICDE’07, SIGKDD’06, and SIGKDD’05.

********* ALL INTERESTED ARE WELCOME ***********
(For enquiries, please contact the Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=19