HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Colloquium
2005 Series

False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams

Dr. Jeffrey Xu YU
The Chinese University of Hong Kong

Date: February 2, 2005 (Wednesday)
Time: 2:00 - 3:00 pm
Venue: SCT909, Cha Chi Ming Science Tower, Ho Sin Hang Campus

Abstract
The problem of finding frequent items has been recently studied over high speed data streams. However, mining frequent itemsets from transactional data streams has not been well addressed yet in terms of its bounds of memory consumption. The main difficulty is due to the nature of the exponential explosion of itemsets. Given a domain of I unique items, the possible number of itemsets can be up to 2**I-1. When the length of data streams approaches to a very large number N, the possibility of an itemset to be frequent becomes larger and difficult to track with limited memory. However, the real killer of effective frequent itemset mining is that most of existing algorithms are false-positive oriented. That is, they control memory consumption in the counting processes by an error parameter e, and allow items with support below the specified minimum support s but above s-e counted as frequent ones. Such false-positive items increase the number of false-positive frequent itemsets exponentially, which may make the problem computationally intractable with bounded memory consumption. In this paper, we developed algorithms that can effectively mine frequent item(set)s from high speed transactional data streams with a bound of memory consumption. While our algorithms are false-negative oriented, that is, certain frequent itemsets may not appear in the results, the number of false-negative itemsets can be controlled by a predefined parameter so that desired recall rate of frequent itemsets can be guaranteed. We developed algorithms based on Chernoff bound. Our extensive experimental studies show that the proposed algorithms have high accuracy, require less memory, and consume less CPU time. They significantly outperform the existing false-positive algorithms.

Biography
Jeffrey Xu Yu received his B.E., M.E. and Ph.D. in computer science,from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. Jeffrey Xu Yu was a research fellow in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1990 -- Mar. 1991), and held teaching positions in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1991 -- July 1992) and in the Department of Computer Science, Australian National University (July 1992 -- June 2000). Currently he is an Associate Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His major research interests include data mining, data stream mining/processing, XML query processing and optimization, data warehouse, on-line analytical processing, and design and implementation of database management systems. He is an Associate Editor of IEEE TKDE. He severed in organization committees and program committees including a co-chair of program committee of the 6th Asia Pacific Web Conference (APWeb'04), a co-chair of the organization committee of the 5th International Conference on Optimization: Techniques and Applications (ICOTA'01), a regional coordinator (Far East and Australia) of 2003 Web Information Systems Engineering (WISE'03), and a Program Committee Vice-Chair of the 5th IEEE International Conference on Data Mining (ICDM'05).

********* ALL INTERESTED ARE WELCOME ***********
(For enquiry, please contact Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?id=85&page=seminars