Pattern Pruning, Pattern Clustering and Summarization

Dr. Andrew K.C. Wong
Department of Systems Design Engineering
University of Waterloo

Date: June 10, 2008 (Tuesday)
Time: 11:00 am - 12:00 pm
Venue: OEE601, Oen Hall (East Wing), Ho Sin Hang Campus

Mining patterns such as itemsets, association rules and association patterns is important in knowledge discovery and data mining. However, the number of discovered patterns or rules is usually overwhelming, so new techniques are needed to prune, cluster and summarize the discovered patterns/rules for effective interpretation, analysis and visualization. For these pattern analysis tasks, the distance measure between patterns is the most basic and crucial element. Contemporary pattern distances are based on counting the number of samples in which patterns match or mismatch, and they are inadequate for these tasks. This talk will first give a brief analysis of sample-matching distances between patterns for categorical data. It then presents three new distance measures between patterns and between pattern clusters.

Using the new distance measures, we have developed new methods for: a) pattern pruning, b) pattern clustering and c) pattern summarization. Pattern pruning removes redundant patterns. In addition to closed itemset and maximal itemset pruning, we have developed a generalized itemset pruning technique that allows a tradeoff between the number of itemsets preserved and the information lost during pruning. In pattern clustering, a dual process that simultaneously clusters patterns and their associated data is developed: when patterns are grouped into a pattern cluster, their associated data are grouped into a corresponding data group. Both the statistical and probabilistic aspects of information are then used in the clustering process. To further reduce the number of patterns, a new pattern summarization process is introduced that selects a few of the most representative patterns to serve as summary patterns for each cluster. The talk will present the theoretical aspects and analytical details of these methods, together with extensive experimental results on synthetic and real-world data in comparison with other contemporary methods.
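To make the idea of a sample-matching distance and closed itemset pruning concrete, here is a minimal sketch. It assumes transactional categorical data (each sample is a set of items), measures the distance between two patterns with a Jaccard-style ratio over the sets of samples each pattern matches, and prunes to closed itemsets (an itemset is kept only if no proper superset matches exactly the same samples). The toy data and all function names are hypothetical, and this is a conventional baseline, not one of the three new distance measures presented in the talk.

```python
from itertools import combinations

# Hypothetical toy data: each sample is a set of categorical items.
samples = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"c"},
]


def support_set(pattern, data):
    """Indices of the samples that contain every item of the pattern."""
    return frozenset(i for i, s in enumerate(data) if pattern <= s)


def sample_matching_distance(p, q, data):
    """Jaccard-style distance: 1 - |S(p) & S(q)| / |S(p) | S(q)|,
    where S(x) is the set of samples matched by pattern x."""
    sp, sq = support_set(p, data), support_set(q, data)
    union = sp | sq
    if not union:
        return 0.0  # neither pattern matches any sample
    return 1.0 - len(sp & sq) / len(union)


def frequent_itemsets(data, min_support):
    """Brute-force enumeration of itemsets matched by >= min_support samples."""
    items = sorted(set().union(*data))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            p = frozenset(combo)
            s = support_set(p, data)
            if len(s) >= min_support:
                result[p] = s
    return result


def closed_itemsets(itemsets):
    """Keep an itemset only if no proper superset matches the same samples."""
    return {
        p: s
        for p, s in itemsets.items()
        if not any(p < q and s == t for q, t in itemsets.items())
    }


if __name__ == "__main__":
    frequent = frequent_itemsets(samples, min_support=2)
    closed = closed_itemsets(frequent)
    print(len(frequent), "frequent itemsets,", len(closed), "closed")
    print(sample_matching_distance(frozenset("a"), frozenset("b"), samples))
```

On this toy data, {b} and {a, b} match exactly the same samples, so their sample-matching distance is 0 and closed itemset pruning drops {b}; {b, c} is dropped in favour of {a, b, c} for the same reason. The example also hints at the limitation the talk addresses: a distance built only from sample counts says nothing about how the matched samples differ in their other attribute values.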

Dr. Wong holds a Ph.D. from Carnegie Mellon University and a B.Sc. (Hons) and an M.Sc. from the University of Hong Kong. He is an IEEE Fellow, recognized for his contributions to machine intelligence, computer vision and intelligent robotics. Currently, Dr. Wong is a Distinguished Professor Emeritus of Systems Design Engineering at the University of Waterloo (UW), with adjunct appointments in the School of Computer Science and the Department of Electrical and Computer Engineering. He was the Founding Director of the Pattern Analysis and Machine Intelligence Laboratory (PAMI Lab) at UW, a Distinguished Chair Professor at the Hong Kong Polytechnic University (2000-03) and an Honorary Professor of Electronic Engineering at the University of Hull. His research areas cover machine intelligence, computer vision, intelligent robotics, pattern recognition, data mining and bioinformatics. Dr. Wong has published over a hundred journal papers and close to two hundred conference papers and book chapters, and holds five US patents. He served as General Chair of the IASTED Conference of Robotics and Control (1996) and General Chair of the Intelligent Robotics and Systems Conference (1998). Dr. Wong was an invited speaker in the Distinguished Speaker Program of the IEEE Computer Society (1994-1996) and a Distinguished Lecturer in the Special Lecture Series on Computer Vision at the School for Advanced Studies in Industrial and Applied Mathematics (SASIAM), Technopolis, Bari, Italy. In industry, Dr. Wong has served as a consultant to many high-tech companies in the USA, Canada and Hong Kong, and three high-tech companies have been founded on the core technologies he and his team developed. Dr. Wong is a founder and retired director of Virtek Vision International Corporation, a publicly traded company and a leader in laser and vision technology; he served as its President (1986-93), Chairman (1993-97) and director (1997-2003). In 1997, he co-founded Pattern Discovery Software Systems Ltd. and has served as its Chairman ever since.

