Principles of Stochastic Discrimination and Ensemble Learning


Tin Kam Ho,
Mathematical and Algorithmic Sciences Research Center,
Bell Labs, Lucent Technologies.
600 Mountain Ave., 2C-279, Murray Hill, NJ 07974, USA.
phone: 1-(908) 582-5989 fax: 1-(908) 582-5857
tkh@research.bell-labs.com
http://www.cs.bell-labs.com/who/tkh

Overview

Learning in everyday life is often accomplished by making many random guesses and synthesizing the feedback. Kleinberg's analysis of this process resulted in a new method for classifier design -- stochastic discrimination (SD). The method constructs an accurate classifier by combining a large number of very weak discriminators that are generated essentially at random. An important advantage is that classifiers designed by this way are insensitive to overtraining.

SD is an ensemble learning method in an extreme form. Studies on other ensemble methods for classification have long suffered from the difficulty of modeling the complementary strengths of the components. The SD theory addresses this rigorously via mathematical notions of enrichment, uniformity, and projectability.

In this tutorial we explain these concepts via a simple numerical example,with a focus on a fundamental symmetry in point set covering that is the key observation leading to the foundation of the SD theory. We illustrate, step by step, how the SD principle operates in this example. We then describe and discuss more sophisticated implementations for practical uses. We believe a basic understanding of the SD method will open the way to explorations of a new classifier technology, and lead to developments of better tools for analyzing other ensemble methods.

No prior knowledge is assumed other than a general understanding about major concerns in classification. Though, some experience on ensemble methods, such as classifier combination, bagging, boosting, etc. will be helpful in discussions on how the SD method relates to others.

Outline

Part I: The Stochastic Discrimination principles.

  • Symmetry of probabilities induced by uniform space covering.

  • Two-class discrimination.

  • Projectability of weak models.

Part II: Practical Implementations.

  • Direct implementation of Stochastic Discrimination, details of an example implementation, results with public data sets, open research questions.

  • Derived methods, structured stochastic discrimination, decision forests.

  • Relationship to other ensemble learning methods including boosting and genetic algorithms.

Biography

Dr. Ho is a Distinguished Member of Technical Staff in the Mathematical and Algorithmic Sciences Research Center of Bell Laboratories. She received a PhD in Computer Science from State University of New York at Buffalo in 1992. She has published actively on classifier combination, decision forests, stochastic discrimination, data complexity, and many topics in pattern recognition applications. Her 1992 Ph.D. thesis was a pioneering study on classifier combination. She is Editor-In-Chief of the IAPR official journal Pattern Recognition Letters . She has also been on the editorial board of Pattern Recognition (1999-present), International Journal on Document Analysis and Recognition (2003-present), and IEEE-Transactions on PAMI (1999-2002). She is a Fellow of the IAPR and IEEE, and has 7 U.S. patents on classifier design, image analysis, and wireless tracking.

Dr. Ho has been a close collaborator with Prof. Kleinberg on SD since 1992. She also developed several variants of SD implementations, and studied relationship of SD to classifier combination and ensemble learning methodologies. She has published several articles related to SD, on its introduction, experiments, and variants.