HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Seminar
2015 Series

On the Curses and Blessings of High Dimensionality in Data Mining

Dr. Ata Kaban
Senior Lecturer
University of Birmingham
UK

Date: November 19, 2015 (Thursday)
Time: 2:30 - 3:30 pm
Venue: CEC803, Christian Education Centre, Ho Sin Hang Campus

Abstract
High dimensional data spaces have some counter-intuitive but very general properties. For instance, distance concentration is the phenomenon whereby the contrast between the nearest and the farthest neighbouring points may vanish as the data dimensionality increases. It has been flagged in cancer research, database research, and data mining as a threat to high dimensional data processing and analysis, which typically rely on some notion of distance or dissimilarity. Related issues include estimation problems when the data dimension is larger than the sample size, as well as computational issues when dealing with large data sets.
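
As a rough illustration of distance concentration (a minimal sketch, not taken from the talk), the snippet below measures the relative contrast (d_max - d_min) / d_min between the farthest and nearest of a set of random points, as seen from a random query point. The uniform data on the unit cube, the sample size, and the function name relative_contrast are all assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit cube.

    Assumption: coordinates are i.i.d. uniform on [0, 1]; real data sets
    may concentrate more slowly, or not at all.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# Print the contrast for a few dimensionalities.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should shrink as the dimension grows, meaning the nearest and farthest neighbours become hard to tell apart by distance alone.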

This talk will present a selection of results about high dimensional data mining that aim to provide answers to the following questions:
(i) How can we detect problems of distance concentration in high dimensional data sets?
(ii) Which methods are suitable to guard against the ill effects of distance concentration?
(iii) When and how can we turn around the generic phenomenon of concentration of norms and distances to perform computationally cheap random dimensionality reduction for high dimensional learning problems, and what performance can we guarantee for this scheme?

The latter turns out to depend on a combination of the complexity of the input space and that of the learning model. This goes well beyond the previously coined notion of compressive learning, which assumed a one-size-fits-all condition of sparse representation of the inputs. We will discuss two extremes on this spectrum by way of demonstration: a constrained linear model and an unconstrained model based on the nearest-neighbour rule.
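
To make the idea of random dimensionality reduction concrete, here is a minimal sketch of one common scheme: projection by a Gaussian random matrix. The abstract does not say which projection the talk uses, so the Gaussian matrix, the dimensions (1000 reduced to 200), the synthetic Gaussian data, and the helper names gaussian_projection, project, and distance_ratios are all assumptions made for illustration.

```python
import math
import random


def gaussian_projection(dim_in, dim_out, seed=0):
    """Build a dim_out x dim_in matrix with i.i.d. N(0, 1/dim_out) entries,
    so that squared norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def distance_ratios(points, matrix):
    """Ratio of projected to original distance for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = math.dist(points[i], points[j])
            reduced = math.dist(projected[i], projected[j])
            ratios.append(reduced / original)
    return ratios


# Synthetic data (an assumption): 20 Gaussian points in 1000 dimensions.
rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]

R = gaussian_projection(1000, 200)
ratios = distance_ratios(points, R)
print(min(ratios), max(ratios))
```

In this setting the ratios should stay close to 1, which is the concentration of norms and distances working in one's favour: the projection is data-independent and cheap to generate, yet pairwise distances are roughly preserved. What this means for a particular learner, such as a linear model or a nearest-neighbour rule, is the subject of the guarantees the talk describes and is not shown here.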

Biography
Dr. Ata Kaban is a senior lecturer in Computer Science at the University of Birmingham, UK. She holds a PhD in Computer Science (2001) and a PhD in Musicology (1999). Her current research interests include statistical machine learning and data mining in high dimensional data spaces, algorithmic learning theory, probabilistic modelling of data, and Bayesian inference. She is currently a vice-chair of the IEEE CIS Task Force on High Dimensional Data Mining.

********* ALL INTERESTED ARE WELCOME ***********
(For enquiries, please contact the Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=354