HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Colloquium
2011 Series

Machine Learning, Persistent Topology, and Circular Coordinates

Prof. Vin de Silva
Pomona College
California

Date: March 14, 2011 (Monday)
Time: 2:30 - 3:30 pm
Venue: RRS905, Sir Run Run Shaw Building, Ho Sin Hang Campus

Abstract
I will discuss work from the early 2000s in two different fields: NLDR (nonlinear dimensionality reduction) and PCT (point-cloud topology). Then I will try to show how these two fields meet, in the problem of finding circular coordinates for a data set. This newer work has possible applications in signal processing and dynamical systems.

The idea behind NLDR is to take a high-dimensional data set, perhaps obtained as a collection of scientific measurements, and to find a small set of real-valued coordinates that reveal meaningful parameters of the data. The classical linear instance of this is principal components analysis (PCA). The paradigm was introduced by Josh Tenenbaum in the late 1990s. Two well-known algorithms, Isomap (Tenenbaum, de Silva, Langford) and LLE (Roweis, Saul), were published in 2000, and many other researchers have published NLDR algorithms since then. Each algorithm exploits a different aspect of the inherent geometry of the data in order to construct the coordinates.
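For readers who want a concrete reference point for the linear case, here is a minimal illustrative sketch of PCA in Python with NumPy. It is not code from the talk. It centers the data, takes a singular value decomposition, and keeps the top k principal directions as real-valued coordinates. The function name `pca` and the synthetic data set (noisy points along a line in R^3) are hypothetical choices made for illustration.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                   # center each feature
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                      # k real-valued coordinates per point

# Hypothetical data: 200 noisy points near a line in R^3, parametrized by t
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.outer(t, [1.0, 2.0, 2.0]) + 0.01 * rng.normal(size=(200, 3))

Y = pca(X, 1)
print(Y.shape)  # (200, 1)
```

Here the single recovered coordinate tracks the hidden parameter t (up to sign and scale). NLDR methods such as Isomap and LLE aim to do the same when the data lie near a curved, rather than flat, subset.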

Over roughly the same period, several groups of researchers have been developing tools and techniques for applying algebraic topology to scientific data. Here the idea is to detect the topological structure of a set of high-dimensional observed data points. The difficulty is that data are inherently noisy, and topological invariants are extremely sensitive to local noise. The early breakthrough came in 2000, with the publication of the persistence algorithm of Edelsbrunner, Letscher and Zomorodian. This new framework gives robust versions of the classical invariants of algebraic topology (such as homology and Betti numbers) that can be used to estimate the topology (or "shape") of a noisy data set.
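To make the idea of persistence a little more concrete, the following is a minimal sketch of the simplest case only: 0-dimensional persistence (connected components) of a Rips-type filtration, in which each edge is assumed to enter at its Euclidean length. It uses union-find in the manner of Kruskal's algorithm. This is not the general persistence algorithm of Edelsbrunner, Letscher and Zomorodian, which handles all homological dimensions. The function name `h0_persistence` and the sample points are hypothetical.

```python
import itertools
import math

def h0_persistence(points):
    """0-dimensional persistence bars (birth, death) for a Rips-type filtration.

    Assumption: every point is born at 0 and each edge enters at its length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges sorted by length give the order in which they enter the filtration
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:               # edge merges two components: one class dies at d
            parent[ri] = rj
            bars.append((0.0, d))
    bars.append((0.0, math.inf))   # the last surviving component never dies
    return bars

# Hypothetical noisy sample: two tight clusters about 5 units apart
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 0.0), (5.1, 0.0)]
bars = h0_persistence(pts)
```

The short bars (dying near 0.1) reflect small-scale noise, while the one long finite bar (dying near 4.9) and the infinite bar signal two well-separated components. Reading off long bars and ignoring short ones is what makes persistent invariants robust to noise.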

In my talk, I will present recent work which combines ideas from both fields. From the NLDR side, one can generalize from real-valued coordinates to more general coordinates. We focus on circle-valued coordinates (such as angles). To discover these coordinates, we exploit not the geometry but the topology of the data. In order to do this robustly, it is necessary to use a persistence framework. I will indicate how these calculations are carried out, and give some examples of how one can exploit the resulting coordinates in applications.
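As a rough illustration of how a circle-valued coordinate can be read off once topological information is available, here is a minimal sketch of one common formulation of the final step, which is assumed here rather than taken from the talk. Given an integer 1-cocycle z on the oriented edges of a neighborhood graph, assumed to have been produced already by a persistent cohomology computation that is not shown, find real vertex values f minimizing ||z - δf||², where (δf)(i, j) = f[j] - f[i]. Then take θ = f mod 1 as the circular coordinate. The function name `smooth_cocycle` and the four-vertex example are hypothetical.

```python
import numpy as np

def smooth_cocycle(n_vertices, edges, z):
    """Least-squares smoothing of an integer 1-cocycle (illustrative sketch).

    Minimizes ||z - delta f||^2 with (delta f)(i, j) = f[j] - f[i] and returns
    theta = f mod 1 (circle-valued, in units of full turns) and the residual z - delta f.
    """
    D = np.zeros((len(edges), n_vertices))   # coboundary matrix on oriented edges
    for row, (i, j) in enumerate(edges):
        D[row, i] = -1.0
        D[row, j] = 1.0
    z = np.asarray(z, dtype=float)
    f, *_ = np.linalg.lstsq(D, z, rcond=None)  # minimum-norm least-squares solution
    theta = np.mod(f, 1.0)
    return theta, z - D @ f

# Hypothetical example: a 4-cycle whose cocycle puts a full turn on one edge
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
theta, resid = smooth_cocycle(4, edges, [0, 0, 0, 1])
```

The smoothing spreads the single full turn evenly around the cycle, so each edge carries a quarter turn and θ advances by the same amount between neighboring vertices. This illustrates why a smoothed cocycle gives a better-behaved coordinate than the raw integer one.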

My collaborators in this work are Mikael Vejdemo-Johansson, Dmitriy Morozov, and Primoz Skraba.

Biography
Vin de Silva studied mathematics at Cambridge and Oxford, completing a doctorate in symplectic geometry under the supervision of Simon Donaldson. Since 2000, he has worked in applied topology, spending five years in Gunnar Carlsson's research group at Stanford. His work with Josh Tenenbaum and John Langford on the Isomap algorithm is widely cited to this day, and his collaboration with Robert Ghrist on sensor network topology was honored by a SciAm50 award in 2007. Vin is currently an assistant professor of mathematics at Pomona College, California, and holds a Digiteo Chair at INRIA Saclay Ile-de-France.

********* ALL INTERESTED ARE WELCOME ***********
(For enquiries, please contact the Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=168&lang=sc