HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Seminar
2014 Series

Locality Sensitive Hashing for Big Data

Dr. Wei Wang
School of Computer Science and Engineering
The University of New South Wales
Australia

Date: October 21, 2014 (Tuesday)
Time: 11:30 am - 12:30 pm
Venue: SCT716, Cha Chi Ming Science Tower, Ho Sin Hang Campus

Abstract
Finding (approximate) nearest neighbor is a fundamental problem in computational geometry, and plays an important role in many database, multimedia, and machine learning problems, as real-world objects are usually modelled or represented as one or several high-dimensional feature vectors. The problem is notoriously hard due to the phenomenon named "curse of dimensionality", and it becomes even more challenging in the big data era, calling for new ideas and methodologies.

While there are numerous heuristic methods to solve this problem, methods based on Locality Sensitive Hashing (LSH) are arguably the most popular and promising approach as they have both rigorous theoretical guarantees on space and time complexities and excellent query performance in practice. However, to achieve such performance, LSH-based methods require huge amount of index. Recent work reduces the index size from O((nd)^{1.5}) to O(n log(n)) (n is the number of data points and d is the dimensionality), but the index is still typically an order of magnitude larger than the size of the data.

In this talk, we present our recent breakthrough in this problem. We propose a novel method to solve the problem with theoretical guarantees and a tiny index. Furthermore, our method supports a variety of functionalities, such as finding the exact nearest neighbors with guaranteed probabilities. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to a billion 128-dimensional points on a single commodity PC.

Biography
Dr. Wei Wang is an Associate Professor in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include keyword search on (semi-)structured data, similarity query processing, high dimensional indexing, and spatial databases. He has published over ninety research papers, with many in premier database journals (TODS, VLDB J, and TKDE) and conferences (SIGMOD, VLDB, ICDE, and WWW). He is currently serving on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE).

********* ALL INTERESTED ARE WELCOME ***********
(For enquiry, please contact Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=306
Photos  Slides