Vector Similarity Search in High-Dimensional Spaces (Jianliang Xu et al.)


Vector similarity search plays a crucial role in many data science and AI applications, such as natural language understanding, recommendation systems, image/video processing, and anomaly detection. With the emergence of ChatGPT and other generative AI technologies, the demand for vector similarity search in high-dimensional spaces has intensified. However, efficient and accurate similarity search in high-dimensional vector spaces remains challenging due to the curse of dimensionality.

Despite recent advances, existing vector similarity search methods have several limitations. First, the search time complexity of state-of-the-art techniques includes a factor of 2^m (where m is the dimensionality of the vectors), which makes them inefficient for practical applications. Second, methods for vector similarity search in multi-metric spaces often lack accuracy guarantees or are inefficient. Third, current approaches are memory-intensive and I/O-unfriendly, which incurs significant costs for large-scale databases stored on external storage.

To address these limitations, this research proposal investigates novel approaches to similarity search in high-dimensional vector spaces. Specifically, we propose three research tasks, one for each limitation. First, we aim to reduce the search time complexity by developing a novel ball-cover-based proximity graph (PG) for indexing vector data and by designing PGs that scale with respect to different data expansion rates. Second, we will propose a new margin-allowed proximity graph (MPG) framework that efficiently supports multi-vector similarity search while ensuring accurate search results. Third, we will explore a purpose-built hierarchical PG framework that minimizes I/O costs and enables efficient vector similarity search in large-scale databases.
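For readers unfamiliar with proximity graphs, the following is a minimal sketch of generic graph-based similarity search, not the ball-cover-based PG, MPG, or hierarchical PG proposed above. It assumes a brute-force k-nearest-neighbor graph as a stand-in index and Euclidean distance; the function names (`l2`, `build_knn_graph`, `graph_search`) and the parameters `k` and `ef` are hypothetical and chosen only for illustration.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(vectors, k):
    """Brute-force k-nearest-neighbor graph with undirected edges.

    Assumption: this simple graph stands in for a real proximity graph.
    """
    n = len(vectors)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest = sorted(others, key=lambda j: l2(vectors[i], vectors[j]))[:k]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def graph_search(vectors, graph, query, entry, ef=8):
    """Best-first search from `entry`, keeping the `ef` closest nodes seen.

    Returns a list of (distance, node_id) pairs sorted by distance.
    The result is approximate: it depends on the graph and the entry point.
    """
    d0 = l2(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap (negated): worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest candidate is worse than the worst kept result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-nd, i) for nd, i in results)


# Usage on synthetic data (seeded so the run is repeatable).
random.seed(0)
data = [tuple(random.random() for _ in range(16)) for _ in range(200)]
index = build_knn_graph(data, k=8)
q = tuple(random.random() for _ in range(16))
approx = graph_search(data, index, q, entry=0, ef=10)
exact = min(range(len(data)), key=lambda i: l2(q, data[i]))
print("approximate nearest:", approx[0][1], "exact nearest:", exact)
```

The search only computes distances for nodes reachable along graph edges, which is why the structure of the graph governs both the search cost and the accuracy of the result. A larger `ef` explores more nodes and tends to improve recall at the price of more distance computations.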


Figure 1: A Toy Example of Graph-based Vector Similarity Search


Publications:


Grant Support:

The project is supported by the Hong Kong Research Grants Council under General Research Fund grant 12202024.


For further information on this research topic, please contact Prof. Jianliang Xu.