Effective and Efficient Cross-modal Retrieval (Anran Wang et al.)

Our project focuses on a fundamental and challenging task dealing with data from multiple modalities. On the Internet, different kinds of data such as text, images, and videos describe the same content. To make interaction with large-scale multi-modal data possible, it is important to enable users to retrieve related information with the data at hand. This task is called cross-modal retrieval. More specifically, we focus on the visual-semantic retrieval task, which retrieves related text given an image as a query, and vice versa. We attempt to develop an effective and efficient visual-semantic retrieval system. This work is motivated by the observation that low-level information contributes to many tasks, e.g., scene classification and action recognition. To bridge the semantic gap between modalities, low-level information could be included to exploit the underlying connections between heterogeneous data. The proposed project will investigate how to incorporate low-level information to enhance retrieval quality. In addition, since we need to deal with data in a large corpus, this project also aims to make the retrieval process efficient.
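As a rough illustration of the retrieval step (not the project's actual method), visual-semantic retrieval is commonly cast as nearest-neighbor search in a shared embedding space: images and texts are mapped to vectors by learned encoders, and a query from one modality retrieves the most similar items from the other. The sketch below uses hand-made toy vectors in place of learned embeddings; all names and values are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of cross-modal retrieval in a shared embedding space.
# In a real system the embeddings would come from trained encoders
# (e.g., a CNN for images and a text encoder for captions); here we use
# fixed toy vectors so the retrieval step itself is runnable.

def normalize(v):
    """L2-normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_emb, gallery_embs, k=1):
    """Return indices of the k gallery items most similar to the query
    under cosine similarity."""
    sims = normalize(gallery_embs) @ normalize(query_emb)
    return np.argsort(-sims)[:k]

# Toy shared space: one image embedding and three caption embeddings.
image_emb = np.array([0.9, 0.1, 0.0])
caption_embs = np.array([
    [0.0, 1.0, 0.2],   # unrelated caption
    [0.8, 0.2, 0.1],   # matching caption
    [0.1, 0.0, 1.0],   # unrelated caption
])

print(retrieve(image_emb, caption_embs, k=1))  # → [1]
```

Text-to-image retrieval is the same operation with the roles swapped: a caption embedding queries a gallery of image embeddings. For large corpora, the brute-force similarity scan above would be replaced by an approximate nearest-neighbor index, which relates to the project's efficiency goal.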


For further information on this research topic, please contact Dr. Anran Wang.