Novel Queries and Applications on Large Graphs (Byron Choi et al.)

Many applications of graph-structured data (or large-scale networks) are emerging in industry and science, for example; (i) biological graphs such as molecules, chemicals, and genes; (ii) social graphs including friendships on Facebook; and (iii) topologies such as the web graph. Novel queries arising from this can be to determine the existence of a given pattern in a data graph. For example, chemists may search for polymers in food that contain benzopyrene (a carcinogenic substance). Another example is that an activity organizer may determine a group from an evolving social network whose members know some other members in the group. Due to the hardness of graph patterns, even powerful servers may take a long time to process them. Therefore, the project aims at optimizing novel queries on graphs and supporting emerging applications on graphs.


Objectives:


Findings So Far:

To protect the privacy of query graphs and/or data graphs, specifically the topological structure (i.e., the adjacency matrices) of the graphs, we proposed efficient privacy-preserving subgraph isomorphism query [1, 3]. In particular, novel encryption schemes and algorithms were carefully designed by adopting classical cryptographic schemes and seminal query evaluation algorithms. Optimization strategies have been proposed to enhance performance. Moreover, the privacy analysis shows the security guarantees. Our experimental results show that the proposed techniques are efficient and the optimizations are effective.

To ensure data integrity, we proposed efficient authenticated subgraph isomorphism and similarity queries [2, 5], to guarantee the correctness (soundness and the completeness) of query answers. Specifically, we proposed several novel methods derived from classical verifiable data structures, in particular, Merkle Hash Tree (MHT). Several indexing techniques and properties are proposed to optimize authentication performance. Theoretical analysis and experiments show that the proposed algorithms guarantee data integrity and return the query answers correctly and efficiently.

Regarding evolving graph data, we have studied the incremental maintenance problem of the minimum bisimulation of evolving graphs (and structural index for graphs) [4]. Bisimulation has been shown an efficient index for many structural queries such as reachability queries, path queries, twig queries and graph pattern queries. The main idea of our work is to take advantage of the most efficient maintenance algorithms for acyclic structures and cyclic structures, respectively. The former algorithms are more time efficient while the latter ones produce small indexes. Usability has been a pressing issue of graph repositories, because graph schemas may be too loose to be useful for query formulation; the queries can be too complex to compose manually; or the user may simply fall victim to human error. Therefore, a visual interface for graph repositories is needed [6]. We showed that a visual interface opens new opportunities for query optimizations. In particular, we process queries as the users are drawing them [7, 8]. ,

Usability has been a pressing issue of graph repositories, because graph schemas may be too loose to be useful for query formulation; the queries can be too complex to compose manually; or the user may simply fall victim to human error. Therefore, a visual interface for graph repositories is needed [6]. We showed that a visual interface opens new opportunities for query optimizations. In particular, we process queries as the users are drawing them [7, 8].


Selected Publications:

[1]  Zhe Fan, Byron Choi, Qian Chen, Jianliang Xu, Haibo Hu and Sourav S. Bhowmick. Structure-Preserving Subgraph Query Services. IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(8):2275-2290, 2015.

[2]  Yun Peng, Zhe Fan, Byron Choi, Jianliang Xu and Sourav S. Bhowmick. Authenticated Subgraph Similarity Search in Outsourced Graph Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(7):1838-1860, 2015.

[3]  Zhe Fan, Byron Choi, Jianliang Xu and Sourav S. Bhowmick. Asymmetric Structure-Preserving Subgraph Query for Large Graphs. In Proeedings of the 31st IEEE International Conference on Data Engineering (ICDE), 339-350, 2015.

[4]  Jintian Deng, Byron Choi, Jianliang Xu, Haibo Hu and Sourav S. Bhowmick. Incremental Maintenance of the Minimum Bisimulation of Cyclic Graphs. IEEE Transactions on Knowledge and Data Engineering (TKDE) 25(11):2536-2550, 2013.

[5]  Zhe Fan, Yun Peng, Byron Choi, Jianliang Xu and Sourav S. Bhowmick. Towards Efficient Authenticated Subgraph Query Service in Outsourced Graph Databases. IEEE Transactions on Services Computing (TSC) 7(4):696-713, 2014.

[6]  Sourav S. Bhowmick, Byron Choi and Shuigeng Zhou. VOGUE: Towards a Visual Interaction-aware Graph Query Processing Framework. The biennial Conference on Innovative Data Systems Research (CIDR '13), 2013.

[7]  Changjiu Jin, Sourav S. Bhowmick, Byron Choi, and Shuigeng Zhou. PRAGUE: Towards Blending Practical Visual Subgraph Query Formulation and Query Processing. In Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE '12), pages 222-233, 2012.

[8]  Changjiu Jin, Sourav S. Bhowmick, Xiaokui Xiao, James Cheng and Byron Choi. GBLENDER: Towards Blending Visual Query Formulation and Query Processing in Graph Databases. In Proceedings of the 29th ACM SIGMOD International Conference on Management of Data, pages 111 – 122, 2010.


Grant Support:

This project is supported by the Research Grants Council (RGC), Hong Kong SAR, China


For further information on this research topic, please contact Dr. Byron Choi.