When Learned Index Meets Blockchain: Design, Algorithms, and Performance Evaluation (Jianliang Xu et al.)


Blockchain has recently become increasingly widely adopted for many decentralized applications. It enables otherwise unknown peers to collectively maintain a verifiable distributed database. However, to preserve integrity and support data provenance, every blockchain node must store and maintain entire ledger states, which incurs high storage costs and prolongs data search time.

This project aims to explore emerging learned index technologies to optimize blockchain system performance. Recent studies on learned indexes have shown their advantage over traditional indexes such as B+-tree and R-tree. At its core, a learned index replaces the directing keys in each index node with a learned model to improve search efficiency and reduce storage overhead. However, we cannot apply the existing learned indexes directly to blockchain systems because of the following challenges. First, the existing learned indexes do not support data authentication and provenance, which are essential to blockchain systems. Second, blockchain uses long hash strings as indexing keys, which are different from the numerical keys used by the existing learned indexes. Third, the existing learned indexes focus on read-optimized, in-memory databases, whereas blockchain systems feature disk-based storage and frequent status updates.

To address these challenges, we propose a novel column-based Merkle learned index for blockchain systems. More specifically, we design (1) a two-level column-based Merkle index for supporting efficient data authentication and provenance; (2) two specifically-designed learned structures and models that are tailored for the two levels of indexes; and (3) a multivariate linear regression model to learn the distribution of hash string keys. Besides formulating the basic design, we also investigate efficient algorithms for search within and maintenance of the proposed learned index as well as to develop several optimization techniques that strike a balance between storage and search performance.


Figure 1: A Column-based Design for Learned Blockchain Storage


Publications:


Grant Support:

The project is supported by the Hong Kong Research Grant Council under the General Research Fund 12200022.


For further information on this research topic, please contact Prof. Jianliang Xu.