Uncovering Spatiotemporal Patterns of Disease Diffusion through Data-Driven Phylogeographic Inference (Jiming Liu et al.)

The geographical spread of infectious diseases has been and will continue to be a serious public health concern in both Hong Kong and mainland China. Notable examples include the influenza A(H5N1) outbreak in Hong Kong in 1997, the swine flu (H1N1) pandemic that reached Hong Kong in 2009, and the influenza A(H7N9) epidemic in Eastern China in 2013.

Computationally, disease diffusion networks can be used to characterize how these diseases spread from one geographic location to another. In these networks, nodes represent locations and edges represent the diffusion dynamics between the locations. From a particular disease diffusion network, we can identify underlying disease hotspots and critical diffusion paths, information that can help public health authorities carry out active surveillance and control the disease efficiently. In practice, however, such networks are often hidden: we can observe only the locations and times of disease incidence, or the genome sequences of the pathogens involved. In this project, we will develop and evaluate an integrated Bayesian inference approach to uncovering real-world disease diffusion networks. What distinguishes this approach is that it mines and incorporates informative priors from heterogeneous data sources. Examples of such priors include the spatiotemporal distribution of the disease hosts and the evolutionary relationships of the viral sequences.
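
To make the network representation concrete, the following is a minimal Python sketch of a directed, weighted diffusion network, assuming each edge carries a single non-negative diffusion rate. The location labels, the rate values, and the function names (outgoing_strength, rank_hotspots, strongest_edges) are hypothetical and chosen only for illustration; they are not estimates or methods from this project. Ranking locations by total outgoing rate is one simple proxy for a hotspot, not necessarily the criterion used in the project.

```python
from collections import defaultdict

# Hypothetical toy network: (source, target) -> diffusion rate.
# Labels and rates are illustrative placeholders, not real estimates.
edges = {
    ("A", "B"): 0.8,
    ("A", "C"): 0.5,
    ("B", "C"): 0.2,
    ("C", "A"): 0.1,
}


def outgoing_strength(edges):
    """Sum the outgoing diffusion rates for each location."""
    strength = defaultdict(float)
    for (src, dst), rate in edges.items():
        strength[src] += rate
        strength[dst] += 0.0  # ensure locations with no outgoing edge still appear
    return dict(strength)


def rank_hotspots(edges):
    """Order locations by outgoing strength, strongest first."""
    strength = outgoing_strength(edges)
    return sorted(strength, key=strength.get, reverse=True)


def strongest_edges(edges, k=2):
    """Return the k directed edges with the highest diffusion rates."""
    return sorted(edges, key=edges.get, reverse=True)[:k]
```

Under these assumptions, rank_hotspots(edges) lists candidate source locations and strongest_edges(edges) lists the edges that a critical diffusion path would most likely pass through. In the setting described above, the rates themselves are unknown and would have to be inferred.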

Taking the influenza A(H5N1) viruses in China as a case study, this project contributes to the interdisciplinary field of phylogeographic inference, which aims to reveal the geographic spread of disease viruses from their evolutionary relationships (i.e., phylogenetic trees). Firstly, we propose a Bayesian inference approach to mapping disease diffusion networks based on a reconstructed phylogenetic tree. Secondly, we develop a novel clustering method to characterize spatiotemporal distributions of disease hosts from sparse and biased observation data, so as to avoid the general network inference problem of estimating a large number of free parameters (i.e., n(n-1) unknown diffusion rates for n locations). Thirdly, we design novel Markov chain Monte Carlo algorithms to account for the uncertainties in generating phylogenetic trees from the genome sequences of disease viruses. We then evaluate the proposed methods and algorithms against those from existing studies, using both synthetic and real-world datasets.
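
The parameter count mentioned above can be illustrated with a short Python sketch. It assumes, as is common in discrete phylogeographic models but not confirmed here as this project's formulation, that diffusion between n locations is described by a continuous-time Markov chain whose generator matrix has one free rate per ordered pair of distinct locations. The function names num_free_rates and build_rate_matrix are hypothetical.

```python
def num_free_rates(n):
    """Number of directed pairwise diffusion rates among n locations."""
    return n * (n - 1)


def build_rate_matrix(rates, n):
    """Build an n x n generator matrix from off-diagonal rates.

    rates maps (i, j) index pairs with i != j to non-negative rates.
    Each diagonal entry is set to minus the sum of the other entries
    in its row, so every row sums to zero.
    """
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        if i == j:
            raise ValueError("diagonal entries are not free parameters")
        if rate < 0:
            raise ValueError("diffusion rates must be non-negative")
        q[i][j] = rate
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q
```

For example, num_free_rates(10) gives 90 and num_free_rates(30) gives 870, which shows how quickly the number of unknowns grows relative to the sparse observations typically available.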

As far as we know, this project is one of the first attempts to incorporate both spatial ecology and viral evolution in a phylogeographic inference study. The computational results can offer new insights for investigating the diffusion networks of other diseases whose geographic spread is driven by host mobility and migration.


Grant Support:

This project is supported by the Research Grants Council (RGC), Hong Kong SAR, China (Project HKBU12202415).


For further information on this research topic, please contact Prof. Jiming Liu.