Advanced Computational Tools Tackle Complex Diseases and Speed Up Drug Discovery

Dr. Eric Zhang

State-of-the-art technologies in computer science, from artificial intelligence (AI) to deep learning, have been making a significant impact on medicine and healthcare. They help biologists and doctors drastically reduce the time needed to find answers about diseases such as cancers and inherited diseases.

Dr. Eric Zhang, Assistant Professor in the Department of Computer Science at Hong Kong Baptist University (HKBU), is a specialist in developing computational tools and novel deep learning models. He uses these to understand the human gut microbiome and the human genome through data from advanced high-throughput sequencing.

“High-throughput sequencing technology is used to generate data from genomes, transcriptomes and epigenomes. For example, sequencing a single human genome can generate about 100GB of raw data or even more. Therefore, analysing these massive data and extracting useful knowledge from them has become a practical challenge,” says Dr. Zhang. He received his MPhil degree from the Li Ka Shing Faculty of Medicine at The University of Hong Kong in 2012 and his PhD degree in Computer Science from City University of Hong Kong in 2016. “I want to develop novel deep learning algorithms as well as powerful tools to help biologists and doctors extract useful information from huge quantities of data.”


Deciphering the Relationship between the Human Genome and Diseases

Dr. Zhang and his research team are currently working on deep learning in genomics. They apply advanced computational models to decipher how the human genome and the human gut microbiome relate to diseases and to drug effectiveness.

The human genome holds the key to understanding the genetic basis of human evolution, hereditary illnesses and many phenotypes. For this reason, whole-genome reconstruction, variant discovery and the analysis of data from advanced sequencing experiments are foundational to studying human genomic variation and genotype-disease relationships. By developing a series of computational tools, Dr. Zhang and his team aim to identify haplotype-resolved genomic variants and predict their pathogenicity in complex human diseases such as cancer and inherited diseases.

One of Dr. Zhang’s previous research projects was “De novo coding mutations contribute to early onset Parkinson's disease”. In it, he and his team used computational and statistical analysis to identify 12 genes with de novo mutations, which are mutations that appear in an individual but in neither of his or her parents. They also identified the gene NUS1 as a candidate gene for early-onset Parkinson's disease (PD). “We also replicated the signals of NUS1 in a large cohort, and found that they may exist only in Chinese populations, because previous studies in Europe and America didn’t find such signals,” says Dr. Zhang. He was formerly a postdoctoral fellow in the Departments of Computer Science and Pathology at Stanford University, supervised by computer scientist Professor Serafim Batzoglou and geneticist Professor Arend Sidow.

Fig 1
Data analysis workflow and the protein–protein interaction networks between candidate and known PD genes. (A) The workflow for identifying PD candidate genes. (B) Protein–protein interaction networks between 12 candidate genes and known PD causative genes.
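The basic idea behind de novo mutation discovery can be illustrated with a parent-child trio. The minimal Python sketch below flags sites where the child carries an allele found in neither parent. It assumes genotype calls are already available as allele pairs, with 0 for the reference allele and 1 or higher for alternate alleles. The `TrioSite` class and the `is_candidate_de_novo` and `find_de_novo` functions are hypothetical names used only for illustration. This is not the team's published pipeline: real analyses also filter on read depth, genotype quality and population allele frequency.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical record for one variant site in a parent-child trio.
# Genotypes are allele pairs: 0 = reference allele, 1+ = alternate allele(s).
@dataclass
class TrioSite:
    chrom: str
    pos: int
    child: Tuple[int, int]
    mother: Tuple[int, int]
    father: Tuple[int, int]


def is_candidate_de_novo(site: TrioSite) -> bool:
    """True if the child carries at least one allele seen in neither parent."""
    parental_alleles = set(site.mother) | set(site.father)
    return any(allele not in parental_alleles for allele in site.child)


def find_de_novo(sites: Iterable[TrioSite]) -> List[TrioSite]:
    """Keep only sites that look like de novo mutations (no quality filtering here)."""
    return [site for site in sites if is_candidate_de_novo(site)]


if __name__ == "__main__":
    sites = [
        TrioSite("chr1", 1000, child=(0, 1), mother=(0, 0), father=(0, 0)),  # new allele in child
        TrioSite("chr1", 2000, child=(0, 1), mother=(0, 1), father=(0, 0)),  # inherited from mother
        TrioSite("chr2", 3000, child=(1, 1), mother=(0, 1), father=(0, 1)),  # inherited from both
    ]
    for site in find_de_novo(sites):
        print(site.chrom, site.pos)  # prints: chr1 1000
```

This allele-based rule does not catch every Mendelian inconsistency. For example, a child who is homozygous for the alternate allele, with one heterozygous parent and one homozygous-reference parent, is not flagged, because both of the child's alleles occur somewhere in the parents.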

Last year, his team developed DeepDRIM, a deep learning algorithm that reconstructs gene regulatory networks from single-cell gene expression data, and applied it to identify the molecular mechanisms of COVID-19. The algorithm represents the joint expression of each gene pair as an image and analyses these images with convolutional neural networks. It reduces false positives by removing transitive gene-gene interactions. “We found that the gene connections in patients with severe COVID-19 are much stronger than those in patients with mild illness. We also identified some gene connections that exist only in patients with severe COVID-19, and their functions have been shown to be related to the disease,” he says. “With single-cell sequencing technology, we found differences in gene functional modules between patients with severe and mild COVID-19.”

Fig 2
Overview of DeepDRIM. A. Representation of the joint gene expression of gene a and gene b as a primary image. B. The 2n + 2 neighbor images are generated from the genes with strong positive covariance with gene a or gene b. C. The network architecture of DeepDRIM, including Network A and Network B, which are two stacked convolutional embedding structures designed to process the primary and neighbor images, respectively.
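The first step described above, turning a gene pair's joint expression into an image, can be approximated with a two-dimensional histogram. The sketch below is only an illustration of that idea. Its choices (a vector of expression counts per gene across the same cells, a log1p transform, 32 bins and max-normalization) are assumptions and may differ from DeepDRIM's actual preprocessing, and the function name `gene_pair_image` is hypothetical. The resulting matrix is the kind of primary image that a convolutional network such as Network A in Fig 2 could take as input. The neighbor images and the networks themselves are not shown.

```python
import numpy as np


def gene_pair_image(expr_a, expr_b, n_bins: int = 32) -> np.ndarray:
    """Convert the joint expression of gene a and gene b across cells into an
    n_bins x n_bins 'image' (a normalized 2D histogram). Hypothetical helper."""
    # Log-transform raw counts so a few highly expressing cells do not stretch the axes.
    a = np.log1p(np.asarray(expr_a, dtype=float))
    b = np.log1p(np.asarray(expr_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("both genes must be measured in the same cells")

    # Each pixel counts the cells whose (gene a, gene b) expression falls in that bin.
    counts, _, _ = np.histogram2d(a, b, bins=n_bins)

    # Compress the dynamic range of the counts, then scale pixel values into [0, 1].
    image = np.log1p(counts)
    peak = image.max()
    return image / peak if peak > 0 else image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells = 1000
    gene_a = rng.poisson(lam=2.0, size=n_cells)
    # Simulate gene b as partly driven by gene a, so the image shows a diagonal trend.
    gene_b = rng.poisson(lam=1.0 + gene_a)
    img = gene_pair_image(gene_a, gene_b)
    print(img.shape, float(img.max()))  # (32, 32) 1.0
```

A histogram keeps the whole joint distribution of the two genes rather than a single correlation value. This lets a convolutional network pick up non-linear or cell-subpopulation-specific patterns of co-expression.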


Metagenomic Sequencing and its Application to Chinese Medicine

Another research focus of Dr. Zhang's is applying metagenomic sequencing technology to Chinese herbal medicine. “Herbal medicine has a problem: it does not have an explicit molecular target and its mechanism is unclear. Recent studies revealed that many herbal medicines target human gut microbes,” says Dr. Zhang. “If we can understand how Chinese herbal medicine affects the human body, for example whether it kills some harmful microbes while keeping the beneficial ones, we can see how the truly effective factors influence diseases.”

With an interdisciplinary background in genetics, statistics and computer science, Dr. Zhang aspires to improve human health through his research, a dream he has had since childhood. “I love the healthcare field very much. For now, maybe we can do very little to tackle diseases such as cancer. But I think the development of biotechnology and interdisciplinary research using computational technology will help us understand these diseases and solve the problems,” says Dr. Zhang.

“Biologists and doctors generate loads of data every day, and they need someone who understands both computational technology and their problems. In this field, very few people can do that. I want to find ways in which advanced computational technology can help biological research and analysis.”


Novel Algorithms for Drug Discovery and Precision Medicine Implementation

In the future, Dr. Zhang would like to shift more of his focus to drug discovery and the implementation of precision medicine in both Chinese and Western medicine. “When we want to develop a drug, the first step is to know the target. I want to use advanced biotechnology and novel algorithms to identify drug targets more efficiently from high-throughput sequencing data,” says Dr. Zhang.

In Chinese medicine, there is a saying: “different diseases have the same treatment”. Dr. Zhang is trying to uncover the secrets behind it by studying the effects and mechanisms of action of Chinese herbal medicine. This work could facilitate the optimization of traditional formulas and the exploration of novel ones.

In Western medicine, AI technology is being used to accelerate the entire drug discovery process, including the design of vaccines for different COVID-19 variants and for emerging diseases. “It could have a significant impact on healthcare around the world in the future,” Dr. Zhang believes.