Eric Lu Zhang
Department of Computer Science
Hong Kong Baptist University
Kowloon Tong, Hong Kong
ericluzhang AT comp DOT hkbu DOT edu DOT hk
Phone: (852) 3411-5880
Fax: (852) 3411-7892
Office: Room R708, Sir Run Run Shaw Building
I received a B.E. in Software Engineering from
Tianjin University,
Tianjin, P.R. China in July 2008, an M.Phil. in Bioinformatics from The University of Hong Kong in Sept. 2012,
and a Ph.D. in Computer Science from the City University of Hong Kong in Feb. 2016.
From June 2016 to Jan. 2019, I was a postdoctoral scholar in the
Departments of Computer Science and Pathology,
Stanford University, Stanford, USA, where I worked with Serafim Batzoglou, Arend Sidow and James Zou.
I joined Hong Kong Baptist University as Assistant Professor of Computer Science in
Feb. 2019. I was also a visiting student at the Department of Mathematics,
where I worked with Stephen Smale.
For a complete list of my publications and other detailed info, please see my
Google Scholar page.
I am looking for talented and self-motivated PhD students, Postdocs and Research Assistant Professors who are interested in
computational genomics and machine learning in genomics.
Please contact me by email with your CV and research proposal.
- Computational Genomics
The human genome holds the key to understanding the genetic basis of human evolution,
hereditary illnesses and many phenotypes. Whole-genome reconstruction and variant discovery,
accomplished by analysis of data from whole-genome sequencing experiments,
are foundational for the study of human genomic variation and analysis of genotype-phenotype
relationships. Over the past decades, cost-effective whole-genome sequencing has been revolutionized
by short-fragment approaches, the most widespread of which have been the consistently improving
generations of the original Solexa technology, now referred to as Illumina sequencing.
An alternative approach is offered by the 10x Genomics Chromium system and stLFR from BGI, which distribute the
DNA prep into millions of partitions or beads where specific barcode sequences are attached to
short amplification products that are templated off the input fragments. However, efficient
software is still lacking to handle this recently emerged technology and make full use of long-range DNA information.
We aim to develop a series of computational tools for analyzing linked-read data,
covering read alignment, de novo assembly, variant detection and evolutionary analysis.
We believe our contribution can bring us one step closer to making precision medicine a reality.
Besides human genome analysis, we also design algorithms to decipher metagenomes from
linked-read and long-read sequencing data.
- Disease Risk Prediction
A key public health challenge is to identify individuals at
high risk for common diseases in order to enable prescreening or preventive therapies.
Much effort has been made in identifying disease-causal genomic variants and evaluating
their contribution to disease prediction. Unlike single-gene diseases that are usually
caused by inherited monogenic mutations, common diseases have multifactorial etiologies
that involve the interplay of both genetic and non-genetic factors. Therefore, effectively
identifying high-risk incident cases from "multi-level" information is a
core goal of precision medicine. We aim to develop machine learning (especially deep learning)
algorithms that integrate genomic data, clinical images, clinical records and
lifestyle information to predict complex human diseases.
- Deep Learning in Genomics
The missing heritability of human complex diseases is a critical problem in biomedical research.
Traditional statistical approaches are unable to make full use of the accumulated genomic big data and may miss complex, nonlinear relationships of risk factors. We aim to develop novel deep learning methods by integrating trans-omics big data to explore the "dark regions" in genomic studies.
Single-cell RNA sequencing (scRNA-seq) is an emerging technology that gives us the opportunity to observe gene expression profiles at single-cell resolution. Analysing scRNA-seq data can help us understand the heterogeneity of different cell types and capture cell differentiation as it occurs naturally.
We aim to design and apply modern machine learning approaches to address a series of computational problems in scRNA-seq.
I am teaching one undergraduate and one graduate course in the 2019-2020 and 2020-2021 academic years.
- COMP 1007 : Introduction to Python and Its Applications (2020 Autumn).
- COMP 7990 : Principles and Practices of Data Analytics (2020 Autumn).
- COMP 1007 : Introduction to Python and Its Applications (2019 Autumn).
- COMP 7990 : Principles and Practices of Data Analytics (2019 Autumn).
4-5pm every Friday in DLB641, but please always feel free to come by.
If you are an HKBU graduate or undergraduate student interested in machine learning or computational genomics, please feel free to email me about potential projects.
Selected Publications (+ joint first author, * corresponding author)
Debajyoti Chowdhury, Maizie (Xin) Zhou, Bailiang Li, Yuanwei Zhang, William K Cheung, Aiping Lu, Lu Zhang*. Multi-omics integration accelerates the predictive health to augment early diagnosis of common diseases. Accepted by Frontiers in Genetics, section Computational Genomics.
Md Selim Reza, Yunpeng Cai, Lu Zhang, Xingyu Zhang, Yanjie Wei. Computational Solutions for Microbiome and Metagenomics Sequencing Analyses. Accepted by Frontiers in Molecular Biosciences, section Molecular Diagnostics and Therapeutics
Zi-Hang Wen, Jeremy L. Langsam, Lu Zhang, Wenjun Shen, Xin Zhou. Bfimpute: A Bayesian factorization method to recover single-cell RNA sequencing data. BioRxiv.
Yichen Henry Liu, Griffin L. Grubbs, Lu Zhang, Xiaodong Fang, David L. Dill, Arend Sidow, Xin Zhou. Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-read. Bioinformatics Advances 2021. https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbab007/6300508
Jiaxing Chen, Chinwang Cheong, Liang Lan, Xin Zhou, Jiming Liu, Aiping Lyu, William K Cheung*, Lu Zhang*. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-Seq Data. Presented at RECOMB-Seq 2021; the extended version will be published in Briefings in Bioinformatics.
Zhenmiao Zhang, Lu Zhang*. METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs. Invited to BMC Bioinformatics-APBC Special Issue and Accepted to APBC 2021. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04284-4
Xin Zhou, Lu Zhang, Ziming Weng, David Dill, Arend Sidow. Aquila: diploid personal genome assembly and comprehensive variant detection based on linked reads. Nature Communications 2021. https://www.nature.com/articles/s41467-021-21395-x
Lu Zhang+*, Xiaodong Fang+, Herui Liao+, Zhenmiao Zhang, Xin Zhou, Lijuan Han, Yang Chen, Qinwei Qiu, Shuai Cheng Li. A Comprehensive Investigation of Metagenome Assembly by Linked-Read Sequencing. Microbiome 2020. https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00929-3
Lu Zhang+, Xin Zhou+, Ziming Weng, Arend Sidow. De novo diploid genome assembly for genome-wide structural variant detection. NAR Genomics and Bioinformatics 2020. https://academic.oup.com/nargab/article/2/1/lqz018/5661105
Lu Zhang+, Xin Zhou+, Ziming Weng, Arend Sidow. Assessment of human diploid genome assembly with 10x Linked-Reads data. GigaScience 2019. https://academic.oup.com/gigascience/article/8/11/giz141/5643883
JiFeng Guo+, Lu Zhang+ et al. De novo coding mutations contribute to early onset Parkinson's disease. Proceedings of the National Academy of Sciences of the United States of America 2018. https://www.pnas.org/content/115/45/11567
Xin Zhou, Serafim Batzoglou, Arend Sidow, Lu Zhang*. HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data. BMC Genomics 2018. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4867-7
Lu Liu+, Lu Zhang+ et al. The SNP-set based association study identifies ITGA1 as a susceptibility gene of attention-deficit/hyperactivity disorder in Han Chinese. Translational Psychiatry 2017. https://www.nature.com/articles/tp2017156
Lu Zhang+, Cheng Qin+, Junpu Mei, et al. Identification of microRNA Targets of Capsicum spp. using MiRTrans: a trans-omics approach. Frontiers in Plant Science, section Bioinformatics and Computational Biology 2017. https://www.frontiersin.org/articles/10.3389/fpls.2017.00495/full
Lu Zhang, Xikang Feng, Yen Kaow Ng and Shuai Cheng Li. Reconstructing directed gene regulatory network by only gene expression data. Invited to BMC Genomics-BIBM Special Issue and Accepted to BIBM 2015. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2791-2
Xueyan Li+, Dingding Fan+, Wei Zhang+, Guichun Liu+, Lu Zhang+, et al. Outbred genome sequencing and CRISPR/Cas9 gene editing in butterflies. Nature Communications 2015. https://www.nature.com/articles/ncomms9212
Jing Zhang+, Lu Zhang+, Jiaxu Hong, Dan Wu, Jianjiang Xu. Association of Common Variants in LOX with Keratoconus: A Meta-Analysis. PLOS One 2015. https://pubmed.ncbi.nlm.nih.gov/26713757/
Huashui Ai, Xiaodong Fang, Bin Yang, Zhiyong Huang, Hao Chen, Likai Mao, Feng Zhang, Lu Zhang, et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nature Genetics 2015. https://www.nature.com/articles/ng.3199
Lu Zhang, Jing Zhang, Jing Yang, Dingge Ying, Yu lung Lau, Wanling Yang. PriVar: a flexible toolkit for prioritizing SNV and indel from next generation sequencing data. Bioinformatics 2013. https://academic.oup.com/bioinformatics/article/29/1/124/272319
Lu Zhang+, Wanling Yang+*, Dingge Ying, Stacey S. Cherny, Friedhelm Hildebrandt, Pak Chung Sham, Yu lung Lau. Homozygosity mapping on a single patient: identification of homozygous regions of recent common ancestry by using population data. Human Mutation 2011. https://onlinelibrary.wiley.com/doi/10.1002/humu.21432