HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE
Department of Computer Science Colloquium
Interpretable and Federated Tensor Factorization for Computational Phenotyping
Dr. Yejin Kim
School of Biomedical Informatics
University of Texas Health Science Center (UTHealth)
Date: June 20, 2018 (Wednesday)
Time: 4:30 - 5:30 pm
Venue: SCT909, Cha Chi Ming Science Tower, Ho Sin Hang Campus
Machine learning has been proposed as a way to extract meaningful phenotypes automatically from electronic health records (EHRs) without human supervision, a process called computational phenotyping. Phenotyping by nonnegative tensor factorization (NTF) is becoming particularly popular because of its ability to capture high-dimensional EHR data.
One important characteristic of phenotypes is that they be distinct from each other, because otherwise clinicians cannot interpret and use them easily. Another critical concern is that increasing interpretability comes at the expense of discriminative power with respect to a given outcome. Thus, we developed a supervised NTF that derives discriminative and distinct phenotypes. We represented the co-occurrence of diagnoses and prescriptions in EHRs as a third-order tensor and decomposed it using CP decomposition. We evaluated the discriminative power of our models on an Intensive Care Unit database (MIMIC-III) and demonstrated performance superior to that of state-of-the-art ICU mortality calculators (e.g., APACHE II, SAPS II). Examples of the resulting phenotypes include sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort care), intra-abdominal conditions, and alcohol abuse/withdrawal.
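As a rough illustration of the decomposition step only, the following is a minimal sketch of an unsupervised nonnegative CP factorization of a third-order tensor using multiplicative updates. It is not the speaker's supervised model: the terms that encourage discriminative and distinct phenotypes are omitted, the multiplicative-update rule is a swapped-in textbook technique, and the tensor layout (patient x diagnosis x prescription) and all function names are assumptions made for this example.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row (j, k) is stored at index j*K + k."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Mode-n unfolding, remaining axes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def nonnegative_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: rank-`rank` nonnegative CP of a 3-way tensor X.

    Returns one nonnegative factor matrix per mode. Under the assumed
    layout, column r of the three matrices together describes one
    candidate phenotype (patient loadings, diagnoses, prescriptions).
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Multiplicative update keeps every entry nonnegative.
            factors[mode] *= numer / denom
    return factors

def reconstruct(factors):
    """Rebuild the tensor from its three factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)
```

Given a nonnegative count tensor `X`, `nonnegative_cp(X, rank=R)` returns three factor matrices, and `reconstruct` shows how well the `R` components approximate the original co-occurrence counts.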
Meanwhile, these models need a large amount of diverse EHR data to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple institutions when direct patient-level data sharing is not possible (e.g., due to institutional policies). Thus, we developed a novel solution that enables federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on the alternating direction method of multipliers (ADMM). Using this method, multiple institutions iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method achieves results comparable to those of the centralized model (trained on the combined datasets) in terms of accuracy and phenotype discovery while respecting privacy.
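The communication pattern described above can be sketched with consensus ADMM. To keep the example short, the sketch below replaces the tensor factorization with a plain least-squares model, and it includes no encryption or secure aggregation; the class and function names and the choice of `rho` are assumptions made for illustration, not the speaker's implementation. Each site keeps its data private and sends only a summary vector to the server, which averages the summaries into a shared estimate.

```python
import numpy as np

class Site:
    """One institution; A and b stand in for private patient-level data."""

    def __init__(self, A, b, rho):
        self.rho = rho
        d = A.shape[1]
        self.w = np.zeros(d)   # local model
        self.u = np.zeros(d)   # scaled dual variable
        # Precompute the pieces of the local least-squares subproblem.
        self._lhs = A.T @ A + rho * np.eye(d)
        self._Atb = A.T @ b

    def local_update(self, z):
        """Solve the local subproblem; return only the summary w + u."""
        rhs = self._Atb + self.rho * (z - self.u)
        self.w = np.linalg.solve(self._lhs, rhs)
        return self.w + self.u

    def dual_update(self, z):
        """Move the dual variable toward agreement with the server."""
        self.u += self.w - z

def federated_consensus(sites, dim, n_iter=200):
    """Server loop: average the site summaries into the consensus z."""
    z = np.zeros(dim)
    for _ in range(n_iter):
        summaries = [site.local_update(z) for site in sites]
        z = np.mean(summaries, axis=0)
        for site in sites:
            site.dual_update(z)
    return z
```

In this simplified setting the consensus estimate converges to the solution that a single model trained on the pooled data would give, which mirrors the comparison with centralized training mentioned in the abstract.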
Yejin Kim is an assistant professor in the School of Biomedical Informatics at the University of Texas Health Science Center at Houston (UTHealth). Her research interests are in data mining and machine learning for healthcare problems (particularly computational phenotyping with electronic health records), privacy-preserving data mining, and recommender systems.
She received her Ph.D. in computer science from Pohang University of Science and Technology (Big Data and Artificial Intelligence Lab) and her B.S. in industrial engineering from the same university. She was a visiting scholar at the University of California, San Diego, advised by Dr. Xiaoqian Jiang.
********* ALL INTERESTED ARE WELCOME ***********
(For enquiries, please contact the Computer Science Department at 3411 2385)
Department of Computer Science, Hong Kong Baptist University