Large-Scale Deep Learning in Heterogeneous Distributed Systems (Xiaowen Chu et al.)

A key driving force behind the success of deep learning is the growing computing power of multi-core and many-core processors such as GPUs, FPGAs, and ASICs. As training data sets grow and deep neural networks become more complex, efficiently utilizing the limited, expensive, and shared computing and communication resources of a heterogeneous distributed system to support large-scale deep learning tasks from different users has become an important issue for cloud service providers. Our ultimate goal is to make deep learning tasks run as fast as possible by (1) exploiting the full potential of the hardware; (2) optimizing the related software components; and (3) designing smart resource allocation and task scheduling for multiple concurrent deep learning tasks.
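
As a concrete illustration of the workload we target, the sketch below shows one step of data-parallel synchronous SGD, in which every worker computes gradients on its own mini-batch and the gradients are averaged across all workers before each parameter update; the cost of this gradient all-reduce is exactly the communication that work such as MG-WFBP (publication 1 below) tries to overlap with computation. The sketch uses PyTorch's torch.distributed API purely for illustration; the function name, learning rate, and per-parameter all-reduce loop are simplifying assumptions for exposition, not the method of any specific paper below.

    # Minimal sketch of one data-parallel synchronous SGD step. Assumes
    # torch.distributed has already been initialized via init_process_group().
    import torch
    import torch.distributed as dist

    def sync_sgd_step(model, loss_fn, inputs, targets, lr=0.01):
        # Local forward and backward pass on this worker's mini-batch.
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        # Synchronous gradient aggregation: sum gradients over all workers,
        # then divide to obtain the global average. In an optimized system,
        # these all-reduce operations are merged and overlapped with
        # back-propagation instead of being issued one by one as shown here.
        world_size = dist.get_world_size()
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size

        # Plain SGD update using the averaged gradients.
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is not None:
                    p -= lr * p.grad
        return loss.item()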

Selected Publications:

  1. S. Shi, X.-W. Chu, and B. Li, “MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms,” IEEE INFOCOM 2019, Paris, France, May 2019.
  2. X. Jia, S. Song, S. Shi, W. He, Y. Wang, H. Rong, F. Zhou, L. Xie, Z. Guo, Y. Yang, L. Yu, T. Chen, G. Hu, and X.-W. Chu, “Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes,” Workshop on Systems for ML and Open Source Software, co-located with NeurIPS 2018, Montreal, Canada, Dec 2018.
  3. S. Shi, Q. Wang, and X.-W. Chu, “Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs,” IEEE DataCom 2018, Athens, Greece, August 2018. (Best Paper Award)
  4. S. Shi, Q. Wang, P. Xu, and X.-W. Chu, “Benchmarking State-of-the-Art Deep Learning Software Tools,” arXiv:1608.07249, https://arxiv.org/abs/1608.07249.
  5. Q. Wang and X.-W. Chu, “GPGPU Power Estimation with Core and Memory Frequency Scaling,” GreenMetrics 2017, in conjunction with ACM SIGMETRICS 2017, Champaign-Urbana, USA, June 2017. (Also appeared in ACM Performance Evaluation Review.)
  6. V. Chau, X.-W. Chu, H. Liu, and Y.-W. Leung, “Energy Efficient Job Scheduling with DVFS for CPU-GPU Heterogeneous Systems,” ACM e-Energy 2017, Hong Kong, May 2017.
  7. X. Mei, X.-W. Chu, Y.-W. Leung, H. Liu, and Z. Li, “Energy Efficient Real-time Task Scheduling on CPU-GPU Hybrid Clusters,” IEEE INFOCOM 2017, Atlanta, GA, USA, May 2017.
  8. X. Mei, Q. Wang, and X.-W. Chu, “A Survey and Measurement Study of GPU DVFS on Energy Conservation,” Digital Communications and Networks, Vol. 3, No. 2, Pages 89-100, May 2017.
  9. X. Mei and X.-W. Chu, “Dissecting GPU Memory Hierarchy through Microbenchmarking,” IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 1, Pages 72-86, Jan 2017. (An earlier short version was presented at IFIP NPC 2014.)
  10. X. Mei, L. Yung, K. Zhao, and X.-W. Chu, “A Measurement Study of GPU DVFS on Energy Conservation,” USENIX HotPower’13, co-located with the 24th ACM Symposium on Operating Systems Principles (SOSP), Pennsylvania, USA, November 2013.

For further information on this research topic, please contact Prof. Xiaowen Chu.