JULAIN Talk by Thijs Vogels

Start
9 June 2022, 2:00 pm
End
9 June 2022, 3:00 pm
Venue
INM seminar room, building 15.9U, room 4001b

Communication-efficient distributed learning and PowerSGD

Thijs Vogels
Machine Learning & Optimization Laboratory, École polytechnique fédérale de Lausanne (EPFL)

  • When: 9 June 2022, 2 pm
  • Where: INM seminar room, building 15.9U, room 4001b

Invitation and moderation: Hanno Scharr, IAS-8


Abstract

In data-parallel optimization of machine learning models, workers collaborate to speed up training. Because the workers average their model updates with each other, each update becomes more informative, which leads to faster convergence. For today’s deep learning models, these updates can be gigabytes in size, and averaging them across all workers can become a bottleneck for the scalability of distributed learning. In this talk, we explore two approaches to alleviating this communication bottleneck: lossy communication compression and sparse (decentralized) communication. We focus on the PowerSGD compression algorithm, which approximates gradient updates by low-rank matrices. PowerSGD can reduce communication by more than 100x and has been used successfully to speed up the training of OpenAI’s DALL-E, RoBERTa, and Meta’s XLM-R.
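To make the low-rank idea concrete, below is a minimal NumPy sketch of one PowerSGD-style compression round, not the actual PyTorch implementation used in the paper: the function names, the simulated list of per-worker gradients, and the use of np.mean in place of an all-reduce are illustrative assumptions, and the error-feedback mechanism of the real algorithm is omitted. Each worker projects its gradient matrix with a shared right factor q, the resulting left factors are averaged and orthonormalized (one power-iteration step), and a second averaged product yields a rank-r approximation of the mean gradient. Reusing q across rounds ("warm start") is what makes a single power-iteration step sufficient in practice.

    import numpy as np

    rng = np.random.default_rng(0)

    def orthonormalize(m):
        """Orthonormalize the columns of m via a (reduced) QR decomposition."""
        q, _ = np.linalg.qr(m)
        return q

    def powersgd_round(worker_grads, q, rank):
        """One round of rank-`rank` PowerSGD-style compression.

        worker_grads: list of per-worker gradient matrices of shape (m, n).
        q: reused right factor of shape (n, rank), warm-started across rounds.
        Returns (low-rank approximation of the averaged gradient, updated q).
        """
        # Each worker multiplies its gradient with the shared q; the mean over
        # workers stands in for the all-reduce used in real distributed training.
        p = np.mean([g @ q for g in worker_grads], axis=0)   # shape (m, rank)
        p = orthonormalize(p)                                 # single power-iteration step
        q = np.mean([g.T @ p for g in worker_grads], axis=0)  # shape (n, rank), second all-reduce
        return p @ q.T, q                                     # rank-`rank` approximation

    # Toy usage: 4 workers, a 256x128 gradient matrix each, rank-4 compression.
    workers, m, n, rank = 4, 256, 128, 4
    grads = [rng.standard_normal((m, n)) for _ in range(workers)]
    q = rng.standard_normal((n, rank))
    approx, q = powersgd_round(grads, q, rank)
    exact = np.mean(grads, axis=0)
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))

Instead of exchanging the full m-by-n gradient, each worker only communicates the two factors of shapes (m, rank) and (n, rank), which is where the large communication savings come from when rank is small.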


Short CV

Thijs is a PhD student at EPFL’s Machine Learning & Optimization Laboratory, supervised by Martin Jaggi. He works on developing and understanding practical optimization algorithms for large-scale distributed training of deep learning models.


Readings:

  • RelaySum for Decentralized Deep Learning on Heterogeneous Data, NeurIPS 2021
  • Practical Low-Rank Communication Compression in Decentralized Deep Learning, NeurIPS 2020
  • PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, NeurIPS 2019