JULAIN Talk by Thijs Vogels

Start
9 June 2022, 2:00 pm
End
9 June 2022, 3:00 pm
Venue
INM seminar room, building 15.9U, room 4001b

Communication-efficient distributed learning and PowerSGD

Thijs Vogels
Machine Learning & Optimization Laboratory, École polytechnique fédérale de Lausanne (EPFL)

  • When: 9 June 2022, 2 pm
  • Where: INM seminar room, building 15.9U, room 4001b

Invitation and moderation: Hanno Scharr, IAS-8


Abstract

In data-parallel optimization of machine learning models, workers collaborate to speed up training. Because the workers average their model updates with each other, each update becomes more informative, which leads to faster convergence. For today’s deep learning models, these updates can be gigabytes in size, and averaging them across all workers can become a bottleneck for the scalability of distributed learning. In this talk, we explore two approaches to alleviating this communication bottleneck: lossy communication compression and sparse (decentralized) communication. We focus on the PowerSGD compression algorithm, which approximates gradient updates by low-rank matrices. PowerSGD can reduce communication by more than 100x and has been used successfully to speed up the training of OpenAI’s DALL-E, RoBERTa, and Meta’s XLM-R.
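To make the low-rank idea concrete, below is a minimal NumPy sketch of one PowerSGD-style compression round, not the actual PyTorch implementation used in the paper: the function names, the simulated list of per-worker gradients, and the use of np.mean in place of an all-reduce are illustrative assumptions, and the error-feedback mechanism of the real algorithm is omitted. Each worker projects its gradient matrix with a shared right factor q, the resulting left factors are averaged and orthonormalized (one power-iteration step), and a second averaged product yields a rank-r approximation of the mean gradient. Reusing q across rounds ("warm start") is what makes a single power-iteration step sufficient in practice.

    import numpy as np

    rng = np.random.default_rng(0)

    def orthonormalize(m):
        """Orthonormalize the columns of m via a (reduced) QR decomposition."""
        q, _ = np.linalg.qr(m)
        return q

    def powersgd_round(worker_grads, q, rank):
        """One round of rank-`rank` PowerSGD-style compression.

        worker_grads: list of per-worker gradient matrices of shape (m, n).
        q: reused right factor of shape (n, rank), warm-started across rounds.
        Returns (low-rank approximation of the averaged gradient, updated q).
        """
        # Each worker multiplies its gradient with the shared q; the mean over
        # workers stands in for the all-reduce used in real distributed training.
        p = np.mean([g @ q for g in worker_grads], axis=0)   # shape (m, rank)
        p = orthonormalize(p)                                 # single power-iteration step
        q = np.mean([g.T @ p for g in worker_grads], axis=0)  # shape (n, rank), second all-reduce
        return p @ q.T, q                                     # rank-`rank` approximation

    # Toy usage: 4 workers, a 256x128 gradient matrix each, rank-4 compression.
    workers, m, n, rank = 4, 256, 128, 4
    grads = [rng.standard_normal((m, n)) for _ in range(workers)]
    q = rng.standard_normal((n, rank))
    approx, q = powersgd_round(grads, q, rank)
    exact = np.mean(grads, axis=0)
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))

Instead of exchanging the full m-by-n gradient, each worker only communicates the two factors of shapes (m, rank) and (n, rank), which is where the large communication savings come from when rank is small.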


Short CV

Thijs is a PhD student at EPFL’s Machine Learning & Optimization Laboratory, supervised by Martin Jaggi. He works on developing and understanding practical optimization algorithms for large-scale distributed training of deep learning models.


Readings:

  • RelaySum for Decentralized Deep Learning on Heterogeneous Data, NeurIPS 2021
  • Practical Low-Rank Communication Compression in Decentralized Deep Learning, NeurIPS 2020
  • PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, NeurIPS 2019