Efficient Systems for Foundation Models

Workshop at the International Conference on Machine Learning (ICML) 2023.

Banner Code it, run it, crash it–restart it.

🔥 the gist

  • what? A workshop to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference.
  • when & where?
    • Submit your papers by 2023/05/31 (see our 📝 call for papers for details or go to 🤝 OpenReview);
    • Attend the workshop, 29th of July 2023, in Honolulu, Hawaii—or join us virtually.
  • questions? Contact us at esfomo.workshop@gmail.com.

Banner Best paper and poster prizes sponsored by Together–contact us if you are interested in sponsoring as well!

🦾 the pitch

As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.

Machine learning practitioners are key stakeholders here: on the one hand, researchers may contribute algorithmic insights and novel methods to improving training and inference of large models; on the other hand, novel research findings may be best demonstrated at scale—which may require training models as efficiently as possible to make the best use of available resources.

The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. We welcome submissions around training and inference systems/algorithms for foundation models, focusing on scaling-up or on reducing compute, time, memory, bandwidth, and energy requirements. Notably, we encourage submissions concerning the entire spectrum of foundation models: from BERT-sized Transformers, to large models with 100B+ parameters. Topics include but are not limited to (see our 📝 call for papers for details):

  • Training and inference systems, either distributed at large scale or in resource-constrained scenarios;
  • Algorithms for improved training and inference efficiency;
  • Systems for foundation models, such as novel programming languages or compilers.

📆 the plan

(timings are provided tentatively while we wait for the logistics to be confirmed, all times HST, UTC-10)

  Topic Speaker    
8:45am Opening remarks      
9:00am Session I: Training Strategies and Libraries      
  Scaling Up Models and Data with t5x and seqio Adam Roberts
(Google Research)
  Using Megatron to Train Large Language Models Deepak Narayanan
(Microsoft Research)
  Training Large Language Models on Cerebras Wafer-Scale Clusters Natalia Vassilieva
10:15am Contributed Talk 1      
10:30am Coffee break      
10:35am Contributed Talk 2      
11:00am Session II: Efficient Inference      
  The Case for 4-bit Inference Tim Dettmers
(University of Washington)
  Efficienly Scaling Transformer Inference Aakanksha Chowdhery
(Google Research)
noon Lunch break      
1pm Session III: Deep Optimization      
  PyTorch 2.x: Faster, More Pythonic, and as Dynamic as Ever Natalia Gimelshein
  High-Performance Kernel Programming with Triton Philippe Tillet
2pm Contributed Talk 3      
2:15pm Poster session      
3:15pm Coffee break      
3:30pm Panel: Large Language Models Tooling Across Industry and Academia Anna Goldie (Anthropic),
Rishi Bommasani (Stanford University),
Susan Zhang (Meta),
Emily Webber (AWS),
James Bradbury (Google)
4:30pm Session IV: Collaborative Approaches      
  Distributed Systems for Decentralized AI Ce Zhang
(ETH, Together)
  Open Tooling for Large Language Models Thomas Wang
5:30pm Awards      

🧑‍🏫 the speakers

💬 the panelists

😎 the organizers

😍 the sponsors