Efficient Systems for Foundation Models
Workshop at the International Conference on Machine Learning (ICML) 2024.
Code it, run it, crash it–restart it.
➡️ ES-FoMO is back for ICML 2024! Find us in room Lehar 2, check out the schedule below, and see the accepted papers on OpenReview.
🔥 the gist
- what? A workshop to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference.
- when & where?
- Join us at ICML 2024, either virtually or in person in Vienna.
- questions? Contact us at esfomo.workshop@gmail.com.
- looking for the 2023 edition?
- Check out the recorded talks and panels on the official ICML platform;
- Check out the accepted papers on OpenReview.
📆 the plan
All times CEST (UTC+2). Full schedule to be confirmed.
Time | Topic | Speaker |
---|---|---|
🎛️ Session I: Quantization, Pruning, and Sparsity | ||
9:00am | Talk: Efficient Quantization Methods and Marlin, a Fast 4-Bit Inference Kernel | Elias Frantar (IST Austria) |
9:30am | Oral: Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | Harry Dong (CMU) |
9:45am | Oral: Lottery Ticket Adaptation: Mitigating Destructive Inference in LLMs | Ashwinee Panda (Princeton) |
10:00am | Coffee break | |
🦾 Session II: Emerging Architectures | ||
10:15am | Oral: Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff | Simran Arora (Stanford) |
10:30am | Oral: xLSTM: Extended Long Short-Term Memory | Maximilian Beck (JKU) |
10:45am | Talk: On the Tradeoffs of State-Space Models | Albert Gu (CMU) |
11:15am | Talk: Scaling Mixture-of-Experts: Lessons from DBRX | Vitaliy Chiley (Databricks) |
11:45am | Oral: Characterising Prompt Compression Methods for Long Context Inference | Siddharth Jha (UCB) |
noon | Lunch break | |
1:00pm | 🧑🎓 Poster Session | |
2:15pm | 🏅 Best Paper and Best Poster Awards | |
🔥 Session III: Hardware | ||
2:30pm | Talk: Scaling Intelligence | Azalia Mirhoseini (Stanford / Google DeepMind) |
3:00pm | Off-the-record: Frontier Clusters for Frontier Models: Scaling to 100,000 GPUs and Beyond | Dylan Patel (SemiAnalysis) |
3:30pm | Coffee break | |
3:45pm | 💬 Panel: Data and Architecture Trends Across Industry and Open Communities | Deepak Narayanan (NVIDIA), Dylan Patel (SemiAnalysis), Dirk Groeneveld (AI2), Hailey Schoelkopf (EleutherAI) |
💾 Session IV: Data | ||
4:30pm | Talk: Open Tooling for Large Data Pipelines | Vaishaal Shankar (Apple) |
6:00pm | 🎉 Post-workshop happy hour | |
🦾 the pitch
As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.
Machine learning practitioners are key stakeholders here: on the one hand, researchers may contribute algorithmic insights and novel methods to improving training and inference of large models; on the other hand, novel research findings may be best demonstrated at scale—which may require training models as efficiently as possible to make the best use of available resources.
The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. We welcome submissions around training and inference systems/algorithms for foundation models, focusing on scaling-up or on reducing compute, time, memory, bandwidth, and energy requirements. Notably, we encourage submissions concerning the entire spectrum of foundation models: from BERT-sized Transformers, to large models with 100B+ parameters. Topics include but are not limited to (see our 📝 call for papers for details):
- Training and inference systems, either distributed at large scale or in resource-constrained scenarios;
- Algorithms for improved training and inference efficiency;
- Systems for foundation models, such as novel programming languages or compilers.
This is the second installment of ES-FoMo; this year's sessions and talks bring further focus to three trends observed in 2023:
- The emergence of novel architectures, popularized by Mamba (state-space models) and Mixtral (mixture-of-experts);
- Efficient open implementations, such as gpt-fast and vLLM;
- Open questions on novel hardware and data tooling.