Designing modern cloud systems is challenging because developers must simultaneously balance performance, scalability, modularity, reliability, and operational complexity. Cloud applications are typically composed of many independently developed services that interact through intricate communication and resource-sharing patterns. While modularity improves maintainability and developer productivity, it also introduces coordination overheads, unpredictable cross-service interactions, and complex failure modes that are difficult to reason about. Optimizing these systems is equally difficult because workloads, hardware environments, and deployment conditions evolve continuously, causing performance bottlenecks and resource inefficiencies to shift over time. As a result, developers must expend significant manual effort to design efficient architectures, implement optimized components, tune runtime behavior, and ensure that the overall system remains reliable and performant under dynamic operating conditions.

As part of this theme, we build tools, techniques, and abstractions to reduce the developer effort required for designing, implementing, optimizing, and managing modular cloud systems.

Publications

Iridescent: A Framework Enabling Online System Implementation Specialization

Vaastav Anand, Deepak Garg, Antoine Kaufmann
Under submission, 2025.
[Preprint]

Generating representative macrobenchmark microservice systems from distributed traces with Palette

Vaastav Anand, Matheus Stolet, Jonathan Mace, Antoine Kaufmann
In 16th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2025), 2025.
[Paper PDF]

Towards Online Code Specialization of Systems

Vaastav Anand, Deepak Garg, Antoine Kaufmann
In arXiv, 2025.
[Preprint]

Towards Using LLMs for Distributed Trace Comparison (Abstract)

Vaastav Anand, Pedro Las-Casas, Rodrigo Fonseca, Antoine Kaufmann
In 6th International Workshop on Cloud Intelligence / AIOps (AIOps '25), 2025.
[Paper PDF]

Online Specialization of Systems with Iridescent

Vaastav Anand
In ACM Student Research Competition @ SOSP 2024, 2024.
First Place in Graduate Category
[Poster]

Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications

Vaastav Anand, Deepak Garg, Antoine Kaufmann, Jonathan Mace
In 29th ACM Symposium on Operating Systems Principles (SOSP), Koblenz, Germany, 2023.
[Paper PDF] [Video] [Slides] [Artifact]

The Odd One Out: Energy is not like Other Metrics

Vaastav Anand, Zhiqiang Xie, Matheus Stolet, Roberta De Viti, Thomas Davidson, Reyhaneh Karimipour, Safya Alzayat, Jonathan Mace
In 1st Workshop on Sustainable Computer Systems Design and Implementation (HotCarbon), La Jolla, USA, 2022.
[Paper PDF] [Video]

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Arpan Gujarati, Reza Karimi, Safya Alzayat, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace
In 14th Symposium on Operating Systems Design and Implementation (OSDI), Banff, Canada, 2020.
Distinguished Artifact Award
[Paper PDF] [Video] [Slides] [Artifact]

No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy

Amit Samanta, Suhas Shrinivasan, Antoine Kaufmann, Jonathan Mace
In arXiv, 2019.
[Preprint]

Floem: Language, Compiler, and Runtime for Network Applications on Heterogeneous Systems

Phitchaya Mangpo Phothilimthana, Ming Liu, Antoine Kaufmann, Simon Peter, Rastislav Bodik, Thomas Anderson
In 13th Symposium on Operating Systems Design and Implementation (OSDI), Carlsbad, CA, USA, 2018.
[Paper PDF] [Audio] [Slides] [Artifact]