Designing modern cloud systems is challenging because developers must simultaneously balance performance, scalability, modularity, reliability, and operational complexity. Cloud applications are typically composed of many independently developed services that interact through intricate communication and resource-sharing patterns. While modularity improves maintainability and developer productivity, it also introduces coordination overheads, unpredictable cross-service interactions, and complex failure modes that are hard to reason about. Optimizing these systems is equally difficult because workloads, hardware environments, and deployment conditions evolve continuously, causing performance bottlenecks and resource inefficiencies to shift over time. As a result, developers must expend significant manual effort to design efficient architectures, implement optimized components, tune runtime behavior, and ensure that the overall system remains reliable and performant under dynamic operating conditions.
As part of this theme, we build tools, techniques, and abstractions to reduce the developer effort required for designing, implementing, optimizing, and managing modular cloud systems.
Iridescent: A Framework Enabling Online System Implementation Specialization
Under submission, 2025.
[Preprint]
Generating representative macrobenchmark microservice systems from distributed traces with Palette
In 16th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2025), 2025.
[Paper PDF]
Towards Online Code Specialization of Systems
In arXiv, 2025.
[Preprint]
Towards Using LLMs for Distributed Trace Comparison (Abstract)
In 6th International Workshop on Cloud Intelligence / AIOps (AIOps '25), 2025.
[Paper PDF]
Online Specialization of Systems with Iridescent
In ACM Student Research Competition @ SOSP 2024, 2024.
First Place in Graduate Category
[Poster]
Blueprint: A Toolchain for Highly-Reconfigurable Microservice Applications
In 29th ACM Symposium on Operating Systems Principles (SOSP), Koblenz, Germany, 2023.
[Paper PDF] [Video] [Slides] [Artifact]
The Odd One Out: Energy is not like Other Metrics
In 1st Workshop on Sustainable Computer Systems Design and Implementation (HotCarbon), La Jolla, USA, 2022.
[Paper PDF] [Video]
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
In 14th Symposium on Operating Systems Design and Implementation (OSDI), Banff, Canada, 2020.
Distinguished Artifact Award
[Paper PDF] [Video] [Slides] [Artifact]
No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy
In arXiv, 2019.
[Preprint]
Floem: Language, Compiler, and Runtime for Network Applications on Heterogeneous Systems
In 13th Symposium on Operating Systems Design and Implementation (OSDI), Carlsbad, CA, USA, 2018.
[Paper PDF] [Audio] [Slides] [Artifact]