I am an Associate Professor in the Computer Science Department at the University of Wisconsin-Madison. My research interests are in designing systems and algorithms for large-scale data analysis and machine learning. Before coming to Madison, I was a post-doctoral researcher in the Systems Research Group at Microsoft Research in Redmond. Previously, I completed my PhD at UC Berkeley, where I was advised by Ion Stoica and Mike Franklin. I also have a Master's from the University of Illinois at Urbana-Champaign, where I worked in the Systems Research Group with Prof. Roy Campbell.
Group
- Rutwik Jain (co-advised with Matt Sinclair)
- Brandon Tran (co-advised with Matt Sinclair)
- Minghao Yan
- Johannes Freischuetz
- Tzu-Tao Chang
- Fanchao Chen
- Tareq Mahmood
- Seth Ockerman
Alumni
PhD
- Song Bian → NVIDIA Research Labs
- Konstantinos Kanellis → AWS Learned Systems Group
- Jason Mohoney → Post-doc at MIT
- Saurabh Agarwal → Post-doc at UT-Austin
Post-doctoral Researchers
- Pengfei Zheng (co-advised with Aditya Akella) → Huawei Technologies
MS
- Devesh Sarda → Databricks
- Aditi Singh → Nutanix
- Mohil Patel → Oracle
- Rachit Tibrewal
- Olesia Elfimova → Dropbox
- Adarsh Kumar → Amazon Alexa AI
- Arjun Balasubramanian → Amazon AWS
BS
- Wei Hao → Columbia
- Yiheng Xu → Maryland
- Yuhan Liu → UChicago
- Ziyi Zhang → UChicago
- Rui Pan → Princeton
- Lynn Liu → UC Berkeley
- Prasoon Sinha → UT Austin
- Anze Xie → UCSD
- Anders Carlsson → Amazon
- Keting Chen → Cornell
- Anyong Mao → USC
Current Research Areas
- Improving LLM Inference: Reducing the cost of running large language models through inference-efficient model architectures, memory management, and speculative decoding.
- GPU Variability & Power Management: Understanding how variability across GPUs affects cluster performance, with work on variability-aware scheduling and high-fidelity GPU energy modeling.
- Vector Search: New indexing and query methods to make vector similarity search faster and more scalable, including adaptive indexes and vector database deployment on HPC platforms.
- Integrating ML into Systems: Using machine learning to improve core system components, including memory tiering and tuning unstable and noisy cloud applications.
Teaching
CS 537 Intro to OS: F24 S23 S20 S19
CS 744 Big Data Systems: S25 S24 F22 F21 F20 F19 F18
CS 839 Advanced Machine Learning Systems: S22
Selected Recent Publications
Brandon Tran, Matt Sinclair, Shivaram Venkataraman, Matthias Maiterth, Woong Shin. Wattchmen: Watching the Wattchers – High Fidelity, Flexible GPU Energy Modeling - ICS 2026 (to appear)
Konstantinos Kanellis, Sujay Yadalam, Hayden Coffey, Shivaram Venkataraman, Michael Swift. From Good to Great: Parameter Tuning in Memory Tiering Systems - IEEE Transactions on Computers 2026
Song Bian, Tao Yu, Shivaram Venkataraman, Youngsuk Park. Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs - ICLR 2026
Saurabh Agarwal, Bodun Hu, Anyong Mao, Aditya Akella, Shivaram Venkataraman. Improving Memory Management for LLM Inference Workloads - NSDI 2026
Jason Mohoney, Devesh Sarda, Mengze Tang, Shihabur Rahman Chowdhury, Anil Pacaci, Ihab F. Ilyas, Theodoros Rekatsinas, Shivaram Venkataraman. Quake: Adaptive Indexing for Vector Search - OSDI 2025
Song Bian, Minghao Yan, Shivaram Venkataraman. Scaling Inference-Efficient Language Models - ICML 2025
Tzu-Tao Chang, Shivaram Venkataraman. LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models - ICML 2025
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman. Decoding Speculative Decoding - NAACL 2025
Konstantinos Kanellis, Badrish Chandramouli, Ted Hart, Shivaram Venkataraman. From FASTER to F2: Evolving Concurrent Key-Value Store Designs for Large Skewed Workloads - VLDB 2025
Tzu-Tao Chang, Shivaram Venkataraman. Eva: Cost-Efficient Cloud-Based Cluster Scheduling - EuroSys 2025
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman. TUNA: Tuning Unstable and Noisy Cloud Applications - EuroSys 2025
Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Rob Ross, Shivaram Venkataraman. PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training - Supercomputing 2025
Please see Google Scholar for a complete list.