Affordable AI Infrastructure
BreezeML offers an automated, affordable, and reliable solution for deep learning jobs by enabling safe, SLA-guaranteed use of preemptible instances in a GPU cloud.

Seamless Drop-in Library

dist = breezeml.init_dist(config=config)
engine = breezeml.init_engine(model, dist, train=True)
for step, batch in enumerate(dataloader):
    engine.run_batch(batch)

Our Products

Public Cloud: For companies that use public clouds to run AI jobs, BreezeML provides a user-friendly virtual cloud interface to which users can submit their jobs and their SLAs. Based on the specified SLA, our cloud automatically selects a combination of on-demand and spot GPU instances from a public cloud to run the job with SLA guarantees.
On-Premise Cluster: For enterprises or government organizations that have on-premise clusters, running our preemption-resilient system in their cluster lets them share GPU resources between high-priority inference jobs and low-priority training jobs. Low-priority jobs run continuously and are preempted when high-priority jobs arrive, without losing any data.
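
To make the public-cloud idea concrete, here is a minimal sketch of choosing a mix of on-demand and spot instances that meets a deadline at the lowest cost. It is an illustration under stated assumptions, not BreezeML's actual selection algorithm: the `InstanceType` class, the `cheapest_mix` function, the prices, and the `effective_rate` values are all hypothetical, and preemption is modeled as a fixed slowdown rather than a random event.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical description of one GPU instance pricing option."""
    name: str
    hourly_price: float    # dollars per instance-hour (made-up numbers)
    effective_rate: float  # share of wall-clock time spent on useful work;
                           # 1.0 means the instance is never preempted


def cheapest_mix(
    work_hours: float,
    deadline_hours: float,
    n_total: int,
    on_demand: InstanceType,
    spot: InstanceType,
) -> Optional[Tuple[int, int, float]]:
    """Return (n_on_demand, n_spot, cost) for the cheapest mix of n_total
    instances that finishes work_hours of single-instance work before the
    deadline, or None if no mix is fast enough."""
    best = None
    for n_spot in range(n_total + 1):
        n_od = n_total - n_spot
        # Useful instance-hours of progress made per wall-clock hour.
        throughput = n_od * on_demand.effective_rate + n_spot * spot.effective_rate
        if throughput <= 0:
            continue
        finish = work_hours / throughput
        if finish > deadline_hours:
            continue  # this mix would miss the SLA deadline
        cost = finish * (n_od * on_demand.hourly_price + n_spot * spot.hourly_price)
        if best is None or cost < best[2]:
            best = (n_od, n_spot, cost)
    return best


# Assumed example numbers: spot is cheaper but loses 30% of its time to preemption.
on_demand = InstanceType("on-demand", hourly_price=3.0, effective_rate=1.0)
spot = InstanceType("spot", hourly_price=1.0, effective_rate=0.7)

plan = cheapest_mix(work_hours=80, deadline_hours=12, n_total=8,
                    on_demand=on_demand, spot=spot)
print(plan[:2])  # (4, 4): four on-demand and four spot instances
```

With these assumed numbers, an all-spot cluster would be cheapest per unit of work but would miss the 12-hour deadline, so the search settles on the largest spot share that still finishes in time. A production system would also need to account for the randomness of preemptions, which this sketch ignores.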

Key Value Points

  1. Significantly reduced costs, or larger workloads (larger models or more users) under a fixed budget
  2. SLAs fully preserved
  3. Customizable to both public clouds and on-premise clusters

Figure: Normalized performance-per-dollar (red line: non-preemptible baseline)


BreezeML, recently launched by a team of researchers from UCLA and Princeton, is an ML-infrastructure company that aims to significantly reduce the costs and the development and management challenges of ML (training and inference) jobs by virtualizing heterogeneous cloud resources (e.g., different computation/hardware families on AWS) and leveraging their diverse pricing models.

At the core of BreezeML’s virtual cloud service are two innovations backed by years of research: (1) a common runtime that simultaneously powers multiple ML platforms/frameworks (e.g., TensorFlow, PyTorch, and Ray) to enable seamless integration of different ML stacks; and (2) a platform-independent scheduling and orchestration system that makes intelligent use of economically favorable hardware (e.g., lambdas, spot instances) without (a) requiring any code changes to existing jobs or (b) affecting the SLA of a given ML job.

Our Team

Harry Xu


Professor of Computer Science at UCLA

Ravi Netravali

Chief Scientist

Professor of Computer Science at Princeton University

John Thorpe

Director of Product

Graduating Ph.D. Student at UCLA

Pengzhan Zhao

Director of Engineering

Ph.D. Student of Computer Science at UCLA

Contact Us

Tell us whether you would use BreezeML, and a bit about your use case:

Copyright © BreezeML. All rights reserved. 2022.