Case Study

A San Francisco-based AI infrastructure company needed robust MLOps on AWS to unblock the scaling of its synthetic data platform.

“Within a month of our migration from Kubeflow to Apolo on AWS, we tripled the number of ML experiments we could run.”

- Yashar Behzadi, CEO

The Opportunity.

According to market research firm Omdia¹, the AI computer vision market is expected to reach $33.5 billion by 2025.

To date, computer vision driven by deep learning has been expensive and hard to scale, because it relies heavily on supervised learning that requires human-in-the-loop labeling of key image attributes. Beyond the time and cost of manual labeling, using real-world data also raises significant ethical and privacy concerns.

Synthesis AI’s synthetic data technology effectively solves all of these issues. By combining CGI technologies with novel generative AI models, its simple API enables the programmatic generation of millions of images with pixel-perfect labels. In many cases, Synthesis AI’s synthetic data can even deliver higher-quality results than real images.

The Challenge.

With individual clients requiring hundreds of millions of synthetic images per month, Synthesis AI needed robust, scalable infrastructure from day one.

Like many AI-focused companies, Synthesis AI initially tasked its internal ML engineering team with building and maintaining its MLOps infrastructure, including coordinating and managing on-prem and cloud compute resources, data, models, pipelines, and workflows. For this purpose, the team chose Kubeflow, a popular open-source ML development platform, running on AWS, as the foundation for its ML development lifecycle.

After six months of building and rebuilding on Kubeflow, the team realized it was spending as much time on MLOps as on ML. Its synthetic data and AI pipelines required constant maintenance, every tool it wanted to use had to be integrated and updated manually, and managing compute resources across AWS and on-prem demanded constant attention. The team concluded that Kubeflow on its own was not a scalable MLOps solution.

Furthermore, Synthesis AI was constrained by its cloud provider, facing resource allocation and infrastructure maintenance issues that severely complicated its model training process.

Synthesis AI needed MLOps and cloud computing solutions that would let its ML engineers focus on the models.

The Solution.

First, Apolo migrated the company’s entire ML workflow from Kubeflow to the Apolo Platform, entirely within the company’s secure AWS environment. Second, Apolo engaged AWS to provide a long-term solution for Synthesis AI’s compute requirements, ensuring both scalability and availability. The work included:

  • Deployment of an Apolo cluster on the team’s existing infrastructure in the legacy environment

  • Migration of the ML team from Kubeflow to Apolo

  • Setup of infrastructure on AWS, including allocation of the required compute quotas

  • Deployment of the Apolo cluster on new AWS infrastructure

  • Migration of data and compute workloads from the legacy cloud to AWS

As a result, Apolo enabled Synthesis AI to scale its synthetic data platform. Synthesis AI’s ML productivity tripled in the first month alone, and the number of training jobs increased tenfold, while the company saved over $100,000 in computing costs.