OctoML, which launched in 2019 with a focus on optimizing machine learning (ML) models, is shifting gears. The company is now unveiling its latest platform, OctoAI, a self-optimizing compute service for AI. This service allows businesses to build ML-based applications and put them into production without worrying about managing the underlying infrastructure.
The original OctoML platform centered on ML engineers, offering optimized and packaged models deployable across different hardware types. OctoAI represents a natural evolution of this platform, providing a fully managed compute service that simplifies the complexities of deploying ML models.
With OctoAI, users can specify priorities such as latency or cost, and the platform automatically selects the appropriate hardware and optimizes the models to improve performance and cost-effectiveness. In practice, that can mean choosing between running a model on Nvidia GPUs or on AWS's Inferentia-based instances. Users can still set their own parameters and pick their hardware, but the expectation is that most will prefer to let OctoAI manage this for them.
The platform also offers accelerated versions of popular foundation models, including Dolly 2, Whisper, FILM, FLAN-UL2, and Stable Diffusion, with more to come. Notably, OctoML says its version of Stable Diffusion runs 3x faster and at 5x lower cost than the original model.
While OctoML will continue to support existing customers who only want model optimization, the company's focus going forward will be on the OctoAI compute platform.