• Pruna AI releases an open-source framework for AI model optimization combining multiple compression techniques.
  • Framework supports all model types but currently focuses on image and video generation applications.
  • An enterprise version adds advanced features, including upcoming automated optimization, and is priced by the hour.

A European AI startup is open-sourcing its model optimization framework, a move that could help businesses cut the cost of deploying AI models.

Pruna AI announced on Thursday that it is open-sourcing its compression framework, which applies multiple efficiency methods to AI models.

The framework standardizes a range of optimization techniques, including caching, pruning, quantization, and distillation, so that developers can compress models while weighing performance gains against potential quality loss.
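To make that trade-off concrete, here is a minimal sketch of one of the listed techniques, symmetric per-tensor int8 quantization, in plain Python. It is not Pruna AI's implementation or API; `quantize_int8`, `dequantize`, and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One float scale is shared by the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Quality cost: rounding error per weight is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Size gain: 8 bits per weight instead of 32-bit floats (ignoring the one shared scale).
compression_ratio = 32 / 8
```

The smaller representation is the "performance gain"; the rounding error, bounded by `scale / 2`, is the "potential quality loss". Combining methods, as the framework does, stacks several such trade-offs.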

"We standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it," said Pruna AI co-founder and CTO John Rachwan.

Large AI labs such as OpenAI have been using various compression methods internally to create faster versions of flagship models, such as OpenAI's GPT-4 Turbo and Black Forest Labs' Flux.1-schnell. Pruna AI aims to make these capabilities available to the broader developer community.

"For big companies, what they usually do is that they build this stuff in-house. And what you can find in the open source world is usually based on single methods," Rachwan explained.
"But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now."

The framework supports all types of AI models, though the company is currently focusing on image and video generation applications. Early adopters include AI companies Scenario and PhotoRoom.

Beyond the open-source edition, Pruna AI offers an enterprise version with advanced features, including an upcoming "compression agent" that automatically determines optimal compression settings based on user requirements.

"You give it your model, you say: 'I want more speed but don't drop my accuracy by more than 2%.' And then, the agent will just do its magic," Rachwan said.
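Rachwan's description amounts to a constrained search: maximize speed subject to an accuracy floor. The sketch below shows only the final selection step, assuming each candidate combination of methods has already been benchmarked. The `Candidate` class, the `pick_config` function, the method labels, and all numbers are hypothetical and do not describe Pruna AI's agent or API. It also assumes the "2%" means a drop relative to baseline accuracy; the quote does not say whether it is relative or in percentage points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str        # hypothetical label for a combination of compression methods
    speedup: float   # measured inference speedup versus the uncompressed model
    accuracy: float  # measured accuracy on an evaluation set, in [0, 1]


def pick_config(
    baseline_accuracy: float,
    candidates: list[Candidate],
    max_relative_drop: float = 0.02,
) -> Optional[Candidate]:
    """Return the fastest candidate within the allowed accuracy drop, or None if none qualifies."""
    floor = baseline_accuracy * (1 - max_relative_drop)
    acceptable = [c for c in candidates if c.accuracy >= floor]
    return max(acceptable, key=lambda c: c.speedup, default=None)


# Made-up benchmark results, for illustration only.
candidates = [
    Candidate("quantize", 1.8, 0.905),
    Candidate("quantize+cache", 2.6, 0.893),
    Candidate("prune+quantize+cache", 3.4, 0.861),
]
best = pick_config(0.910, candidates)
```

With a baseline accuracy of 0.910, the floor is about 0.892, so the fastest three-method combination is rejected and `quantize+cache` is chosen. A real agent would also have to decide which combinations to try, rather than scoring a fixed list.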

The enterprise offering uses an hourly pricing model similar to cloud GPU services. The company positions its solution as an investment that quickly pays for itself through reduced inference costs.

In one example, Pruna AI made a Llama model eight times smaller without significant quality degradation.

The release comes just months after Pruna AI raised a $6.5 million seed funding round from investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.


Edited By Annette George