Source: Arm Author: Arm
Imagine you're editing a video on your smartphone and need to add the right sound effects. Or maybe you want to generate custom sounds for ringtones, alarms, or social media posts. Instead of searching online or buying audio clips, you type in a description, such as "soft ocean waves at sunset," and seconds later your device generates the right sound for you, without even connecting to the Internet. Thanks to a new collaboration between Arm and Stability AI, this fully on-device audio generation technology has become a reality.
Arm and Stability AI work together to speed up text-to-audio response
Stability AI is a company focused on developing artificial intelligence (AI) models for image, video, 3D, and audio. Arm KleidiAI provides optimized, performance-critical routines (micro-kernels) written specifically for Arm CPUs. KleidiAI's integration with the XNNPACK library and the ExecuTorch framework, combined with Stability AI's own optimizations, brings significant AI performance improvements to Stability AI's open text-to-audio model, Stable Audio Open.
The results are striking: text-to-audio generation time drops from minutes to seconds, a 30-fold increase in response speed. The Stable Audio Open model runs entirely on the Arm CPU of a smartphone and requires no network connection, making it a first for text-to-audio AI.
Stability AI takes advantage of KleidiAI's automatic acceleration to speed up the model's response, improving end-to-end AI performance without compromising quality. Because users of the Stable Audio Open model need no additional development effort to benefit, KleidiAI's performance improvements save both time and money. Arm and Stability AI will continue to work together to achieve further performance gains and deliver an even better AI user experience.
These significant performance gains show that targeted hardware and software integration can make previously unattainable AI applications feasible on mobile, opening up opportunities for future innovation. Arm technology powers 99% of the world's smartphones, which means billions of smartphone users now have access to advanced AI audio capabilities.
Working together to address complex AI challenges
The Stable Audio Open model is highly efficient, but running it on-device on a smartphone CPU is still not easy. In the initial attempts, a single audio sample took more than four minutes to generate, which was not acceptable for end users.
Working with Arm, Stability AI distilled the model down to a parameter count suitable for mobile. Then, using the new distilled model together with KleidiAI acceleration delivered through XNNPACK's integration with ExecuTorch, it became possible to generate audio clips in seconds on a mobile Arm CPU.
Prem Akkaraju, CEO of Stability AI, said: "As more professional creative workers and businesses adopt generative AI to help enhance their production processes, it is critical that our models and workflows are available everywhere for builders and creators to use. We are delighted to be working with Arm on this. The Arm platform is used across the entire ecosystem, from servers to smartphones, and Arm is committed to accelerating AI models in all major frameworks by integrating Arm Kleidi into the software stack, so Arm was the perfect fit for us."
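As a rough sanity check, the figures quoted in this article can be combined in a few lines of Python. This is a back-of-envelope sketch, not a benchmark: the 240-second baseline is simply the lower bound of "more than four minutes", the 30x factor is the article's headline number, and the function name `estimated_time` is made up for illustration.

```python
# Back-of-envelope check using only figures quoted in the article.
# Assumption: "more than four minutes" is treated as a 240 s lower bound.
baseline_seconds = 4 * 60
# Assumption: the "30-fold increase in response speed" applies to that baseline.
speedup = 30


def estimated_time(baseline_s: float, factor: float) -> float:
    """Return the generation time implied by a given speed-up factor."""
    if factor <= 0:
        raise ValueError("speed-up factor must be positive")
    return baseline_s / factor


optimized_seconds = estimated_time(baseline_seconds, speedup)
print(f"Estimated optimized generation time: about {optimized_seconds:.0f} s")
```

Under these assumptions the estimate comes out at roughly 8 seconds, which is consistent with the article's "minutes to seconds" description. Actual times will depend on the device and the length of the clip.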
The rise of text-to-audio AI
Stability AI has been at the forefront of generative AI development since 2022, making waves with its industry-leading image model, Stable Diffusion. Building on that success, the company launched Stable Audio, one of the first fully licensed audio models designed to generate high-quality music and sound effects from text prompts. These models rank highly on major platforms such as Hugging Face and have millions of users, forming an active technical community.
An advanced audio AI experience for everyone
This achievement is just the beginning of the collaboration. Arm and Stability AI have planned more performance optimization initiatives to bring users a better experience. Through this collaboration, Arm is laying the foundation for on-device AI in audio, graphics, video, and 3D, reshaping the way everyone creates content and interacts with digital media. By distilling advanced models and using optimized software to deploy them on common hardware, we are paving the way for a future where everyone can enjoy advanced AI applications, models, and experiences directly from the device in their pocket.