Tag: AI Audio
6 posts found
Comprehensive 2025 guide to Fun-CosyVoice 3.0: multilingual zero-shot TTS, benchmarks, installation, deployment options, and production best practices.
Discover Gemini 2.5 Flash Native Audio, Google's advanced voice AI with 30 HD voices, 71.5% function calling accuracy, live speech translation for 70+ languages, and 90% instruction adherence. Complete guide with technical specs, real-world applications, and deployment tips.
Discover GLM-ASR-Nano-2512, a 1.5B parameter ASR model that outperforms Whisper V3 with exceptional Cantonese support and low-volume speech recognition. Complete guide with benchmarks, deployment tips, and practical applications.
Discover GLM-TTS's revolutionary zero-shot voice cloning technology with reinforcement learning. Complete guide with benchmarks, deployment tips, and practical applications.
IndexTTS2 is a next-generation text-to-speech model developed by Bilibili and officially open-sourced on September 8, 2025. The model achieves major breakthroughs in emotional expression and duration control, and the community has hailed it as 'the most realistic and expressive TTS model.'
Qwen3-ASR-Flash is a next-generation speech recognition service developed by Alibaba's Tongyi Qianwen team and built on the Qwen3-Omni multimodal foundation model.