Tag: AI Audio
Complete guide to Qwen3-TTS: open-source multilingual TTS with 3-second voice cloning, 97 ms ultra-low latency, and support for 10 languages. Covers the 1.7B and 0.6B models, voice design, benchmarks against ElevenLabs and MiniMax, installation, and real-world applications. Apache 2.0 license.
Comprehensive 2025 guide to Fun-CosyVoice 3.0: multilingual zero-shot TTS, benchmarks, installation, deployment options, and production best practices.
Discover Gemini 2.5 Flash Native Audio, Google's advanced voice AI with 30 HD voices, 71.5% function-calling accuracy, live speech translation for 70+ languages, and 90% instruction adherence. Complete guide with technical specs, real-world applications, and deployment tips.
Discover GLM-ASR-Nano-2512, a 1.5B-parameter ASR model that outperforms Whisper V3, with exceptional Cantonese support and strong recognition of low-volume speech. Complete guide with benchmarks, deployment tips, and practical applications.
Discover GLM-TTS's revolutionary zero-shot voice cloning technology, enhanced with reinforcement learning. Complete guide with benchmarks, deployment tips, and practical applications.
IndexTTS2 is a next-generation text-to-speech model developed by Bilibili and officially open-sourced on September 8, 2025. The model achieves major breakthroughs in emotional expression and duration control, and the community has hailed it as "the most realistic and expressive TTS model."
Qwen3-ASR-Flash is a next-generation speech recognition service developed by Alibaba's Tongyi Qianwen team based on the Qwen3-Omni multimodal foundation model.