Deepdub, an Israeli Voice AI startup, has introduced Lightning 2.5, a real-time foundational voice model designed to power scalable, production-grade […]
Category: Audio Language Model
TwinMind Introduces Ear-3 Model: A New Voice AI Model that Sets New Industry Records in Accuracy, Speaker Labeling, Languages and Price
TwinMind, a California-based Voice AI startup, unveiled Ear-3 speech-recognition model, claiming state-of-the-art performance on several key metrics and expanded multilingual […]
Alibaba Qwen Team Releases Qwen3-ASR: A New Speech Recognition Model Built Upon Qwen3-Omni Achieving Robust Speech Recogition Performance
Alibaba Cloud’s Qwen team unveiled Qwen3-ASR Flash, an all-in-one automatic speech recognition (ASR) model (available as API service) built upon […]
Microsoft AI Lab Unveils MAI-Voice-1 and MAI-1-Preview: New In-House Models for Voice AI
Microsoft AI lab officially launched MAI-Voice-1 and MAI-1-preview, marking a new phase for the company’s artificial intelligence research and development […]
OpenAI Releases an Advanced Speech-to-Speech Model and New Realtime API Capabilities including MCP Server Support, Image Input, and SIP Phone Calling Support
OpenAI has officially launched Realtime API and gpt-realtime, its most advanced speech-to-speech model, moving the Realtime API out of beta […]
NVIDIA AI Just Released Streaming Sortformer: A Real-Time Speaker Diarization that Figures Out Who’s Talking in Meetings and Calls Instantly
NVIDIA has released its Streaming Sortformer, a breakthrough in real-time speaker diarization that instantly identifies and labels participants in meetings, […]
NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages
Nvidia has taken a major leap in the development of multilingual speech AI, unveiling Granary, the largest open-source speech dataset […]
Alibaba Qwen Introduces Qwen3-MT: Next-Gen Multilingual Machine Translation Powered by Reinforcement Learning
Alibaba has introduced Qwen3-MT (qwen-mt-turbo) via Qwen API, its latest and most advanced machine translation model, designed to break language barriers […]
NVIDIA AI Releases Canary-Qwen-2.5B: A State-of-the-Art ASR-LLM Hybrid Model with SoTA Performance on OpenASR Leaderboard
NVIDIA has just released Canary-Qwen-2.5B, a groundbreaking automatic speech recognition (ASR) and language model (LLM) hybrid, which now tops the […]
Mistral AI Releases Voxtral: The World’s Best (and Open) Speech Recognition Models
Mistral AI has released Voxtral, a family of open-weight models—Voxtral-Small-24B and Voxtral-Mini-3B—designed to handle both audio and text inputs. Built […]