Meta researchers have introduced Perception Encoder Audiovisual, PEAV, as a new family of encoders for joint audio and video understanding. […]
Category: Audio Language Model
Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation
Meta has released SAM Audio, a prompt driven audio separation model that targets a common editing bottleneck, isolating one sound […]
StepFun AI Releases Step-Audio-R1: A New Audio LLM that Finally Benefits from Test Time Compute Scaling
Why do current audio AI models often perform worse when they generate longer reasoning instead of grounding their decisions in […]
StepFun AI Releases Step-Audio-EditX: A New Open-Source 3B LLM-Grade Audio Editing Model Excelling at Expressive and Iterative Audio Editing
How can speech editing become as direct and controllable as simply rewriting a line of text? StepFun AI has open […]
LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters with 27B activated, Excelling at Real-Time Audio-Visual Interaction
How do you design a single model that can listen, see, read and respond in real time across text, image, […]
UT Austin and ServiceNow Research Team Releases AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Voice AI is becoming one of the most important frontiers in multimodal AI. From intelligent assistants to interactive agents, the […]
Deepdub Introduces Lightning 2.5: A Real-Time AI Voice Model With 2.8x Throughput Gains for Scalable AI Agents and Enterprise AI
Deepdub, an Israeli Voice AI startup, has introduced Lightning 2.5, a real-time foundational voice model designed to power scalable, production-grade […]
TwinMind Introduces Ear-3 Model: A New Voice AI Model that Sets New Industry Records in Accuracy, Speaker Labeling, Languages and Price
TwinMind, a California-based Voice AI startup, unveiled Ear-3 speech-recognition model, claiming state-of-the-art performance on several key metrics and expanded multilingual […]
Alibaba Qwen Team Releases Qwen3-ASR: A New Speech Recognition Model Built Upon Qwen3-Omni Achieving Robust Speech Recogition Performance
Alibaba Cloud’s Qwen team unveiled Qwen3-ASR Flash, an all-in-one automatic speech recognition (ASR) model (available as API service) built upon […]
Microsoft AI Lab Unveils MAI-Voice-1 and MAI-1-Preview: New In-House Models for Voice AI
Microsoft AI lab officially launched MAI-Voice-1 and MAI-1-preview, marking a new phase for the company’s artificial intelligence research and development […]
