Rethinking Audio-Based Human-Computer Interaction: Machines that can respond to human speech with equally expressive and natural audio have become a […]
Category: Audio Language Model
NVIDIA Open Sources Parakeet TDT 0.6B: Achieving a New Standard for Automatic Speech Recognition (ASR) and Transcribing an Hour of Audio in One Second
NVIDIA has unveiled Parakeet TDT 0.6B, a state-of-the-art automatic speech recognition (ASR) model that is now fully open-sourced on Hugging […]
Boson AI Introduces Higgs Audio Understanding and Higgs Audio Generation: An Advanced AI Solution with Real-Time Audio Reasoning and Expressive Speech Synthesis for Enterprise Applications
In today’s enterprise landscape, especially in insurance and customer support, voice and audio data are more than just recordings; they’re valuable […]
Meta AI Just Released Llama 4 Scout and Llama 4 Maverick: The First Set of Llama 4 Models
Today, Meta AI announced the release of its latest-generation multimodal models, Llama 4, featuring two variants: Llama 4 Scout […]
Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning
Text-to-speech (TTS) technology has made significant strides in recent years, but challenges remain in creating natural, expressive, and high-fidelity speech […]
LLaSA-3B: A Llama 3.2 3B Fine-Tuned Text-to-Speech Model with Ultra-Realistic Audio, Emotional Expressiveness, and Multilingual Support
Text-to-speech (TTS) technology has emerged as a critical tool for bridging the gap between human and machine interaction. The demand […]