Speech Recognition – The TechBriefs

NVIDIA AI Just Released Streaming Sortformer: A Real-Time Speaker Diarization that Figures Out Who’s Talking in Meetings and Calls Instantly

NVIDIA has released its Streaming Sortformer, a breakthrough in real-time speaker diarization that instantly identifies and labels participants in meetings, […]

NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages

Nvidia has taken a major leap in the development of multilingual speech AI, unveiling Granary, the largest open-source speech dataset […]

NVIDIA AI Releases Canary-Qwen-2.5B: A State-of-the-Art ASR-LLM Hybrid Model with SoTA Performance on OpenASR Leaderboard

NVIDIA has just released Canary-Qwen-2.5B, a groundbreaking automatic speech recognition (ASR) and language model (LLM) hybrid, which now tops the […]

Omni-R1: Advancing Audio Question Answering with Text-Driven Reinforcement Learning and Auto-Generated Data

Recent developments have shown that RL can significantly enhance the reasoning abilities of LLMs. Building on this progress, the study […]

Rime Introduces Arcana and Rimecaster (Open Source): Practical Voice AI Tools Built on Real-World Speech

The field of Voice AI is evolving toward more representative and adaptable systems. While many existing models have been trained […]

NVIDIA Open Sources Parakeet TDT 0.6B: Achieving a New Standard for Automatic Speech Recognition ASR and Transcribes an Hour of Audio in One Second

NVIDIA has unveiled Parakeet TDT 0.6B, a state-of-the-art automatic speech recognition (ASR) model that is now fully open-sourced on Hugging […]

Kyutai Releases MoshiVis: The First Open-Source Real-Time Speech Model that can Talk About Images

Artificial intelligence has made significant strides in recent years, yet integrating real-time speech interaction with visual content remains a complex […]

Hume Introduces Octave TTS: A New Text-to-Speech Model that Creates Custom AI Voices with Tailored Emotions

In the rapidly evolving field of digital communication, traditional text-to-speech (TTS) systems have often struggled to capture the full range […]

Kyutai Releases Hibiki: A 2.7B Real-Time Speech-to-Speech and Speech-to-Text Translation with Near-Human Quality and Voice Transfer

Real-time speech translation presents a complex challenge, requiring seamless integration of speech recognition, machine translation, and text-to-speech synthesis. Traditional cascaded […]