Friday, March 20, 2026
The TechBriefs

Category: Sound

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models
  • AI
  • Artificial Intelligence
  • Editors Pick
  • New Releases
  • Sound
  • Staff
  • Technology
  • Voice AI

Speech technology still has a data distribution problem. Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems have improved rapidly for […]

Fish Audio Releases Fish Audio S2: A New Generation of Expressive Text-to-Speech (TTS) with Absurdly Controllable Emotion
  • agentic AI
  • AI
  • Artificial Intelligence
  • Editors Pick
  • Sound
  • Technology
  • TTS
  • Voice AI

The landscape of Text-to-Speech (TTS) is moving away from modular pipelines toward integrated Large Audio Models (LAMs). Fish Audio’s release […]

Qwen Researchers Release Qwen3-TTS: an Open Multilingual TTS Suite with Real-Time Latency and Fine-Grained Voice Control
  • AI
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • Large Language Model
  • New Releases
  • Sound
  • Staff
  • Tech News
  • Technology
  • TTS
  • Voice AI

Alibaba Cloud’s Qwen team has open-sourced Qwen3-TTS, a family of multilingual text-to-speech models that target three core tasks in one […]

Microsoft Releases VibeVoice-ASR: A Unified Speech-to-Text Model Designed to Handle 60-Minute Long-Form Audio in a Single Pass
  • agentic AI
  • AI
  • AI Agents
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • New Releases
  • Sound
  • Staff
  • Technology
  • Voice AI

Microsoft has released VibeVoice-ASR as part of the VibeVoice family of open-source frontier voice AI models. VibeVoice-ASR is described […]

FlashLabs Researchers Release Chroma 1.0: A 4B Real Time Speech Dialogue Model With Personalized Voice Cloning
  • agentic AI
  • AI
  • AI Agents
  • AI Paper Summary
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Open Source
  • Software Engineering
  • Sound
  • Staff
  • Tech News
  • Technology

Chroma 1.0 is a real-time speech-to-speech dialogue model that takes audio as input and returns audio as […]

Inworld AI Releases TTS-1.5 For Realtime, Production Grade Voice Agents
  • AI
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • New Releases
  • Sound
  • Staff
  • Technology
  • TTS

Inworld AI has introduced Inworld TTS-1.5, an upgrade to its TTS-1 family that targets realtime voice agents with strict constraints […]

How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets, Incremental ASR, LLM Streaming, and Real-Time TTS
  • agentic AI
  • AI
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • Sound
  • Staff
  • Technology
  • TTS
  • Tutorials

In this tutorial, we build an end-to-end streaming voice agent that mirrors how modern low-latency conversational systems operate in real […]

NVIDIA Releases PersonaPlex-7B-v1: A Real-Time Speech-to-Speech Model Designed for Natural and Full-Duplex Conversations
  • agentic AI
  • AI
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Audio Language Model
  • Editors Pick
  • Language Model
  • Large Language Model
  • New Releases
  • Open Source
  • Sound
  • Staff
  • Tech News
  • Technology
  • TTS

NVIDIA researchers have released PersonaPlex-7B-v1, a full-duplex speech-to-speech conversational model that targets natural voice interactions with precise persona […]

Google Researchers Release Magenta RealTime: An Open-Weight Model for Real-Time AI Music Generation
  • AI
  • Artificial Intelligence
  • Editors Pick
  • New Releases
  • Open Source
  • Sound
  • Staff
  • Technology

Google’s Magenta team has introduced Magenta RealTime (Magenta RT), an open-weight, real-time music generation model that brings unprecedented interactivity to […]

Omni-R1: Advancing Audio Question Answering with Text-Driven Reinforcement Learning and Auto-Generated Data
  • AI
  • Artificial Intelligence
  • Editors Pick
  • New Releases
  • Sound
  • Speech Recognition
  • Staff
  • Technology

Recent developments have shown that RL can significantly enhance the reasoning abilities of LLMs. Building on this progress, the study […]
