Last week, the NVIDIA robotics team released Jetson Thor that includes Jetson AGX Thor Developer Kit and the Jetson T5000 […]
Category: New Releases
Alibaba Qwen Team Releases Mobile-Agent-v3 and GUI-Owl: Next-Generation Multi-Agent Framework for GUI Automation
Table of contents Introduction: The Rise of GUI Agents Architecture and Core Capabilities Training and Data Pipeline Benchmarking and Performance […]
Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance
Table of contents The Problem with “Thinking Longer” The Agentic Approach Infrastructure Challenges and Solutions GRPO-RoC: Learning from High-Quality Examples […]
Microsoft AI Lab Unveils MAI-Voice-1 and MAI-1-Preview: New In-House Models for Voice AI
Microsoft AI lab officially launched MAI-Voice-1 and MAI-1-preview, marking a new phase for the company’s artificial intelligence research and development […]
OpenAI Releases an Advanced Speech-to-Speech Model and New Realtime API Capabilities including MCP Server Support, Image Input, and SIP Phone Calling Support
OpenAI has officially launched Realtime API and gpt-realtime, its most advanced speech-to-speech model, moving the Realtime API out of beta […]
Nous Research Team Releases Hermes 4: A Family of Open-Weight AI Models with Hybrid Reasoning
Nous Research has released Hermes 4, a family of open-weight models (14B, 70B, and 405B parameter sizes based on Llama […]
Google AI’s New Regression Language Model (RLM) Framework Enables LLMs to Predict Industrial System Performance Directly from Raw Text Data
Google’s new Regression Language Model (RLM) approach enables Large Language Models (LLMs) to predict industrial system performance directly from raw […]
NVIDIA AI Released Jet-Nemotron: 53x Faster Hybrid-Architecture Language Model Series that Translates to a 98% Cost Reduction for Inference at Scale
NVIDIA researchers have shattered the longstanding efficiency hurdle in large language model (LLM) inference, releasing Jet-Nemotron—a family of models (2B […]
Google AI Introduces Gemini 2.5 Flash Image: A New Model that Allows You to Generate and Edit Images by Simply Describing Them
Table of contents What Makes Gemini 2.5 Flash Image Impressive? Key Technical Features Benchmark Leadership and Community Reception Pricing, Access, […]
Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers
Table of contents Key Features Architecture and Technical Deep Dive Model Limitations and Responsible Use Conclusion FAQs Microsoft’s latest open […]
