Thinking Machines Lab has moved its Tinker training API into general availability and added three major capabilities: support for the […]
Category: Computer Vision
Zhipu AI Releases GLM-4.6V: A 128K Context Vision Language Model with Native Tool Calling
Zhipu AI has open-sourced the GLM-4.6V series as a pair of vision-language models that treat images, video and […]
Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines
Black Forest Labs has released FLUX.2, its second-generation image generation and editing system. FLUX.2 targets real-world creative workflows […]
Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos
How do you reliably find, segment and track every instance of any concept across large image and video collections using […]
Why Spatial Supersensing is Emerging as the Core Capability for Multimodal AI Systems?
Even strong ‘long-context’ AI models fail badly when they must track objects and counts over long, messy video streams, so […]
Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression
Can we render long texts as images and use a VLM to achieve 3–4× token compression, preserving accuracy while scaling […]
Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): Enabling LLM agents to Automatically Discover Reusable Tools from Any Website
A team of Salesforce AI researchers introduced WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality […]
UltraCUA: A Foundation Computer-Use Agents Model that Bridges the Gap between General-Purpose GUI Agents and Specialized API-based Agents
Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and […]
Google AI Introduces VISTA: A Test Time Self Improving Agent for Text to Video Generation
TLDR: VISTA is a multi-agent framework that improves text-to-video generation during inference. It plans structured prompts as […]
NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI
How do you create 3D datasets to train AI for Robotics without expensive traditional approaches? A team of researchers from […]
