How do you combine SigLIP2, DINOv3, and SAM3 into a single vision backbone without sacrificing dense or segmentation performance? NVIDIA’s […]
Category: Computer Vision
Waymo Introduces the Waymo World Model: A New Frontier Simulator Model for Autonomous Driving, Built on Top of Genie 3
Waymo is introducing the Waymo World Model, a frontier generative model that drives its next generation of autonomous driving simulation. […]
Google Introduces Agentic Vision in Gemini 3 Flash for Active Image Understanding
Frontier multimodal models usually process an image in a single pass. If they miss a serial number on a chip […]
Salesforce AI Introduces FOFPred: A Language-Driven Future Optical Flow Prediction Framework that Enables Improved Robot Control and Video Generation
The Salesforce AI research team presents FOFPred, a language-driven future optical flow prediction framework that connects large vision language models […]
Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence
Black Forest Labs releases FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 […]
Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input
Thinking Machines Lab has moved its Tinker training API into general availability and added 3 major capabilities: support for the […]
Zhipu AI Releases GLM-4.6V: A 128K Context Vision Language Model with Native Tool Calling
Zhipu AI has open sourced the GLM-4.6V series as a pair of vision language models that treat images, video and […]
Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines
Black Forest Labs has released FLUX.2, its second generation image generation and editing system. FLUX.2 targets real world creative workflows […]
Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos
How do you reliably find, segment and track every instance of any concept across large image and video collections using […]
Why Spatial Supersensing is Emerging as the Core Capability for Multimodal AI Systems?
Even strong ‘long-context’ AI models fail badly when they must track objects and counts over long, messy video streams, so […]
