The Salesforce AI Research team presents FOFPred, a language-driven future optical flow prediction framework that connects large vision language models […]
Category: Computer Vision
Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence
Black Forest Labs releases FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 […]
Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input
Thinking Machines Lab has moved its Tinker training API into general availability and added three major capabilities: support for the […]
Zhipu AI Releases GLM-4.6V: A 128K Context Vision Language Model with Native Tool Calling
Zhipu AI has open-sourced the GLM-4.6V series as a pair of vision language models that treat images, video and […]
Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines
Black Forest Labs has released FLUX.2, its second-generation image generation and editing system. FLUX.2 targets real-world creative workflows […]
Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos
How do you reliably find, segment and track every instance of any concept across large image and video collections using […]
Why Spatial Supersensing is Emerging as the Core Capability for Multimodal AI Systems?
Even strong ‘long-context’ AI models fail badly when they must track objects and counts over long, messy video streams, so […]
Zhipu AI Releases ‘Glyph’: An AI Framework for Scaling the Context Length through Visual-Text Compression
Can we render long texts as images and use a VLM to achieve 3–4× token compression, preserving accuracy while scaling […]
Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): Enabling LLM agents to Automatically Discover Reusable Tools from Any Website
A team of Salesforce AI researchers introduced WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality […]
UltraCUA: A Foundation Computer-Use Agents Model that Bridges the Gap between General-Purpose GUI Agents and Specialized API-based Agents
Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and […]
