Google has open-sourced a Model Context Protocol (MCP) server that exposes read-only access to the Google Ads API for agentic […]
Category: New Releases
Microsoft Research Releases Skala: a Deep-Learning Exchange–Correlation Functional Targeting Hybrid-Level Accuracy at Semi-Local Cost
TL;DR: Skala is a deep-learning exchange–correlation functional for Kohn–Sham Density Functional Theory (DFT) that targets hybrid-level accuracy at semi-local cost, […]
Tiny Recursive Model (TRM): A Tiny 7M Model that Surpass DeepSeek-R1, Gemini 2.5 pro, and o3-mini at Reasoning on both ARG-AGI 1 and ARC-AGI 2
Can an iterative draft–revise solver that repeatedly updates a latent scratchpad outperform far larger autoregressive LLMs on ARC-AGI? Samsung SAIT […]
Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning RL for Modular, Tool-Using AI Agents
TL;DR: AgentFlow is a trainable agent framework with four modules—Planner, Executor, Verifier, Generator—coordinated by an explicit memory and toolset. The […]
Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios
How do you audit frontier LLMs for misaligned behavior in realistic multi-turn, tool-use settings—at scale and beyond coarse aggregate scores? […]
Google AI Introduces Agent Payments Protocol (AP2): An Open Protocol for Interoperable AI Agent Checkout Across Merchants and Wallets
Your shopping agent auto-purchases a $499 Pro plan instead of the $49 Basic tier—who’s on the hook: the user, the […]
Google AI Ships TimesFM-2.5: Smaller, Longer-Context Foundation Model That Now Leads GIFT-Eval (Zero-Shot Forecasting)
Google Research has released TimesFM-2.5, a 200M-parameter, decoder-only time-series foundation model with a 16K context length and native probabilistic forecasting […]
Stanford Researchers Introduced MedAgentBench: A Real-World Benchmark for Healthcare AI Agents
A team of Stanford University researchers have released MedAgentBench, a new benchmark suite designed to evaluate large language model (LLM) […]
MoonshotAI Released Checkpoint-Engine: A Simple Middleware to Update Model Weights in LLM Inference Engines, Effective for Reinforcement Learning
MoonshotAI has open-sourced checkpoint-engine, a lightweight middleware aimed at solving one of the key bottlenecks in large language model (LLM) […]
OpenAI Introduces GPT-5-Codex: An Advanced Version of GPT-5 Further Optimized for Agentic Coding in Codex
OpenAI has just released GPT-5-Codex, a version of GPT-5 further optimized for “agentic coding” tasks within the Codex ecosystem. The […]
