Speculative decoding is a technique for speeding up large language model inference. A small, fast draft model proposes several tokens. […]
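To make the draft-and-verify loop concrete, here is a minimal sketch of *greedy* speculative decoding. It assumes two hypothetical stand-in models, `toy_draft` and `toy_target`, each mapping a token prefix to one greedy next token. Real systems verify all drafted positions in a single batched forward pass of the large model and usually apply a rejection-sampling acceptance rule rather than the exact-match rule used here. All names are illustrative and do not come from any particular library.

```python
from typing import Callable, List

# Hypothetical model interface: a token prefix in, the greedy next token out.
Model = Callable[[List[int]], int]

def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens appended this round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # 2. The target model checks each drafted position. A real system scores all
    #    k positions in one batched forward pass; this loop is only for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token, end the round
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # 3. Every draft was accepted, so the same target pass yields one bonus token.
    accepted.append(target(ctx))
    return accepted

def generate(prefix: List[int], draft: Model, target: Model, k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]

# Hypothetical toy models: the target counts upward mod 10; the draft agrees
# except that it guesses wrong whenever the last token is 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

print(generate([0], toy_draft, toy_target, k=3, max_new=8))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With these toy models the first round accepts three drafts plus a bonus token, the second round keeps only the target's correction after the bad guess at 4, and the final output is exactly what the target model would produce on its own. That equivalence is what the verification step buys.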
Category: Applications
MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters
Large language models become static after pretraining. Their knowledge does not update as the world changes. Retraining a full LLM […]
Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
Long-context inference makes the KV cache one of the main costs of serving LLMs. During autoregressive decoding, the cache grows […]
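The cost of a growing cache is easy to quantify. The sketch below uses the standard accounting of one key vector and one value vector per KV head, per layer, per cached token. The model shape is a hypothetical example, and the 2-bit figure is naive arithmetic that ignores the scales and other metadata a real quantizer would typically also store; it does not describe how OSCAR itself works.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: int) -> int:
    """Bytes to hold keys and values for every cached token (no quantization metadata)."""
    # Factor of 2: a key vector and a value vector for each KV head, layer and token.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value // 8

# Hypothetical model shape, chosen only for illustration.
CONFIG = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128 * 1024, batch_size=1)

GIB = 1024 ** 3
fp16_bytes = kv_cache_bytes(**CONFIG, bits_per_value=16)
int2_bytes = kv_cache_bytes(**CONFIG, bits_per_value=2)
print(f"fp16:  {fp16_bytes / GIB:.1f} GiB")  # fp16:  16.0 GiB
print(f"2-bit: {int2_bytes / GIB:.1f} GiB")  # 2-bit: 2.0 GiB
```

For this hypothetical 128K-token sequence the fp16 cache needs 16 GiB, and storing 2 bits per value shrinks it eightfold to 2 GiB before any metadata overhead. Because the size scales linearly with both sequence length and batch size, the cache rather than the weights can dominate memory in long-context serving.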
Best Authentication Platforms for AI Agents and MCP Servers in 2026
The Model Context Protocol has moved from Anthropic’s internal experiment to a de facto industry standard at a speed few […]
WorkOS Releases auth.md: An Open Agent Registration Protocol Built on OAuth Standards
For years, authentication on the web followed one design assumption: a human sits behind a browser. Click a button. Fill […]
Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%
Most web agents today drive a browser one action at a time. The model receives the current page state — […]
NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
Linear attention replaces the unbounded KV cache of softmax attention with a fixed-size recurrent state. This cuts sequence mixing to […]
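The fixed-size state and the delta rule named in the headline can be illustrated with the plain, ungated delta rule: S_t = S_{t-1} + beta_t (v_t - S_{t-1} k_t) k_t^T, read out as o_t = S_t q_t. The sketch below implements only that textbook recurrence. It is not Gated DeltaNet-2 and does not model the decoupled erase and write described in the release. Helper names are illustrative, and keys are assumed unit-norm so that a write with beta = 1 is recalled exactly.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(S: Matrix, x: Vector) -> Vector:
    """Readout o = S x (used with x = q for queries, x = k for the update)."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]

def delta_rule_step(S: Matrix, k: Vector, v: Vector, beta: float) -> Matrix:
    """Return S + beta * (v - S k) k^T as a new matrix."""
    pred = matvec(S, k)                             # what the state currently recalls for key k
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]  # correction toward the new value
    return [[S[i][j] + beta * err[i] * k[j] for j in range(len(k))]
            for i in range(len(S))]

def run_delta_rule(keys: List[Vector], values: List[Vector], betas: List[float],
                   d_k: int, d_v: int) -> Matrix:
    # The state is d_v x d_k no matter how many tokens are processed.
    S = [[0.0] * d_k for _ in range(d_v)]
    for k, v, beta in zip(keys, values, betas):
        S = delta_rule_step(S, k, v, beta)
    return S

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
S = run_delta_rule(keys, values, betas=[1.0, 1.0, 1.0], d_k=2, d_v=2)
print(matvec(S, [1.0, 0.0]))  # [7.0, 8.0]
print(matvec(S, [0.0, 1.0]))  # [5.0, 6.0]
```

Writing key [1, 0] a second time overwrites its stored value instead of adding to it, so the readout returns [7, 8] rather than the [10, 12] a purely additive linear-attention update would give, while the entry for key [0, 1] is untouched. The state remains a 2 × 2 matrix however long the sequence grows.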
Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents
Tencent has released TencentDB Agent Memory, an open-source memory system for AI agents. The project ships under the MIT license. […]
Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification
Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible — and how does that […]
Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints
Attackers increasingly target the packages, editor extensions, and AI tool configs on developer machines, not just production systems. Perplexity […]
