Skip to content
Saturday, May 30, 2026
The TechBriefs
  • Home
  • Technology
  • AI
  • Computers
  • Security
  • Internet
  • Press Releases
    • GlobeNewswire
    • PRNewswire
  • Contact

Category: AI Shorts

  • Home
  • AI Shorts
StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows
  • agentic AI
  • AI
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Open Source
  • Software Engineering
  • Staff
  • Tech News
  • Technology
  • Vision Language Model

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

  • 0

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model targeting agentic use cases. It adds native vision input and […]

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication
  • AI
  • AI infrastructure
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Machine Learning
  • New Releases
  • Open Source
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication

  • 0

GPU communication overhead is a measurable bottleneck in production AI workloads. According to data cited by the mKernel project, communication […]

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters
  • AI
  • AI infrastructure
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters

  • 0

Liquid AI just shipped LFM2.5-8B-A1B. It is an on-device Mixture-of-Experts (MoE) model built for tool calling. The model holds 8.3B […]

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents
  • agentic AI
  • AI
  • AI Agents
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents

  • 0

Anthropic just launched Claude Opus 4.8. Also, there two Claude Code updates shipped with it. Dynamic workflows run many subagents […]

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate
  • AI
  • AI infrastructure
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Machine Learning
  • New Releases
  • Open Source
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate

  • 0

Perplexity AI’s research team reimplemented their Unigram tokenizer from scratch in Rust and open-sourced the code in pplx-garden, their inference […]

Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules
  • AI
  • AI infrastructure
  • AI Paper Summary
  • AI Shorts
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Staff
  • Tech News
  • Technology

Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules

  • 0

Researchers from Sakana AI and the University of Tokyo propose DiffusionBlocks. It trains transformer-based networks one block at a time. […]

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code
  • agentic AI
  • AI
  • AI Paper Summary
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Machine Learning
  • New Releases
  • Open Source
  • Staff
  • Tech News
  • Technology

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

  • 0

Reinforcement learning for language agents is growing more complex. Agents now manage multi-turn tool use, long-running contexts, and multi-agent orchestration. […]

Meet EAGLE 3.1: The Speculative Decoding Algorithm That Fixes Attention Drift in LLM Inference
  • AI
  • AI infrastructure
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • For Devs
  • New Releases
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Meet EAGLE 3.1: The Speculative Decoding Algorithm That Fixes Attention Drift in LLM Inference

  • 0

Speculative decoding is a technique for speeding up large language model inference. A small, fast draft model proposes several tokens. […]

MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters
  • AI
  • AI Paper Summary
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Language Model
  • Large Language Model
  • Software Engineering
  • Staff
  • Tech News
  • Technology

MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters

  • 0

Large language models become static after pretraining. Their knowledge does not update as the world changes. Retraining a full LLM […]

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
  • AI
  • AI infrastructure
  • AI Shorts
  • Applications
  • Artificial Intelligence
  • Editors Pick
  • Machine Learning
  • New Releases
  • Open Source
  • Software Engineering
  • Staff
  • Tech News
  • Technology

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

  • 0

Long-context inference makes the KV cache one of the main costs of serving LLMs. During autoregressive decoding, the cache grows […]

Posts pagination

1 2 … 140 Next
  • Privacy Policy
  • Terms of use
Theme: Terminal News By Adore Themes.