Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform. Bailian is the console […]
Category: Staff
JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines
JetBrains released Mellum2, open-sourcing the weights under the Apache 2.0 license. The first version of Mellum was a completion-focused 4B […]
How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp
In this tutorial, we work through an implementation of NVIDIA Apex, focusing on the components that still matter in modern […]
MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
MiniMax officially released MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention […]
Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent
Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session […]
Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch
The Transformer’s attention mechanism has barely changed since 2017. Most efficiency work has tried to replace softmax attention outright. A […]
An Implementation of the Microsoft Agent Governance Toolkit for Safe AI Agent Tool Use with Policies, Approvals, Audit Logs, and Risk Controls
In this tutorial, we build a governed AI-agent workflow using Microsoft’s Agent Governance Toolkit as the reference point. We create […]
A Coding Implementation on Loguru for Designing Robust, Structured, Concurrent, and Production-Ready Python Logging Pipelines
In this tutorial, we implement a practical use case with Loguru, a powerful, flexible, and production-ready logging library for Python. […]
Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain
Trajectory’s concurrent multi-LoRA stack reports a 2.81× experiment-throughput gain over single-tenant RL, with all code in the NovaSky-AI/SkyRL GitHub repository. […]
Build Skill-Augmented AI Agents with SkillNet for Search, Evaluation, Graph Analysis, and Task Planning
In this tutorial, we implement a SkillNet use case as a practical framework for discovering, installing, inspecting, evaluating, and organizing […]
