Zyphra, the San Francisco-based AI lab behind the ZAYA1 model family, released ZAYA1-8B-Diffusion-Preview, a preview of its early work […]
Cline Releases Cline SDK: An Open-Source Agent Runtime Now Powering Its CLI and Kanban, With IDE Extensions Being Migrated
Cline became ‘agentic’ before it was cool, but building on the bleeding edge usually leads to some structural debt. Over […]
Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models
Pre-training large language models is expensive enough that even modest efficiency improvements can translate into meaningful cost and time savings. […]
Understanding LLM Distillation Techniques
Modern large language models are no longer trained only on raw internet text. Increasingly, companies are using powerful “teacher” models […]
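The excerpt cuts off before the mechanics, but the classic form of distillation trains a smaller "student" on the teacher's softened output distribution alongside the ground-truth labels. A minimal sketch of that standard logit-distillation loss follows; the temperature and mixing weight are illustrative defaults, not values from the article:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # student_logits, teacher_logits: (batch, vocab); labels: (batch,)
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Raising `T` flattens the teacher's distribution, exposing the relative probabilities it assigns to wrong-but-plausible tokens, which is where most of the extra training signal comes from.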
Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization
A team of researchers from Meta, Stanford University, and the University of Washington has introduced three new methods that substantially […]
Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs
Scaling large language models (LLMs) is expensive. Every token processed during inference and every gradient computed during training flows through […]
NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing
Training a family of large language models (LLMs) has always come with a painful multiplier: every model variant in the […]
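The teaser does not explain the mechanism, but "zero-shot slicing" suggests smaller sub-networks nested inside one trained checkpoint, extracted without retraining. As a toy illustration only, and not NVIDIA's actual method, carving a smaller dense layer out of a larger one by keeping the leading rows and columns might look like this:

```python
import torch.nn as nn

def slice_linear(layer: nn.Linear, in_keep: int, out_keep: int) -> nn.Linear:
    # Hypothetical sketch: extract a smaller layer from a larger trained one
    # by keeping the first `in_keep` input and `out_keep` output dimensions.
    # nn.Linear stores weights as (out_features, in_features).
    small = nn.Linear(in_keep, out_keep, bias=layer.bias is not None)
    small.weight.data = layer.weight.data[:out_keep, :in_keep].clone()
    if layer.bias is not None:
        small.bias.data = layer.bias.data[:out_keep].clone()
    return small
```

For this to work in practice, training has to order the dimensions so the leading ones carry the most information; otherwise a naive slice degrades badly.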
Anthropic Introduces Natural Language Autoencoders That Convert Claude’s Internal Activations Directly into Human-Readable Text Explanations
When you type a message to Claude, something invisible happens in the middle. The words you send get converted into […]
Meta AI Releases NeuralBench: A Unified Open-Source Framework to Benchmark NeuroAI Models Across 36 EEG Tasks and 94 Datasets
Evaluating AI models trained on brain signals has long been messy and inconsistent. Different research groups use different preprocessing […]
Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class
Zyphra AI has released ZAYA1-8B, a small Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 […]
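The gap between roughly 760 million active parameters and the model's larger total is the defining MoE property: a learned router sends each token to only a few experts, so just a fraction of the weights fire per forward pass. A toy top-k MoE layer illustrating the idea (dimensions, expert count, and k are hypothetical, not ZAYA1's configuration):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    # Illustrative top-k Mixture-of-Experts layer: only k of n_experts run
    # per token, so active parameters per token << total parameters.
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Production MoE stacks fuse this routing into batched expert kernels rather than Python loops, but the accounting is the same: per-token compute scales with k, not with the total number of experts.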
