NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

Training a family of large language models (LLMs) has always come with a painful multiplier: every model variant in the […]

Zyphra Introduces Tensor and Sequence Parallelism (TSP): A Hardware-Aware Training and Inference Strategy That Delivers 2.6x Throughput Over Matched TP+SP Baselines

Training and serving large transformer models at scale is fundamentally a memory management problem. Every GPU in a cluster has […]