Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

Training large language models on long sequences has a well-known problem: attention is expensive. The scaled dot-product attention (SDPA) at […]

Ant Group Releases Ling 2.0: A Reasoning-First MoE Language Model Series Built on the Principle that Each Activation Enhances Reasoning Capability

How do you build a language model that grows in capacity but keeps the computation for each token almost unchanged? The […]