UCSD and Together AI Research Introduces Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size

The dominant recipe for building better language models has not changed much since the Chinchilla era: spend more FLOPs, add […]