Meta AI Proposes Multi-Token Attention (MTA): A New Attention Method that Allows LLMs to Condition their Attention Weights on Multiple Query and Key Vectors

Large Language Models (LLMs) benefit significantly from attention mechanisms, which enable effective retrieval of contextual information. Nevertheless, traditional attention methods […]
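To make the headline's idea concrete, the toy sketch below contrasts standard scaled dot-product attention, where each attention weight is a function of a single query-key pair, with a hypothetical MTA-style variant in which a small 2-D convolution over the query-key score map lets each weight condition on a neighborhood of query and key positions. The averaging kernel, the window sizes `cq`/`ck`, and the function names are illustrative assumptions rather than Meta's implementation (where such a kernel would be learned), and causal masking is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Classic scaled dot-product attention: each attention weight
    # depends on exactly one query vector and one key vector.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (n, n)
    return softmax(scores, axis=-1) @ V

def multi_token_attention(Q, K, V, cq=3, ck=3):
    # Hypothetical MTA-style sketch: convolve the score map so each
    # pre-softmax logit aggregates a (cq x ck) neighborhood of
    # query-key pairs, i.e. multiple queries and keys at once.
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (n, n)
    kernel = np.full((cq, ck), 1.0 / (cq * ck))    # learned in practice
    pad_q, pad_k = cq // 2, ck // 2
    padded = np.pad(scores, ((pad_q, pad_q), (pad_k, pad_k)))
    mixed = np.zeros_like(scores)
    for i in range(n):
        for j in range(n):
            mixed[i, j] = (padded[i:i + cq, j:j + ck] * kernel).sum()
    return softmax(mixed, axis=-1) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(standard_attention(Q, K, V).shape)       # (8, 16)
print(multi_token_attention(Q, K, V).shape)    # (8, 16)
```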

DeltaProduct: An AI Method that Balances Expressivity and Efficiency of the Recurrence Computation, Improving State-Tracking in Linear Recurrent Neural Networks

The Transformer architecture revolutionized natural language processing with its self-attention mechanism, which enables parallel computation and effective context retrieval. However, Transformers […]
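Since the teaser names the technique without detailing its mechanics, here is a hedged toy sketch of what a DeltaProduct-style recurrence could look like: a delta-rule state update (a rank-1, Householder-like state transition, as in DeltaNet-style linear RNNs) applied n_h times per token, so that the per-token transition becomes a product of simple matrices. The dial n_h trades expressivity (larger products track state better) against the cost of the recurrence. The function names, the toy readout, and the exact update form are illustrative assumptions, not the paper's verified method.

```python
import numpy as np

def delta_step(S, k, v, beta):
    # One delta-rule update: a rank-1, Householder-like correction of
    # the recurrent state, S <- S(I - beta*k*k^T) + beta*v*k^T.
    k = k / np.linalg.norm(k)
    return S - beta * np.outer(S @ k - v, k)

def delta_product_scan(keys, values, betas, n_h=2):
    # Hypothetical sketch: apply n_h successive rank-1 updates per
    # token, so each token's transition is a product of n_h simple
    # matrices. More expressive than one step (n_h = 1), cheaper
    # than a dense, unstructured transition matrix.
    T, _, d = keys.shape
    S = np.zeros((d, d))
    outputs = []
    for t in range(T):
        for h in range(n_h):
            S = delta_step(S, keys[t, h], values[t, h], betas[t, h])
        outputs.append(S @ keys[t, 0])   # toy readout for illustration
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, n_h, d = 6, 2, 8
keys = rng.standard_normal((T, n_h, d))
values = rng.standard_normal((T, n_h, d))
betas = rng.uniform(0, 2, (T, n_h))      # beta in (0, 2) keeps steps stable
print(delta_product_scan(keys, values, betas, n_h=n_h).shape)  # (6, 8)
```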