Multi-Query Attention
Multi-query attention is a variation of multi-head attention where, instead of learning a separate query/key/value projection triplet per attention head, a single key projection and a single value projection are learned and shared across all attention heads -- i.e. only the queries are learned independently per head. This improves computational efficiency, chiefly by shrinking the key/value cache that must be read during incremental decoding -- at the expense of some loss in accuracy.
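The idea can be sketched in a few lines of NumPy. This is a minimal single-layer illustration, not a production implementation: the function and parameter names (`multi_query_attention`, `w_q`, `w_k`, `w_v`) are made up for this example, and the output projection, masking, and batching found in real transformer layers are omitted. Note that `w_k` and `w_v` each map to a single head dimension, while `w_q` maps to one query per head.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    # x: (seq, d_model)
    # w_q: (d_model, d_model)        -- one query projection per head
    # w_k, w_v: (d_model, d_head)    -- a single shared key/value projection
    seq, d_model = x.shape
    d_head = d_model // n_heads

    q = (x @ w_q).reshape(seq, n_heads, d_head).transpose(1, 0, 2)  # (h, seq, d_head)
    k = x @ w_k                                                     # (seq, d_head), shared
    v = x @ w_v                                                     # (seq, d_head), shared

    # k and v broadcast across the head dimension of q.
    scores = q @ k.T / np.sqrt(d_head)   # (h, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v                       # (h, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq, d_model)
```

Compared with standard multi-head attention, the key/value tensors here are `n_heads` times smaller, which is the source of the efficiency gain.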