Grouped-Query Attention
Grouped-query attention (GQA) is a variant of multi-head attention. Instead of learning a separate query/key/value projection triplet for every attention head, it learns a smaller number of key/value projection pairs, each shared by a group of attention heads, while the query projections are still learned independently for each head, as in the original multi-head attention.
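
The sharing pattern can be illustrated with a minimal sketch in plain Python. It assumes the learned projections have already been applied, so the inputs are per-head query, key, and value vectors; the function names (`attend`, `grouped_query_attention`), the nested-list layout, the contiguous assignment of query heads to groups, and the absence of masking are all illustrative assumptions rather than a reference implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector (no masking)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def grouped_query_attention(queries, keys, values):
    """Apply attention with key/value heads shared across groups of query heads.

    queries: [num_q_heads][seq_len][head_dim]
    keys, values: [num_kv_heads][seq_len][head_dim]
    Assumption: query heads are assigned to key/value heads in contiguous
    blocks of equal size.
    """
    num_q_heads, num_kv_heads = len(queries), len(keys)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads

    outputs = []
    for head, head_queries in enumerate(queries):
        kv_head = head // group_size  # every head in a group reads the same K/V
        outputs.append(
            [attend(q, keys[kv_head], values[kv_head]) for q in head_queries]
        )
    return outputs


if __name__ == "__main__":
    # Four query heads share two key/value heads (group size 2).
    q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
    k = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    out = grouped_query_attention(q, k, v)
    print(len(out), "output heads from", len(k), "key/value heads")
```

In this sketch, passing as many key/value heads as query heads reduces to ordinary multi-head attention, and passing a single key/value head corresponds to the multi-query case where all heads share one pair.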