Beam Search

Beam search is an algorithm used to generate sequences from autoregressive models -- which are models that generate one sequence element (e.g. letter, word, or token) at a time. In a "greedy" search, at each iteration the element with the highest probability (conditioned to the sequence already generated) is picked. In beam search, a given number (the "beam width") of most likely candidate sequences of elements is kept in memory and used to condition the generation of the next element. Then new most likely sequences are selected based on the resulting aggregated probabilities. The process typically stops when the "end of sentence" token is reached. A beam width of 1 corresponds to greedy search.