Bag-of-Words Model

The bag-of-words model represents a text document as a fixed-length vector based on word occurrence, disregarding word order. Given a vocabulary V of size N, the bag-of-words representation of a document D is a vector of length N whose entry at index i is 1 if the i-th word of V appears in D, and 0 otherwise. A common variation stores the number of times each word appears in D instead of a binary indicator.
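
As a minimal sketch of the definition above, the following Python snippet builds both the binary and the count variants. The function name `bag_of_words`, the lowercasing step, and the whitespace tokenization are assumptions made for illustration; a real pipeline would likely also handle punctuation and other preprocessing.

```python
def bag_of_words(document, vocabulary, binary=True):
    """Return the bag-of-words vector of `document` over `vocabulary`.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    # Map each vocabulary word to its index in the output vector.
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in document.lower().split():
        i = index.get(token)
        if i is None:
            continue  # words outside the vocabulary are ignored
        # Binary variant: mark presence; count variant: increment the tally.
        vector[i] = 1 if binary else vector[i] + 1
    return vector


vocabulary = ["cat", "dog", "sat", "the"]  # V, with N = 4
document = "The cat sat near the other cat"  # D

binary_vector = bag_of_words(document, vocabulary)
count_vector = bag_of_words(document, vocabulary, binary=False)
print(binary_vector)  # [1, 0, 1, 1]
print(count_vector)   # [2, 0, 1, 2]
```

Both vectors have length N regardless of the length of the document, and the words "near" and "other" contribute nothing because they are not in V.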
Related concepts:
tf-idf