Tokenizer
A tokenizer is a function whose main job is to split text into tokens. Tokens can be whole words, parts of words (subwords), or special characters such as punctuation. There are also "utility" tokens, e.g. markers for the beginning of a new sequence or paragraph. Tokenization is a common early step in NLP tasks.
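To make this concrete, here is a minimal sketch of a naive regex-based tokenizer in Python. It only splits on word characters and punctuation and adds illustrative utility tokens; the token names and the helper function are hypothetical, and real tokenizers used in NLP pipelines typically rely on subword algorithms (e.g. BPE or WordPiece) rather than simple regex splitting.

```python
import re

# Illustrative "utility" tokens (names chosen for this sketch, not from any particular library).
BOS = "<bos>"  # marks the beginning of a sequence
EOS = "<eos>"  # marks the end of a sequence

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: splits text into word and punctuation tokens."""
    # \w+ matches runs of word characters; [^\w\s] matches single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [BOS] + tokens + [EOS]

print(tokenize("Tokenization is a common early step in NLP tasks."))
# ['<bos>', 'Tokenization', 'is', 'a', 'common', 'early', 'step', 'in', 'NLP', 'tasks', '.', '<eos>']
```

Production tokenizers differ mainly in how they split: subword methods break rare words into smaller known pieces, which keeps the vocabulary small while still covering unseen words.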