Reinforcement Learning from Human Feedback (RLHF)

RLHF is a technique for fine-tuning a generative model using a secondary model, often called a reward model, that is trained to estimate the perceived quality of the generative model's outputs. Human raters provide preference labels to train the reward model, and its predictions then serve as the reward signal when the generative model is fine-tuned with a reinforcement learning algorithm.
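The sketch below illustrates the core loop described above, using toy PyTorch models. All names (ToyPolicy, ToyRewardModel, the vocabulary and sequence sizes) are illustrative assumptions, not part of any specific library; the reward model here has random weights, whereas in practice it would first be trained on human preference labels, and production systems fine-tune a pretrained language model with PPO or a similar algorithm rather than the plain REINFORCE-style update shown here.

    # Minimal, self-contained sketch of the RLHF training loop (assumed toy models).
    import torch
    import torch.nn as nn

    VOCAB, SEQ_LEN, EMB = 100, 8, 32  # toy sizes, chosen only for illustration

    class ToyPolicy(nn.Module):
        """Stand-in for the generative model being fine-tuned."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.head = nn.Linear(EMB, VOCAB)

        def forward(self, tokens):  # tokens: (batch, t) -> next-token logits
            return self.head(self.embed(tokens).mean(dim=1))

    class ToyRewardModel(nn.Module):
        """Stand-in for the secondary model trained on human preference labels."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.score = nn.Linear(EMB, 1)

        def forward(self, tokens):  # scalar estimate of output quality
            return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

    policy, reward_model = ToyPolicy(), ToyRewardModel()
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    for step in range(100):
        # 1. Sample a sequence from the current policy, keeping log-probabilities.
        tokens = torch.zeros(1, 1, dtype=torch.long)
        log_probs = []
        for _ in range(SEQ_LEN):
            dist = torch.distributions.Categorical(logits=policy(tokens))
            next_token = dist.sample()
            log_probs.append(dist.log_prob(next_token))
            tokens = torch.cat([tokens, next_token.unsqueeze(0)], dim=1)

        # 2. Score the sample with the (frozen) reward model.
        with torch.no_grad():
            reward = reward_model(tokens)

        # 3. Policy-gradient update: raise the probability of high-reward outputs.
        loss = -(torch.stack(log_probs).sum() * reward).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Step 2 is where the human feedback enters indirectly: the reward model stands in for human judgments so that every sampled output can be scored without asking a rater each time.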
Related concepts:
Reinforcement Learning, Model Fine-Tuning
External reference:
Ouyang et al., "Training language models to follow instructions with human feedback" (2022): https://arxiv.org/abs/2203.02155