Speaker Diarization

Speaker diarization is the task of determining 'who speaks when' in a recorded or live-streamed conversation. That is, a diarization algorithm outputs speaker IDs along with the time intervals during which each speaker talks. There are many diarization algorithms: some rely purely on audio, some on audio together with the non-diarized transcript, some on video, and so on. In general, speaker identification (determining the actual identity of each speaker) is not necessary, since the diarization algorithm only needs to assign an ID to each speaker and use it consistently throughout the conversation.
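To make the shape of the output concrete, here is a minimal sketch of how a diarization result could be represented and queried. It does not run any diarization algorithm. The `Segment` class, the helper functions, the `spk_0`/`spk_1` labels, and the timestamps are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str  # arbitrary but consistent label, not a real identity
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def speaking_time(segments):
    """Return the total number of seconds attributed to each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def speakers_at(segments, t):
    """Return the sorted labels of all speakers active at time t.

    More than one label can be returned, because speakers may overlap.
    """
    return sorted({seg.speaker for seg in segments if seg.start <= t < seg.end})


# Hypothetical diarization output for a short two-person conversation.
diarization = [
    Segment("spk_0", 0.0, 4.0),
    Segment("spk_1", 4.0, 9.5),
    Segment("spk_0", 9.0, 12.0),  # overlaps briefly with spk_1
]

print(speaking_time(diarization))
print(speakers_at(diarization, 9.2))
```

The labels carry no meaning beyond consistency: renaming `spk_0` to `A` everywhere would describe the same diarization, which is why identifying the actual speakers is a separate task.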