Hello WS Matrix!
In this video, we're going to talk about WavLM, a speech processing model that was state of the art at the time the paper was published. WavLM is pretrained on a massive dataset of 94k hours of audio, and it can be used for a wide range of speech tasks, including speech recognition, speaker verification, speaker diarization, and speech separation.
WavLM is based on the Transformer architecture, a neural network architecture that has proven very effective for natural language processing tasks. WavLM adds a number of new components on top of it, including a gated relative position bias in the attention layers, and it is pretrained with a masked speech denoising and prediction objective, where the model predicts targets for masked regions of speech that has been artificially overlapped with noise or other speakers.
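To make the relative-position idea concrete, here is a minimal NumPy sketch of attention logits with a content-gated relative position bias. This is a simplification for illustration, not the paper's exact formulation: the single sigmoid gate vector `u` and the bucketing-free bias table are assumptions, while the real model uses a richer gating scheme and bucketed offsets.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d = 6, 8
q = rng.normal(size=(seq_len, d))  # query vectors
k = rng.normal(size=(seq_len, d))  # key vectors

# One learned bias per relative offset in [-(L-1), L-1].
rel_bias = rng.normal(size=2 * seq_len - 1)

# Standard scaled dot-product content logits.
content = q @ k.T / np.sqrt(d)

# Look up bias[i, j] = rel_bias[(i - j) + (L - 1)].
offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
bias = rel_bias[offsets + seq_len - 1]

# Content-dependent gate per query position (simplified: one sigmoid gate;
# the paper's gating is more elaborate).
u = rng.normal(size=d)
gate = 1.0 / (1.0 + np.exp(-(q @ u)))        # shape (seq_len,)

# Gated relative position bias added to the content logits, then softmax.
logits = content + gate[:, None] * bias
weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(weights.shape)                          # (6, 6)
print(bool(np.allclose(weights.sum(axis=-1), 1.0)))  # True
```

The key point is that the position bias is scaled by a gate computed from the query content, so how strongly a position offset matters can vary with what the model is currently attending from.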
https://arxiv.org/abs/2110.13900