Audio clips were turned into spectrograms and heavily masked to make a non-trivial training task for an auto-encoder.
Meta’s masked auto-encoder for audio was trained on heavily masked data, and was then able to reconstruct audio files with impressive fidelity.
Learn more: https://spectrum.ieee.org/unsupervise...