vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Alexei Baevski; Steffen Schneider; Michael Auli

Keywords: clustering, representation learning, self-supervised learning

Tues Session 4 (17:00-19:00 GMT)

Tues Session 5 (20:00-22:00 GMT)

Abstract: We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community that require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
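To make the quantization step in the abstract concrete, the following is a minimal NumPy sketch of the two options it names: nearest-neighbor (k-means style) codebook lookup and a Gumbel-Softmax selection. It covers only the forward pass that turns dense frames into discrete indices. The function names, array shapes, the temperature `tau`, and the toy codebook are illustrative assumptions, not the authors' implementation, and training details such as gradient estimation and codebook updates are omitted.

```python
import numpy as np


def kmeans_quantize(z, codebook):
    """Assign each dense frame in z (T, d) to its nearest codeword in codebook (V, d).

    Returns the discrete indices (T,) and the selected codewords (T, d).
    """
    # Squared Euclidean distance between every frame and every codeword: (T, V)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


def gumbel_softmax_quantize(logits, codebook, tau=1.0, rng=None):
    """Pick one codeword per frame from logits (T, V) perturbed by Gumbel noise.

    Returns the hard indices (T,), the selected codewords (T, d), and the
    soft Gumbel-Softmax probabilities (T, V).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: g = -log(-log(u)), with u kept away from 0
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    # Numerically stable softmax over the codebook axis
    y = y - y.max(axis=1, keepdims=True)
    probs = np.exp(y) / np.exp(y).sum(axis=1, keepdims=True)
    idx = probs.argmax(axis=1)
    return idx, codebook[idx], probs


if __name__ == "__main__":
    # Toy data (assumed): 4 codewords of dimension 4, and 2 dense frames
    codebook = np.eye(4)
    z = np.array([[0.9, 0.1, 0.0, 0.0],
                  [0.0, 0.0, 0.2, 1.1]])
    idx, z_q = kmeans_quantize(z, codebook)
    print("k-means indices:", idx)

    logits = z @ codebook.T  # hypothetical way to score codewords
    g_idx, g_q, probs = gumbel_softmax_quantize(
        logits, codebook, tau=0.5, rng=np.random.default_rng(0)
    )
    print("Gumbel-Softmax indices:", g_idx)
```

Either function yields a sequence of integer indices, which is the kind of discrete input that token-based NLP models such as BERT expect.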
