Decoding As Dynamic Programming For Recurrent Autoregressive Models

Najam Zaidi; Trevor Cohn; Gholamreza Haffari

Keywords: autoregressive models, generation

Tues Session 1 (05:00-07:00 GMT)

Tues Session 2 (08:00-10:00 GMT)

Abstract: Decoding in autoregressive models (ARMs) consists of searching for a high-scoring output sequence under the trained model. Standard decoding methods, based on the unidirectional greedy algorithm or beam search, are suboptimal due to error propagation and myopic decisions that do not account for future steps in the generation process. In this paper, we present a novel decoding approach based on the method of auxiliary coordinates (Carreira-Perpinan & Wang, 2014) to address these shortcomings. Our method introduces discrete variables for output tokens and auxiliary continuous variables representing the states of the underlying ARM. The auxiliary variables lead to a factor graph approximation of the ARM, whose maximum a posteriori (MAP) inference is found exactly using dynamic programming. The MAP inference is then used to recreate an improved factor graph approximation of the ARM via updated auxiliary variables. We then extend our approach to decode in an ensemble of ARMs, possibly with different generation orders, which is out of reach for the standard unidirectional decoding algorithms. Experiments on the text infilling task over the SWAG and Daily Dialogue datasets show that our decoding method is superior to strong unidirectional decoding baselines.
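The alternation described in the abstract (fix auxiliary states, solve a chain factor graph exactly by dynamic programming, refresh the states from the result) can be illustrated with a minimal sketch. Several assumptions here are not taken from the paper: a randomly initialised Elman RNN (the hypothetical `ToyRNN`) stands in for a trained ARM; each auxiliary variable is simply the RNN state recomputed from the current output, which turns every next-token term into a factor over two adjacent tokens; and text infilling is modelled by restricting observed positions to a single token. All helper names (`states_of`, `approx_score`, `viterbi`, `dp_decode`, and so on) are hypothetical, and the paper's actual objective and update rule may differ.

```python
import itertools
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def log_softmax(z):
    mx = max(z)
    lse = mx + math.log(sum(math.exp(x - mx) for x in z))
    return [x - lse for x in z]

class ToyRNN:
    """Hypothetical Elman RNN with random weights, standing in for a trained ARM."""

    def __init__(self, vocab, hidden, seed=0):
        rng = random.Random(seed)

        def gauss(rows, cols):
            return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.V, self.H = vocab, hidden
        self.W, self.E, self.U = gauss(hidden, hidden), gauss(vocab, hidden), gauss(vocab, hidden)

    def step(self, s, y):  # s_t = tanh(W s_{t-1} + E[y_t])
        return [math.tanh(a + b) for a, b in zip(matvec(self.W, s), self.E[y])]

    def next_logp(self, s):  # log p(y_{t+1} | y_{<=t}) = log_softmax(U s_t)
        return log_softmax(matvec(self.U, s))

def states_of(m, bos, ys):
    """Auxiliary states: aux[k] = RNN state after reading BOS and ys[:k]."""
    s = m.step([0.0] * m.H, bos)
    aux = [s]
    for y in ys[:-1]:
        s = m.step(s, y)
        aux.append(s)
    return aux

def true_score(m, bos, ys):
    """Exact ARM log-probability of ys: every term sees the full prefix."""
    s, total = m.step([0.0] * m.H, bos), 0.0
    for y in ys:
        total += m.next_logp(s)[y]
        s = m.step(s, y)
    return total

def approx_score(m, aux, ys):
    """Chain approximation: term i depends only on (ys[i-1], ys[i]) and the fixed aux[i-1]."""
    total = m.next_logp(aux[0])[ys[0]]
    for i in range(1, len(ys)):
        total += m.next_logp(m.step(aux[i - 1], ys[i - 1]))[ys[i]]
    return total

def viterbi(m, aux, allowed):
    """Exact MAP of approx_score by dynamic programming; allowed[i] restricts position i."""
    lp0 = m.next_logp(aux[0])
    delta = [lp0[b] if b in allowed[0] else -math.inf for b in range(m.V)]
    back = []
    for i in range(1, len(allowed)):
        # psi[a][b]: pairwise factor between token a at i-1 and token b at i
        psi = [m.next_logp(m.step(aux[i - 1], a)) for a in range(m.V)]
        new, arg = [], []
        for b in range(m.V):
            a_star = max(range(m.V), key=lambda a: delta[a] + psi[a][b])
            arg.append(a_star)
            new.append(delta[a_star] + psi[a_star][b] if b in allowed[i] else -math.inf)
        delta = new
        back.append(arg)
    ys = [max(range(m.V), key=lambda b: delta[b])]
    for arg in reversed(back):  # backtrack the best path
        ys.append(arg[ys[-1]])
    return ys[::-1], max(delta)

def greedy(m, bos, allowed):
    """Left-to-right greedy baseline that respects the allowed sets."""
    s, ys = m.step([0.0] * m.H, bos), []
    for opts in allowed:
        lp = m.next_logp(s)
        ys.append(max(sorted(opts), key=lambda b: lp[b]))
        s = m.step(s, ys[-1])
    return ys

def dp_decode(m, bos, T, observed, iters=5):
    """Alternate: refresh auxiliary states from the current output, then re-solve by DP."""
    allowed = [{observed[i]} if i in observed else set(range(m.V)) for i in range(T)]
    ys = greedy(m, bos, allowed)  # initial auxiliary states come from greedy output
    best, best_score = ys, true_score(m, bos, ys)
    for _ in range(iters):
        new = viterbi(m, states_of(m, bos, ys), allowed)[0]
        if new == ys:  # fixed point: the states already match the output
            break
        ys = new
        score = true_score(m, bos, ys)
        if score > best_score:  # keep the best sequence under the exact ARM score
            best, best_score = ys, score
    return best, best_score

def brute_force_map(m, aux, allowed):
    """Enumeration check for tiny vocabularies; should equal viterbi's score."""
    return max(approx_score(m, aux, list(c)) for c in itertools.product(*map(sorted, allowed)))
```

For instance, `dp_decode(ToyRNN(4, 3), bos=0, T=4, observed={1: 2, 3: 1})` fills positions 0 and 2 around two fixed tokens. Tracking the best sequence under `true_score` is a choice made for this sketch, since the alternation is not guaranteed to raise the exact score at every round. In the same spirit, scores from a second model (say, one generating right to left) could in principle be added to `psi` once its states are fixed the same way, which is the intuition for decoding an ensemble with a single chain DP.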