Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling

Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou

Keywords: adversarial, gan, generation, generative models, image generation, text generation, zero shot learning

Tues Session 1 (05:00-07:00 GMT)

Tues Session 3 (12:00-14:00 GMT)

Abstract: For bidirectional joint image-text modeling, we develop the variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, a probabilistic image encoder, and a GAN into a coherent end-to-end multi-modality learning framework. VHE randomized GAN (VHE-GAN) encodes an image to decode its associated text, and feeds the variational posterior as the source of randomness into the GAN image generator. We plug three off-the-shelf modules, namely a deep topic model, a ladder-structured image encoder, and StackGAN++, into VHE-GAN; the resulting model already achieves competitive performance. This further motivates the development of VHE-raster-scan-GAN, which generates photo-realistic images not only in a multi-scale, low-to-high-resolution manner, but also in a hierarchical-semantic, coarse-to-fine fashion. By capturing and relating hierarchical semantic and visual concepts with end-to-end training, VHE-raster-scan-GAN achieves state-of-the-art performance on a wide variety of image-text multi-modality learning and generation tasks.
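The data flow the abstract describes (an image encoder produces a variational posterior, a sample from it is used both to decode the paired text and as the randomness fed to the image generator) can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes a Weibull variational posterior sampled by reparameterization, a single-layer Poisson topic decoder standing in for the deep topic model, and a linear-plus-tanh map standing in for StackGAN++. All sizes and names (encode, sample_theta, text_log_likelihood, generate_image, W_k, W_lam, Phi, G) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: image feature dim, number of topics K, vocabulary V, output pixels.
D_IMG, K, V, D_OUT = 64, 10, 50, 32 * 32 * 3

# Hypothetical parameters (random here; they would be learned end-to-end in practice).
W_k = rng.normal(scale=0.1, size=(K, D_IMG))    # encoder weights for Weibull shape
W_lam = rng.normal(scale=0.1, size=(K, D_IMG))  # encoder weights for Weibull scale
Phi = rng.dirichlet(np.ones(V), size=K).T       # topic-word matrix (V x K), columns sum to 1
G = rng.normal(scale=0.1, size=(D_OUT, K))      # toy linear "generator" (stand-in for StackGAN++)


def softplus(a):
    return np.log1p(np.exp(a))


def encode(x_img):
    """Toy image encoder q(theta | image): positive Weibull shape k and scale lam."""
    k = softplus(W_k @ x_img) + 0.1
    lam = softplus(W_lam @ x_img) + 1e-3
    return k, lam


def sample_theta(k, lam):
    """Reparameterized Weibull draw via the inverse CDF: lam * (-log(1 - u))**(1/k)."""
    u = rng.uniform(size=k.shape)
    return lam * (-np.log1p(-u)) ** (1.0 / k)


def text_log_likelihood(counts, theta):
    """Assumed Poisson topic decoder: counts ~ Poisson(Phi @ theta); log-pmf up to a constant."""
    rate = Phi @ theta
    return float(np.sum(counts * np.log(rate) - rate))


def generate_image(theta):
    """Toy generator: the posterior sample theta, not Gaussian noise, drives generation."""
    return np.tanh(G @ theta)


x_img = rng.normal(size=D_IMG)       # stand-in for image features
counts = rng.poisson(1.0, size=V)    # stand-in bag-of-words for the paired text
k, lam = encode(x_img)
theta = sample_theta(k, lam)
fake_image = generate_image(theta)
print(theta.shape, fake_image.shape)
print(text_log_likelihood(counts, theta))
```

In an actual training loop, the text log-likelihood (with a KL term for the posterior) and the GAN losses would be optimized jointly through theta, which is what ties the text and image branches together; in this sketch nothing is learned.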

Similar Papers

Real or Not Real, that is the Question

Yuanbo Xiangli, Yubin Deng, Bo Dai, Chen Change Loy, Dahua Lin

Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators

Daniel Stoller, Sebastian Ewert, Simon Dixon

RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis

Atsuhiro Noguchi, Tatsuya Harada

High Fidelity Speech Synthesis with Adversarial Networks

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan