BicycleGAN : Image Translation with GAN (5)
Limitations of pix2pix, DTN, DiscoGAN & CycleGAN?
- They produce a single answer for each input.
- They are deterministic models.
- They translate an image one-to-one.
- Paired set, One-to-One : pix2pix (CVPR2017)
- Unpaired set, One-to-One : DTN (ICLR2017), DiscoGAN (ICML2017), CycleGAN (ICCV2017)
- Paired set, One-to-Many : ???
BicycleGAN:
Toward Multimodal Image-to-Image Translation (NIPS2017)
BicycleGAN github
Easy approach:
- Feed stochastically sampled noise $z \sim N(0, I)$ to the deterministic generator as an extra input
- Hope the noise acts as a latent code that produces diverse results
However, this causes mode collapse
- Multiple noise samples are mapped to similar outputs
- The generator does not care about the noise
- The generator learns to ignore the random noise when conditioned on relevant context (the input image)
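One common way to feed the noise to the generator is to tile $z$ over the spatial grid and append it to the channels of the input image. The snippet below is a minimal, dependency-free sketch of that step only. The names `concat_noise_to_image`, `image_a` and `generator_input` are hypothetical, and nested lists stand in for image tensors.

```python
import random

def concat_noise_to_image(image, z):
    """Append the latent vector z to the channels of every pixel.

    image: nested list of shape H x W x C; z: list of length Z.
    Returns a new nested list of shape H x W x (C + Z).
    """
    return [[pixel + z for pixel in row] for row in image]

random.seed(0)
H, W, C, Z = 4, 4, 3, 8
# Toy input image A with random pixel values (assumption: values in [0, 1)).
image_a = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
# Noise z ~ N(0, I), one independent Gaussian sample per latent dimension.
z = [random.gauss(0.0, 1.0) for _ in range(Z)]

generator_input = concat_noise_to_image(image_a, z)
print(len(generator_input), len(generator_input[0]), len(generator_input[0][0]))  # 4 4 11
```

Nothing in this construction forces the generator to use the extra channels, which is why the noise can simply be ignored.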
Encoder
- Encourage a bijection between the output and the latent space
- Discourage two different latent codes from generating the same output
- Avoid mode collapse
Conditional Variational Autoencoder-GAN
- The encoder predicts a Gaussian distribution over the latent code
- The encoder is trained with real data ($B$)
- The generator takes a latent code carrying rich information about $B$, together with the input $A$.
- At test time, an image generated from a random latent code may be unrealistic.
- The generator never sees random noise during training.
- The discriminator never sees samples generated from random noise
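The "encoder predicts Gaussian" step can be sketched as follows: the encoder outputs a mean and a log-variance, a code is sampled with the reparameterization trick, and a KL term pulls the predicted distribution toward $N(0, I)$. This is a minimal sketch, not the authors' implementation. The function names and the example `mu` and `log_var` values are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

rng = random.Random(0)
# Hypothetical encoder outputs for a single real image B (2-dimensional latent code).
mu, log_var = [0.5, -1.0], [0.0, -2.0]

z = reparameterize(mu, log_var, rng)      # code that would be passed to the generator with A
kl = kl_to_standard_normal(mu, log_var)   # regularizer pulling the codes toward N(0, I)
print(z, kl)
```

The KL term is what makes sampling $z \sim N(0, I)$ at test time plausible, but the generator is still only trained on encoded codes.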
Conditional Latent Regressor GAN
- The encoder is a latent code regressor.
- The generated sample is encoded and mapped back to the random noise.
- The latent code is sampled randomly and easily, just as at test time.
- The generator never sees the ground truth $B$.
- More vulnerable to mode collapse; probably the small dimension of $z$ and the $L_1$ loss between $z$ and $\hat{z}$ are not enough to prevent the generator from easily fooling the discriminator?
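The latent regression idea can be illustrated with a toy example: sample $z$, generate an output, encode it back to $\hat{z}$, and penalize the $L_1$ distance between the two. The generators and encoder below are hypothetical stand-ins that operate on flat lists, not real networks. In particular, the toy encoder is built from the known input $A$, a simplification made only for illustration.

```python
import random

def l1_latent_loss(z, z_hat):
    """Mean absolute error between the sampled code z and the recovered code z_hat."""
    return sum(abs(a - b) for a, b in zip(z, z_hat)) / len(z)

# Hypothetical toy stand-ins for the generator.
def generator_uses_z(a, z):
    return [x + zi for x, zi in zip(a, z)]   # output depends on z

def generator_ignores_z(a, z):
    return list(a)                           # collapsed: z has no effect

def make_encoder(a):
    # Toy encoder that recovers the code by subtracting the known input A.
    return lambda b: [bi - ai for bi, ai in zip(b, a)]

rng = random.Random(0)
a = [0.2, 0.4, 0.6]                          # toy input A
z = [rng.gauss(0.0, 1.0) for _ in a]         # random latent code, sampled as at test time
encode = make_encoder(a)

loss_diverse = l1_latent_loss(z, encode(generator_uses_z(a, z)))
loss_collapsed = l1_latent_loss(z, encode(generator_ignores_z(a, z)))
print(loss_diverse, loss_collapsed)
```

A generator that ignores $z$ cannot drive this loss to zero, which is the pressure against mode collapse. As the last bullet above suggests, that pressure may still be too weak in practice.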
BicycleGAN
Train both models together, with the benefit of cycle losses in both $z$ and $B$.
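As I recall it from the paper, the combined objective sums the cVAE-GAN terms (cycle $B \to z \to \hat{B}$) and the cLR-GAN terms (cycle $z \to \hat{B} \to \hat{z}$). The weights $\lambda$, $\lambda_{latent}$ and $\lambda_{KL}$ are hyperparameters whose names are taken from memory of the paper, not from the notes above.

$$
G^*, E^* = \arg\min_{G,E}\max_{D}\; \mathcal{L}_{GAN}^{VAE}(G, D, E) + \lambda\,\mathcal{L}_{1}^{VAE}(G, E) + \mathcal{L}_{GAN}(G, D) + \lambda_{latent}\,\mathcal{L}_{1}^{latent}(G, E) + \lambda_{KL}\,\mathcal{L}_{KL}(E)
$$

The first two terms and the KL term come from cVAE-GAN (realism and $L_1$ reconstruction of $B$), while $\mathcal{L}_{GAN}(G, D)$ and $\mathcal{L}_{1}^{latent}$ come from cLR-GAN (realism under random $z$ and $L_1$ recovery of $z$).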
Result
- Pix2pix + noise : realistic outputs, but all similar to each other
- cVAE-GAN : adds variation, but artifacts appear when random codes are sampled at test time
- cLR-GAN : less variation in the output, and sometimes mode collapse
- BicycleGAN : hybrid; results are both diverse and realistic
Quantitative experiment
Conclusion
- Proposes a solution to mode collapse in the conditional generative setting
- Combines multiple objectives to encourage a bijective mapping between the latent and output spaces
- Produces results that are both realistic and diverse
- The latent code could be replaced with a user-controllable parameter in the future
Trends
- Paired set, One-to-One : pix2pix (CVPR2017)
- Unpaired set, One-to-One : DTN (ICLR2017), DiscoGAN (ICML2017), CycleGAN (ICCV2017)
- Paired set, One-to-Many : BicycleGAN (NIPS2017)
- In the future:
- Unpaired set, One-to-Many : Augmented CycleGAN (probably submitted to ICML2018), XGAN (ICLR2018 rejected)
- Multiple domains, not only from one source domain to one target domain : StarGAN (CVPR2018 accepted)
- User-controllable noise vector in BicycleGAN