Normalization layer
What is Batch Normalization?
Instance Normalization?
Conditional Batch Normalization?
Conditional Instance Normalization?
Batch Normalization was first introduced by Sergey Ioffe and Christian Szegedy. It significantly improved image classification performance.
I was interested in Conditional Batch Normalization (CBN), so here is a wrap-up of normalization
layers.
Refer to "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization";
these layers are compared well in that paper.
Batch Normalization (BN)
$$\text{BN}(x)=\gamma (\frac{x-\mu(x)}{\sigma(x)})+\beta$$
$\gamma, \beta \in \mathbb{R}^C$ are affine parameters learned from data;
$\mu(x)$ and $\sigma(x)$ are the mean and standard deviation, computed across the batch and spatial dimensions independently for each feature channel:
$\mu_c(x)=\frac{1}{NHW}\sum\limits_{n=1}^N \sum\limits_{h=1}^H \sum\limits_{w=1}^W x_{nchw}$
$\sigma_c(x)=\sqrt{\frac{1}{NHW}\sum\limits_{n=1}^N \sum\limits_{h=1}^H \sum\limits_{w=1}^W (x_{nchw}-\mu_c(x))^2+\epsilon}$
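As a rough sketch (training-mode statistics only; the running averages and inference path of a real BN layer are omitted), the per-channel statistics look like this in PyTorch:

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W); gamma, beta: (C,) learnable affine parameters
    # mean/std are computed over N, H, W independently for each channel
    mu = x.mean(dim=(0, 2, 3), keepdim=True)  # (1, C, 1, 1)
    sigma = torch.sqrt(x.var(dim=(0, 2, 3), unbiased=False, keepdim=True) + eps)
    x_hat = (x - mu) / sigma
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```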
Instance Normalization (IN)
$$\text{IN}(x)=\gamma (\frac{x-\mu(x)}{\sigma(x)})+\beta$$
$\mu_{nc}(x)=\frac{1}{HW} \sum\limits_{h=1}^H \sum\limits_{w=1}^W x_{nchw}$
$\sigma_{nc}(x)=\sqrt{\frac{1}{HW} \sum\limits_{h=1}^H \sum\limits_{w=1}^W (x_{nchw}-\mu_{nc}(x))^2+\epsilon}$
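The same kind of sketch for IN only changes the reduction axes; the statistics are computed per sample and per channel (again a simplified version, not `torch.nn.InstanceNorm2d` itself):

```python
import torch

def instance_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W); gamma, beta: (C,) learnable affine parameters
    # mean/std are computed over H, W only, separately for each sample and channel
    mu = x.mean(dim=(2, 3), keepdim=True)  # (N, C, 1, 1)
    sigma = torch.sqrt(x.var(dim=(2, 3), unbiased=False, keepdim=True) + eps)
    return gamma.view(1, -1, 1, 1) * (x - mu) / sigma + beta.view(1, -1, 1, 1)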
While BN takes the average over the whole batch for each channel, IN takes the average within each sample and channel, so the channels of one image are not affected by the other images in the batch. Image generation depends much more on channel statistics than image classification does, so IN plays a very important role in image generation. State-of-the-art models such as CycleGAN and StarGAN use IN instead of BN.
In my opinion, BN is good for discriminative tasks and IN for generative tasks.
That is the main difference between BN and IN.
Conditional Batch Normalization (CBN)
First introduced in "Modulating early visual processing by language" (NIPS 2017); CBN comes from Aaron Courville's lab.
CBN predicts per-channel delta values for the affine parameters $\gamma$ and $\beta$ of BN. Thus, the BN affine parameters become conditioned on the output of some other neural network (a question embedding, a query, ...).
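Here's a rough sketch of the idea (not the authors' code; the module and the `predict_dgamma` / `predict_dbeta` layer names are made up for illustration): small linear layers predict per-channel deltas from the conditioning vector, and the deltas are added to the BN affine parameters.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    # Sketch of CBN: BN whose affine parameters are shifted by deltas
    # predicted from a conditioning vector (e.g. a question embedding).
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))
        self.predict_dgamma = nn.Linear(cond_dim, num_channels)  # hypothetical names
        self.predict_dbeta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # x: (N, C, H, W); cond: (N, cond_dim)
        gamma = (self.gamma + self.predict_dgamma(cond)).view(-1, x.size(1), 1, 1)
        beta = (self.beta + self.predict_dbeta(cond)).view(-1, x.size(1), 1, 1)
        return gamma * self.bn(x) + beta
```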
Conditional Instance Normalization (CIN)
$$\text{CIN}(x;s)=\gamma^s \left(\frac{x-\mu(x)}{\sigma(x)}\right)+\beta^s, \quad s \in \{1,2,\dots,S\}$$
Surprisingly, the network can generate images in completely different styles by using the same convolutional parameters but different affine parameters in IN layers.
CIN would be good for conditional image generation (style transfer for a given style: pick the style $s$ and use its affine parameters $\gamma^s$ and $\beta^s$ for image generation).
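A minimal sketch of CIN, assuming a fixed set of S styles selected by index (the embedding tables here stand in for the per-style $\gamma^s$ and $\beta^s$):

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    # Sketch of CIN: one (gamma, beta) pair per style, selected by style index.
    def __init__(self, num_channels, num_styles):
        super().__init__()
        self.inorm = nn.InstanceNorm2d(num_channels, affine=False)
        self.gamma = nn.Embedding(num_styles, num_channels)
        self.beta = nn.Embedding(num_styles, num_channels)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, style_idx):
        # x: (N, C, H, W); style_idx: (N,) LongTensor of style ids in [0, S)
        gamma = self.gamma(style_idx).view(-1, x.size(1), 1, 1)
        beta = self.beta(style_idx).view(-1, x.size(1), 1, 1)
        return gamma * self.inorm(x) + beta
```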
Ulyanov et al. [52] attribute the success of IN
to its invariance to the contrast of the content image. However, IN
takes place in the feature space, therefore it should have more profound impacts than a simple contrast normalization in the pixel space. Perhaps even more surprising is the fact that the affine parameters in IN
can completely change the style of the output image.
Adaptive Instance Normalization (AdaIN)
AdaIN
has no learnable affine parameters. Instead, it adaptively computes the affine parameters from the style input:
$$\text{AdaIN}(x,y)=\sigma(y) \left(\frac{x-\mu(x)}{\sigma(x)}\right)+\mu(y)$$
in which we simply scale the normalized content input with $\sigma(y)$, and shift it with $\mu(y)$.
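A minimal sketch, assuming content and style feature maps with the same number of channels:

```python
import torch

def adain(content, style, eps=1e-5):
    # content, style: (N, C, H, W) feature maps from the encoder
    # Normalize the content features with their own per-sample, per-channel
    # statistics, then rescale and shift with the style statistics.
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_sigma = torch.sqrt(content.var(dim=(2, 3), unbiased=False, keepdim=True) + eps)
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_sigma = torch.sqrt(style.var(dim=(2, 3), unbiased=False, keepdim=True) + eps)
    return s_sigma * (content - c_mu) / c_sigma + s_mu
```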
In my opinion, we already use the term Conditional Normalization rather than Adaptive Normalization.
Applications
FiLM: Visual Reasoning with a General Conditioning Layer
by Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville
AAAI 2018. Code available at this http URL. Extends arXiv:1707.03017.
This work outperforms DeepMind's "Relation Network" on the VQA task.
Conditional Instance Normalization was used in "A Learned Representation for Artistic Style"
by Vincent Dumoulin, Jonathon Shlens & Manjunath Kudlur, Google Brain.
It outputs multiple style-transferred results with a single network.
Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
by Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville
Submitted to ICML 2018, arXiv:1802.10151v1.
Uses CIN for many-to-many mappings.