Variational Autoencoder

from IPython import display

import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf
import tensorflow_probability as tfp
import time

MNIST

We will model each pixel with a Bernoulli distribution, so each pixel takes a value of either 0 or 1.

def preprocess_images(images):
    # Scale pixel values to [0, 1] and add a channel dimension.
    images = images.reshape((images.shape[0], 28, 28, 1)) / 255.
    # Binarize: any pixel above 0.5 becomes 1, everything else becomes 0.
    return np.where(images > .5, 1.0, 0.0).astype('float32')


(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()
train_images = preprocess_images(train_images)
test_images = preprocess_images(test_images)


print(train_images.shape, train_images.dtype)
print(test_images.dtype)

Encoder

The encoder network approximates the posterior distribution $q(z \mid x)$. It takes an observation as input and outputs a set of parameters specifying the conditional distribution of the latent representation $z$. In this example, we model the posterior distribution as a diagonal Gaussian: the network outputs the mean and log-variance parameters of a factorized Gaussian. Outputting the log-variance rather than the variance improves numerical stability.

$$\sigma_{\log} = \log(\sigma^2) = 2\log(\sigma)$$

Thus,

$$\sigma = \exp\left(\frac{\sigma_{\log}}{2}\right)$$
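
As a rough sketch (not necessarily the exact notebook code), the encoder can be a small convolutional network whose final dense layer emits latent_dim means and latent_dim log-variances. Here latent_dim = 2 and the layer sizes are assumptions, chosen to match the 2D latent-space visualization later on.

latent_dim = 2  # assumed; matches the 2D latent-space visualization below

encoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu'),
    tf.keras.layers.Conv2D(64, 3, strides=2, activation='relu'),
    tf.keras.layers.Flatten(),
    # No activation: the outputs are the raw mean and log-variance values.
    tf.keras.layers.Dense(latent_dim + latent_dim),
])

def encode(x):
    # Split the final dense output into mean and log-variance halves.
    mean, logvar = tf.split(encoder(x), num_or_size_splits=2, axis=1)
    return mean, logvar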

Decoder

The decoder network approximates the conditional distribution $p(x \mid z)$. It takes a latent variable as input and outputs the parameters of the conditional distribution of the observation. The latent prior $p(z)$ is modeled as a unit Gaussian.
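
A matching decoder sketch, mirroring the encoder above (the layer sizes are again assumptions), maps a latent vector back to a 28×28 grid of Bernoulli logits.

decoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
    tf.keras.layers.Dense(7 * 7 * 32, activation='relu'),
    tf.keras.layers.Reshape((7, 7, 32)),
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),
    # One output channel of logits per pixel; no activation, because the
    # Bernoulli cross-entropy in the loss works directly on logits.
    tf.keras.layers.Conv2DTranspose(1, 3, strides=1, padding='same'),
])

def decode(z, apply_sigmoid=False):
    logits = decoder(z)
    return tf.sigmoid(logits) if apply_sigmoid else logits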

Reparameterization Trick

We could sample $z$ directly from the Gaussian produced by the encoder, but this creates a bottleneck: backpropagation cannot flow through a random node. We need the source of randomness to be independent of the parameters we wish to backpropagate through.

We therefore sample from a unit Gaussian and define $z$ as follows.

$$z = \mu + \sigma \cdot \epsilon$$

$\mu$ and $\sigma$ are derived from the encoder output, and $\epsilon$ is random noise that maintains the stochasticity of $z$. We draw $\epsilon$ from a standard unit Gaussian.
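
This translates almost directly into code; a minimal sketch, using the encode function assumed above:

def reparameterize(mean, logvar):
    # eps is drawn from a standard Gaussian, independent of mean and logvar,
    # so gradients can flow through mean and logvar.
    eps = tf.random.normal(shape=tf.shape(mean))
    return mean + tf.exp(logvar * 0.5) * eps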

Model
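
Putting the sketches above together, a forward pass through the full model looks roughly like this:

x = test_images[:16]              # a batch of binarized images
mean, logvar = encode(x)          # parameters of q(z | x)
z = reparameterize(mean, logvar)  # sample z with the reparameterization trick
x_logits = decode(z)              # Bernoulli logits for p(x | z)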

Loss

The VAE is trained by maximizing the evidence lower bound (ELBO) on the marginal log-likelihood.

$$\log p(x) \geq \text{ELBO} = \mathbb{E}_{q(z \mid x)}\left[\log \frac{p(x, z)}{q(z \mid x)}\right]$$

In practice, we optimize a single-sample Monte Carlo estimate of this expectation:

$$\log \frac{p(x, z)}{q(z \mid x)} = \log p(x \mid z) + \log p(z) - \log q(z \mid x)$$

where $z$ is sampled from $q(z \mid x)$.
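
A sketch of this single-sample estimate, built on the encode, reparameterize, and decode sketches above (log_normal_pdf is an assumed helper name for a diagonal-Gaussian log-density):

def log_normal_pdf(sample, mean, logvar):
    # Log-density of a diagonal Gaussian, summed over the latent dimensions.
    log2pi = tf.math.log(2. * np.pi)
    return tf.reduce_sum(
        -0.5 * ((sample - mean) ** 2 * tf.exp(-logvar) + logvar + log2pi),
        axis=1)

def compute_loss(x):
    mean, logvar = encode(x)
    z = reparameterize(mean, logvar)
    x_logits = decode(z)

    # log p(x | z): Bernoulli log-likelihood of the binarized pixels.
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=x_logits)
    logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])

    # log p(z): unit-Gaussian prior; log q(z | x): the encoder's posterior.
    logpz = log_normal_pdf(z, 0., 0.)
    logqz_x = log_normal_pdf(z, mean, logvar)

    # Negative ELBO, averaged over the batch (we minimize this).
    return -tf.reduce_mean(logpx_z + logpz - logqz_x)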

Train
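
One gradient step can then be sketched as follows, assuming an Adam optimizer (the learning rate is an assumption):

optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(x):
    with tf.GradientTape() as tape:
        loss = compute_loss(x)
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss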


Visualize Training Progression
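
One common way to visualize progression is to stitch per-epoch sample images into a GIF. The sketch below assumes the training loop saved files matching 'image_at_epoch_*.png' (the filename pattern is an assumption):

anim_file = 'cvae.gif'

# Collect the per-epoch sample images and stitch them into an animated GIF.
# Lexicographic sorting assumes zero-padded epoch numbers in the filenames.
filenames = sorted(glob.glob('image_at_epoch_*.png'))
images = [imageio.imread(f) for f in filenames]
imageio.mimsave(anim_file, images)

display.Image(filename=anim_file)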

Visualize Latent Space

We defined our latent dimension to be 2, which means we can visualize the $z$ vectors on a 2D grid. We choose a region of the latent space, take a unit Gaussian, and create $z$ vectors from its quantiles.

The quantile function returns the value of a random variable $X$ such that the probability of $X$ being less than or equal to that value equals the given probability $p$.

$$\Pr(X \leq \text{value}) = p$$

For example, norm.quantile(0.1) asks: find the value $x$ such that a sample drawn from norm is less than or equal to $x$ with probability 0.1.
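
A sketch of the idea, using the decode sketch above: map evenly spaced probabilities through the unit Gaussian's quantile function and decode each (x, y) pair on the resulting grid (the grid size and plotting details are assumptions).

norm = tfp.distributions.Normal(0., 1.)
n = 20  # assumed grid size

# Evenly spaced probabilities mapped to z-values via the quantile function.
probs = np.linspace(0.05, 0.95, n).astype(np.float32)
grid_x = norm.quantile(probs)
grid_y = norm.quantile(probs)

rows = []
for yi in grid_y:
    row = []
    for xi in grid_x:
        z = np.array([[xi, yi]], dtype='float32')
        # Decode with a sigmoid to get pixel probabilities for display.
        row.append(decode(z, apply_sigmoid=True)[0, :, :, 0])
    rows.append(tf.concat(row, axis=1))
canvas = tf.concat(rows, axis=0)

plt.imshow(canvas, cmap='Greys_r')
plt.axis('off')
plt.show()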

