
Transpose Convolution Operation

This is, loosely speaking, the reverse of a convolution, hence the term transpose. I've already briefly talked about the transpose convolution operation in the Convolution Operation section of my GitBook. Here I want to dive deeper into the various upsampling techniques, of which transpose convolution is just one.

Downsample via Convolution

Downsampling is what convolution normally does. Given an input tensor and a filter/kernel tensor, e.g. input=(5, 5, 3) and kernel=(3, 3, 3) with stride=1, the output is a (3, 3, 1) tensor. Every filter matches the input in channel size, i.e. depth, so convolving with a single filter always produces an output with depth=1. We can compute the height and width of the output tensor with a simple formula.

$$W\text{: width} \quad H\text{: height} \quad P\text{: padding} \quad S\text{: stride}$$

$$W_{\text{output}} = 1 + \frac{W_{\text{input}} - W_{\text{kernel}} + 2P}{S}$$

$$H_{\text{output}} = 1 + \frac{H_{\text{input}} - H_{\text{kernel}} + 2P}{S}$$

Therefore, substituting in the values from our example:

$$W_{\text{output}} = H_{\text{output}} = 1 + \frac{5 - 3 + 2 \cdot 0}{1} = 3$$
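
Here is a minimal PyTorch sketch (assuming NCHW layout and a single filter) that confirms these shapes:

```python
import torch
import torch.nn as nn

# One (5, 5, 3) input in NCHW layout and a single (3, 3, 3) filter.
x = torch.randn(1, 3, 5, 5)
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, padding=0)

print(conv(x).shape)  # torch.Size([1, 1, 3, 3]) -> the (3, 3, 1) output from the formula
```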

Upsample Techniques

K-Nearest Neighbors

We take every element of the input tensor and duplicate it K times. For example, with K=4, each element expands into a 2x2 block:

$$\text{input} = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \mapsto \text{output} = \begin{bmatrix} 1 & 1 & 2 & 2 \\ 1 & 1 & 2 & 2 \\ 3 & 3 & 4 & 4 \\ 3 & 3 & 4 & 4 \end{bmatrix}$$
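
A quick sketch of the same thing using PyTorch's nearest-neighbor mode of F.interpolate (one of several ways to implement it):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]]).reshape(1, 1, 2, 2)  # (N, C, H, W)

# Each element is copied into a 2x2 block, i.e. duplicated K=4 times.
up = F.interpolate(x, scale_factor=2, mode="nearest")
print(up.reshape(4, 4))
# tensor([[1., 1., 2., 2.],
#         [1., 1., 2., 2.],
#         [3., 3., 4., 4.],
#         [3., 3., 4., 4.]])
```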

Bi-Linear Interpolation

We take every element of the input tensor and set them to be the corners of the output. Then we interpolate every missing element as a weighted average of its nearest known values.

$$\text{input} = \begin{bmatrix} 10 & 20\\ 30 & 40 \end{bmatrix} \mapsto \text{output} = \begin{bmatrix} 10 & 12 & 17 & 20 \\ 15 & 17 & 22 & 25 \\ 25 & 27 & 32 & 35 \\ 30 & 32 & 37 & 40 \end{bmatrix}$$
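
A comparable sketch with PyTorch's bilinear mode; with align_corners=False the result matches the hand-worked matrix above once the .5 fractions are dropped:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[10., 20.], [30., 40.]]).reshape(1, 1, 2, 2)

up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(up.reshape(4, 4))
# tensor([[10.0, 12.5, 17.5, 20.0],
#         [15.0, 17.5, 22.5, 25.0],
#         [25.0, 27.5, 32.5, 35.0],
#         [30.0, 32.5, 37.5, 40.0]])
```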

Bed of Nails

We copy every element of the input tensor into the output tensor and set everything else to zero. Each input value is placed at the top-left corner of its expanded cell.

$$\text{input} = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \mapsto \text{output} = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 3 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
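
There is no dedicated bed-of-nails layer in PyTorch, but a strided assignment into a zero tensor sketches the idea:

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])

out = torch.zeros(4, 4)
out[::2, ::2] = x  # each input value lands in the top-left corner of its 2x2 cell
print(out)
# tensor([[1., 0., 2., 0.],
#         [0., 0., 0., 0.],
#         [3., 0., 4., 0.],
#         [0., 0., 0., 0.]])
```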

Max-Unpooling

Max pooling takes the maximum among all values inside a kernel window. Max unpooling performs the opposite, but it requires information from the corresponding max pooling layer to know the original index location of each max element.

First, max pooling performs the following.

$$\begin{bmatrix} 1 & 2 & 4 & 1 \\ 3 & 4 & 2 & 3 \\ 1 & 4 & 1 & 2 \\ 2 & 3 & 4 & 3 \end{bmatrix} \mapsto_{\text{max}} \begin{bmatrix} & & 4 & \\ & 4 & & \\ & 4 & & \\ & & 4 & \end{bmatrix} \mapsto_{\text{pool}} \begin{bmatrix} 4 & 4\\ 4 & 4 \end{bmatrix}$$

We keep track of the original positions of the max elements. Some layers later, we perform unpooling using that positional information and fill the rest with zeros.

$$\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \mapsto_{\text{max-unpool}} \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 4 & 0 \end{bmatrix}$$
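
PyTorch exposes this pairing directly: MaxPool2d can return the indices of the max elements and MaxUnpool2d consumes them later. The sketch below reproduces both matrices from this example, assuming the later (2, 2) feature map happens to be [[1, 2], [3, 4]]:

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 4., 1.],
                  [3., 4., 2., 3.],
                  [1., 4., 1., 2.],
                  [2., 3., 4., 3.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2)

pooled, indices = pool(x)  # pooled == [[4, 4], [4, 4]]; indices remember where each max lived

# ... after some intermediate layers, unpool a (2, 2) feature map with the saved indices
features = torch.tensor([[1., 2.], [3., 4.]]).reshape(1, 1, 2, 2)
print(unpool(features, indices).reshape(4, 4))
# tensor([[0., 0., 2., 0.],
#         [0., 1., 0., 0.],
#         [0., 3., 0., 0.],
#         [0., 0., 4., 0.]])
```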

Upsample via Transpose Convolution

Suppose we have an input

$$\text{input} = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}$$

We have a kernel that is trainable; backpropagation computes the derivatives of the kernel with respect to the loss. For now, let's assume every kernel element is initialized to 5 for ease of demonstration.

$$\text{kernel} = \begin{bmatrix} 5 & 5\\ 5 & 5 \end{bmatrix}$$

Assuming zero padding and unit stride, we have $W = H = 2$ for both the input and the kernel, $P = 0$, and $S = 1$.

$$W_{\text{output}} = H_{\text{output}} = (W_{\text{input}} - 1) \cdot S - 2P + W_{\text{kernel}} = (2 - 1) \cdot 1 - 2 \cdot 0 + 2 = 3$$

The expected output has shape (3, 3).

$$\text{output} = \begin{bmatrix} ? & ? & ?\\ ? & ? & ?\\ ? & ? & ? \end{bmatrix}$$

Now we take each element of the input and multiply it by every element of the kernel to produce a partially filled output; the position of the input element determines where the scaled kernel lands in the output. We do this for every element of the input.

$$\begin{bmatrix} 1 & \\ & \end{bmatrix} \begin{bmatrix} 5 & 5\\ 5 & 5 \end{bmatrix} = \begin{bmatrix} 5 & 5 & \\ 5 & 5 & \\ & & \end{bmatrix}$$

$$\begin{bmatrix} & 2 \\ & \end{bmatrix} \begin{bmatrix} 5 & 5\\ 5 & 5 \end{bmatrix} = \begin{bmatrix} & 10 & 10\\ & 10 & 10\\ & & \end{bmatrix}$$

$$\begin{bmatrix} & \\ 3 & \end{bmatrix} \begin{bmatrix} 5 & 5\\ 5 & 5 \end{bmatrix} = \begin{bmatrix} & & \\ 15 & 15 & \\ 15 & 15 & \end{bmatrix}$$

$$\begin{bmatrix} & \\ & 4 \end{bmatrix} \begin{bmatrix} 5 & 5\\ 5 & 5 \end{bmatrix} = \begin{bmatrix} & & \\ & 20 & 20\\ & 20 & 20 \end{bmatrix}$$

Then we sum all of them to produce the final output of a transpose convolution operation.

$$\begin{bmatrix} 5 & 5 & \\ 5 & 5 & \\ & & \end{bmatrix} + \begin{bmatrix} & 10 & 10\\ & 10 & 10\\ & & \end{bmatrix} + \begin{bmatrix} & & \\ 15 & 15 & \\ 15 & 15 & \end{bmatrix} + \begin{bmatrix} & & \\ & 20 & 20\\ & 20 & 20 \end{bmatrix} = \begin{bmatrix} 5 & 15 & 10\\ 20 & 50 & 30\\ 15 & 35 & 20 \end{bmatrix}$$
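
The same result comes out of PyTorch's ConvTranspose2d once we force its kernel to all fives (purely for demonstration, since a real layer would learn these weights):

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2.], [3., 4.]]).reshape(1, 1, 2, 2)

tconv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                           kernel_size=2, stride=1, padding=0, bias=False)
with torch.no_grad():
    tconv.weight.fill_(5.)  # all-fives kernel, matching the worked example

print(tconv(x).detach().reshape(3, 3))
# tensor([[ 5., 15., 10.],
#         [20., 50., 30.],
#         [15., 35., 20.]])
```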
Figure: Transpose Convolution
Figure: Convolution 3D