Transpose Convolution Operation
Transpose convolution is, loosely speaking, the reverse of convolution, hence the term transpose. I've already briefly talked about the transpose convolution operation in the Convolution Operation section of my GitBook. Here I want to dive deeper into various techniques of upsampling, of which transpose convolution is just one.
Downsampling is what convolution normally does. Given an input tensor and a filter/kernel tensor, e.g. input = $(5, 5, 3)$ and kernel = $(3, 3, 3)$ with stride = $1$, the output is a $(3, 3, 1)$ tensor. Every filter matches the input in channel size, i.e. depth, so the result of a single filter is always a tensor with depth = $1$. We can compute the height and width of the output tensor with a simple formula:

$$o = \frac{i - k + 2p}{s} + 1$$

where $i$ is the input size, $k$ the kernel size, $p$ the padding, and $s$ the stride. Therefore, if we use our example and substitute in the values, $o = \frac{5 - 3 + 2 \cdot 0}{1} + 1 = 3$, giving a $(3, 3, 1)$ output per filter.
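To make the arithmetic concrete, here is a minimal NumPy sketch (my illustration, not part of the original text) of a single-filter valid convolution that reproduces the $(5, 5, 3) \to (3, 3, 1)$ example; the random values are just placeholders.

```python
import numpy as np

def conv2d_single_filter(x: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Valid (zero-padding) convolution of one filter over an (H, W, C) input.
    The filter depth must match the input depth, so the result has depth 1."""
    h, w, _ = x.shape
    kh, kw, _ = kernel.shape
    oh = (h - kh) // stride + 1  # o = (i - k + 2p) / s + 1 with p = 0
    ow = (w - kw) // stride + 1
    out = np.zeros((oh, ow, 1))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw, :]
            out[i, j, 0] = np.sum(patch * kernel)  # elementwise product, then sum
    return out

x = np.random.rand(5, 5, 3)
kernel = np.random.rand(3, 3, 3)
print(conv2d_single_filter(x, kernel).shape)  # (3, 3, 1)
```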
The first technique is nearest neighbors upsampling: we take every element of an input tensor and duplicate it by a factor of $K$. For example, with $K = 4$, every value is copied into a $2 \times 2$ block, as in the sketch below.
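A minimal NumPy sketch of nearest-neighbor upsampling (my illustration; here I read $K = 4$ as four total copies per element, i.e. a per-axis scale of 2):

```python
import numpy as np

def nearest_neighbor_upsample(x: np.ndarray, scale: int) -> np.ndarray:
    """Duplicate every element into a scale x scale block, so each
    input value appears K = scale**2 times in the output."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)

x = np.array([[1, 2],
              [3, 4]])
print(nearest_neighbor_upsample(x, 2))  # K = 4: each value becomes a 2x2 block
```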
The second technique is bilinear interpolation: we take the elements of the input tensor and set them to be the corners of the output, then we interpolate every missing element by a weighted average of its nearest known neighbors.
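Here is a minimal NumPy sketch of that idea (my illustration, using separable axis-by-axis linear interpolation; the example values are made up):

```python
import numpy as np

def bilinear_upsample(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Pin the input values to the corners of the output grid and
    linearly interpolate the missing elements along each axis."""
    in_h, in_w = x.shape
    rows = np.linspace(0, in_h - 1, out_h)  # output row positions in input coordinates
    cols = np.linspace(0, in_w - 1, out_w)  # output col positions in input coordinates
    # Interpolate along the width for every input row ...
    tmp = np.stack([np.interp(cols, np.arange(in_w), x[r]) for r in range(in_h)])
    # ... then along the height for every output column.
    return np.stack(
        [np.interp(rows, np.arange(in_h), tmp[:, c]) for c in range(out_w)], axis=1
    )

x = np.array([[10.0, 20.0],
              [30.0, 40.0]])
print(bilinear_upsample(x, 4, 4))  # the four input values sit at the corners
```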
The third technique is bed of nails: we copy every element of the input tensor to the output tensor and set everything else to zero. Each input value is placed at the top-left corner of its expanded cell.
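A minimal NumPy sketch of bed-of-nails upsampling (my illustration, assuming a $k \times k$ expansion per cell):

```python
import numpy as np

def bed_of_nails_upsample(x: np.ndarray, k: int) -> np.ndarray:
    """Place each input value at the top-left corner of a k x k cell
    and fill the rest of the output with zeros."""
    h, w = x.shape
    out = np.zeros((h * k, w * k), dtype=x.dtype)
    out[::k, ::k] = x  # strided assignment hits exactly the top-left corners
    return out

x = np.array([[1, 2],
              [3, 4]])
print(bed_of_nails_upsample(x, 2))
```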
The fourth technique is max unpooling. Max pooling takes the maximum among all values in a kernel window. Max unpooling performs the opposite, but it requires information from the preceding max pooling layer to know the original position of each max element.

First, max pooling keeps track of the original positions of the max elements. Some layers later, we perform unpooling using that positional information, placing each value back where it came from and filling the rest with zeros.
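A minimal NumPy sketch of the pool/unpool pair (my illustration; the function names and example values are made up):

```python
import numpy as np

def max_pool_with_indices(x: np.ndarray, k: int):
    """k x k max pooling that also records the flat index of every max
    element so a later unpooling layer can restore its position."""
    h, w = x.shape
    ph, pw = h // k, w // k
    pooled = np.zeros((ph, pw), dtype=x.dtype)
    indices = np.zeros((ph, pw), dtype=np.int64)
    for i in range(ph):
        for j in range(pw):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            local = int(np.argmax(window))
            pooled[i, j] = window.flat[local]
            di, dj = divmod(local, k)                        # window-local row/col of the max
            indices[i, j] = (i * k + di) * w + (j * k + dj)  # flat index in x
    return pooled, indices

def max_unpool(pooled: np.ndarray, indices: np.ndarray, out_shape) -> np.ndarray:
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]])
pooled, idx = max_pool_with_indices(x, 2)
print(pooled)                          # [[ 4  8] [12 16]]
print(max_unpool(pooled, idx, x.shape))  # maxes restored in place, zeros elsewhere
```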
Suppose we have a small input, say a $(2, 2)$ tensor. We have a kernel that is trainable; backpropagation computes the derivatives of the kernel with respect to the loss. For now, let's assume the kernel is a $(2, 2)$ tensor initialized to some small integers for the ease of demonstration. The expected output then has a $(3, 3)$ shape.
Now we take an element of the input and multiply it by every element of the kernel to produce a partially filled output; we do this for every element of the input. Then we sum all of the partial outputs to produce the final output of the transpose convolution operation, as in the sketch below.
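A minimal NumPy sketch of exactly this scatter-and-sum view of transpose convolution (my illustration; the example values are made up):

```python
import numpy as np

def transpose_conv2d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Transpose convolution with unit stride and no padding: every input
    element scales the whole kernel, the scaled copy is written into the
    output at that element's offset, and overlapping contributions sum."""
    ih, iw = x.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1), dtype=x.dtype)  # o' = i' + (k - 1)
    for i in range(ih):
        for j in range(iw):
            out[i:i + kh, j:j + kw] += x[i, j] * kernel
    return out

x = np.array([[1, 2],
              [3, 4]])
kernel = np.array([[1, 2],
                   [3, 4]])
print(transpose_conv2d(x, kernel))  # (3, 3) output
```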
Assuming 0 padding and unit stride size, the transpose convolution uses the same kernel size $k' = k$ and stride $s' = s = 1$ as the convolution it reverses, and its output size is $o' = i' + (k - 1)$. Substituting our example, $o' = 2 + (2 - 1) = 3$, which matches the expected $(3, 3)$ output.