Dilated Convolutions for Image Classification with ResNet

Shivaraj karki
Sep 14, 2020

Note: This is not Dilated ResNet (DRN), but it is faster and more accurate than both ResNet and DRN.

Now we are going to discuss a variant of ResNet that uses dilation (this is not Dilated ResNet). The effort here is to retain the same receptive field as ResNet while expanding feature resolution, and to reduce the number of network parameters by 94% (fewer parameters mean fewer FLOPs and faster training and testing).

http://www.mva-org.jp/Proceedings/2017USB/papers/15-11.pdf

Here I’m discussing this not-so-popular paper with respect to classification.

Traditional neural networks apply pooling or convolution with a stride of 2 or more to decrease the feature map resolution and expand the receptive field. A dilated convolution supports exponential expansion of the receptive field without loss of feature map resolution, since it applies convolution with a dilation factor instead of convolving after the feature map resolution has been reduced.
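As a minimal sketch of this difference (assuming PyTorch, which the paper does not necessarily use), compare the output resolution of a strided 3 × 3 convolution with that of a dilated one:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # batch, channels, height, width

# Strided convolution: receptive field grows, but resolution drops to 28x28.
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)

# Dilated convolution: the same 3x3 kernel samples a 5x5 area (dilation=2),
# and padding=dilation keeps the output at the full 56x56 resolution.
dilated = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=2, dilation=2)

print(strided(x).shape)  # torch.Size([1, 64, 28, 28])
print(dilated(x).shape)  # torch.Size([1, 64, 56, 56])
```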

Differences in the architecture of ResNet (left) and ResNet with dilated convolutions (right). Both networks have 50 layers.

The figure above shows the architecture of ResNet and of ResNet with dilated convolutions in detail.

In the case of ResNet, down-sampling is performed by the first 1 × 1 convolution layer with stride 2 in each of the layer blocks conv3_x, conv4_x, and conv5_x.

In ResNet with dilated convolutions, instead of down-sampling by convolutions with stride 2 to expand the receptive field, the stride is set to 1 and the 3 × 3 convolutions are replaced with 3 × 3 dilated convolutions. The dilation factor is set to 2^(N−1) in a convN_x layer block, as sketched below.
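Here is a minimal sketch of a bottleneck block under this scheme. The `dilated_bottleneck` helper and its channel sizes are illustrative, not the paper’s exact configuration; only the stride-1 / dilated-3 × 3 pattern and the 2^(N−1) dilation rule come from the description above, and the residual shortcut is omitted for brevity.

```python
import torch.nn as nn

def dilated_bottleneck(in_ch, mid_ch, out_ch, dilation):
    """Illustrative bottleneck: every stride is 1, and the 3x3 conv is
    dilated, so the feature map keeps its resolution while the receptive
    field grows as it would with the stride-2 original."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1, stride=1, bias=False),  # stride 2 -> 1
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=dilation,
                  dilation=dilation, bias=False),                        # 3x3 -> dilated 3x3
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )

# Dilation factor 2^(N-1) for a convN_x block, per the rule above:
# conv3_x -> 4, conv4_x -> 8, conv5_x -> 16.
conv3_block = dilated_bottleneck(256, 128, 512, dilation=2 ** (3 - 1))
```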

The total number of parameters is 23.7 M for ResNet and 1.4 M for ResNet with dilated convolutions.
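To verify a count like this against your own implementation, a one-line PyTorch helper suffices (the exact totals depend on the model definition you instantiate):

```python
def count_parameters(model):
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```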

The fewer the parameters, the faster the computation. Reducing parameters also acts as regularization: it helps avoid over-fitting and improves accuracy on test data.

Global Average Pooling (GAP) is used instead of fully connected layers. The idea is to generate one feature map for each category of the classification task in the last conv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map and feed the resulting vector directly into the softmax layer.
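A minimal sketch of such a GAP head, again assuming PyTorch; the channel count and class count are placeholders, not the paper’s configuration:

```python
import torch
import torch.nn as nn

num_classes = 1000  # hypothetical class count, for illustration only

gap_head = nn.Sequential(
    nn.Conv2d(512, num_classes, kernel_size=1),  # one feature map per class
    nn.AdaptiveAvgPool2d(1),                     # average each map to a scalar
    nn.Flatten(),                                # -> (batch, num_classes)
)

logits = gap_head(torch.randn(2, 512, 7, 7))
probs = logits.softmax(dim=1)  # the resulting vector feeds softmax directly
print(probs.shape)  # torch.Size([2, 1000])
```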

Results

ResNet with dilated convolutions has lower error than ResNet on ImageNet50 classification. Training curves are drawn with dotted lines and validation curves with solid lines.
ResNet with dilated convolutions has lower loss than ResNet on ImageNet50 classification. Training curves are drawn with dotted lines and validation curves with solid lines.

Thank you

