Receptive Fields

Learn what receptive fields are in a neural network and why they matter while building a custom neural network.


Anant Gupta

9 months ago | 5 min read

Last night my bot, Sam, was driving my car. By the way, Sam is a system trained to drive the car automatically, because you are too lazy to do it yourself. Suddenly a truck came, and we had a really bad accident. Unfortunately, I died.

When my soul asked the robot what really went wrong, he said he was not able to notice the truck.

Can you guess what really went wrong?

I am just kidding.

We are damn sure about one thing: we don't want Sam to do this again. Well, to understand this, we need to understand one more concept, called the receptive field.

From a human perspective, the receptive field indicates how wide an area you can see. For example, if we are standing on a road, we can see the vehicles that are coming towards us. But if we look at the same road from our room, we can notice a lot of things: the vehicles on the road, nearby shops, people standing around stalls and footpaths, and even the sky.

So we can say that when we were standing on the road, our receptive field was not that large; we could only notice a particular area, and if we wanted to notice other things, we had to move our neck.

And in the room our receptive field was large; we could notice many more details without moving our neck.

Now let us look at the same idea from the perspective of neural networks.

In the above image, we can see a 3x3 kernel extracting features from the top-left corner of the image, and when we move ahead we have the features of that part. The kernel moves ahead and does this for the whole image. We can see that, at a time, the kernel only has the features of a 3x3 patch of pixels of the image.

And in the final output, the top-left pixel only has an idea about the top-left 3x3 pixels, not about the whole image.

But this time, we run the 3x3 kernel on the output feature map of the previous layer, and we can see that the output covers the whole image.

Can we notice something here? As we move ahead in the network, the receptive field keeps increasing.

So we can say that in the beginning the kernel tries to extract small features like edges and gradients and only has an idea about those, but as soon as we move to the next layer, it starts having a bigger perspective of the image and notices patterns and textures. And in the final layer, our network is able to build the complete object.
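If we want to see this growth in code, here is a minimal sketch (assuming PyTorch is available) that stacks two 3x3 convolutions, with stride 1 and no padding, on a 5x5 input. The first layer produces a 3x3 feature map, and the second layer reduces it to a single value that has indirectly seen all 25 input pixels:

```python
import torch
import torch.nn as nn

# A 5x5 single-channel "image"
x = torch.randn(1, 1, 5, 5)

# Two 3x3 convolutions, stride 1, no padding
conv1 = nn.Conv2d(1, 1, kernel_size=3)  # each output value sees a 3x3 patch of the input
conv2 = nn.Conv2d(1, 1, kernel_size=3)  # each output value sees 3x3 values of the previous map

out1 = conv1(x)      # shape (1, 1, 3, 3)
out2 = conv2(out1)   # shape (1, 1, 1, 1): this single value has seen the whole 5x5 image

print(out1.shape, out2.shape)
```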

Can we connect it to real life?

Assume we are going to watch a movie in the theatre. What would be our choice of seat? We would always choose the last row, since we can see the whole screen without any trouble and without moving our necks.

But unfortunately, the theatre staff cannot see people happy, and they give us the first row; in this case, we will be moving our necks continuously.

We can think of these rows as the layers of a neural network: the first row, like the first layer, does not have the whole idea of the screen (the input image), but the last row, like the last layer, has the whole idea of the image without any problem.

How to calculate the receptive field?

Source for the above diagram: here. We can see the first layer has a 3x3 kernel moving over the input image. When we talk about just one layer, we are actually talking about the local receptive field: since, at a time, our kernel can only see 3x3 pixels, our local receptive field becomes 3x3.

But as soon as we move to the next layer, we can see the output feature map has the idea of every 3x3 patch of the image, and now another 3x3 kernel moves through this feature map and has the idea of all of those pixels.

Can we calculate the receptive field now?

Basically, the local receptive field of both layers is 3, since the kernel can only see 3x3 pixels at a time, so we can conclude that the receptive field directly depends on the kernel size. Yes, the receptive field also depends on the strides, but we will talk about that some other day. Right now, we are assuming the stride is 1.

But the global receptive field of this network will be 5 at the end.

For calculating the global receptive field, we need to consider two things: what the global receptive field of the previous layer is, and what the kernel size is.

So, basically, when we try to calculate the global receptive field, we add (KernelSize - 1) to the previous global receptive field. The image below will solve the mystery.

As we can see, each time the local receptive field is not changing, since we use the same kernel, and the global receptive field is increasing by (kernel - 1). But we should always remember that, in the case of the receptive field, strides and jump also matter. We will understand those things in the types-of-convolution articles, where we will learn about the jump parameter and dilated convolution.
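To make this rule concrete, here is a small helper (plain Python, a sketch that assumes stride 1 everywhere, as in this article) which adds (KernelSize - 1) to the running global receptive field for each layer:

```python
def global_receptive_field(kernel_sizes, initial_rf=1):
    """Track the global receptive field of a stack of stride-1 conv layers.

    With stride 1, each layer grows the receptive field by (kernel_size - 1).
    """
    rf = initial_rf
    history = []
    for k in kernel_sizes:
        rf += k - 1
        history.append(rf)
    return history

# Two 3x3 layers, as in the example above: the receptive field goes 3 -> 5
print(global_receptive_field([3, 3]))           # [3, 5]

# Five 3x3 layers
print(global_receptive_field([3, 3, 3, 3, 3]))  # [3, 5, 7, 9, 11]
```

With stride 1, every extra 3x3 layer only adds 2 pixels to the receptive field, which is why deeper stacks (or strides and pooling) are needed to cover large images.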

Now comes the question: why do we even need to learn how to calculate the receptive field?

The use of receptive field

When we talk about convolutional neural networks, we need to use feature extraction layers. In order to have better feature extraction, we tend to blindly put in more and more convolution layers and end up with a really heavy network. We are not sure whether this network will give us the expected results or not, but it will definitely waste our time.

So, here comes the concept of receptive fields to save us.

The idea is to calculate the receptive field of every layer and keep track of it. Our target should be to match the receptive field with the image size (or, in some special cases, even exceed the image size) by the time we have reached the final convolutional layer.

This rule makes sure that we are not making a network that has useless extra layers, which make it heavy. It also makes sure that the final conv layer has seen the complete image and is holding rich information about it.

Also, the receptive field decides when we should apply the pooling layers.

Basically, the concept is simple: if the image size is small, like 32x32 or 28x28, we should add the pooling layers when the receptive field is 5x5, and follow the same rule for the whole network. In the case of bigger sizes like 64x64, apply pooling layers after a receptive field of 7x7 or 9x9. And for really big images, with sizes above 250, we can go for 11x11 or 13x13 receptive fields and then pooling layers.
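As a rough sketch of that rule of thumb (assuming PyTorch, a small 32x32-style input, and the simplified stride-1 bookkeeping used in this article; the name build_block and the idea of raising the pooling target after each pool are my own illustrative choices, not a rule from the article), we can track the receptive field while stacking layers and insert a pooling layer whenever it crosses the threshold:

```python
import torch.nn as nn

def build_block(in_ch, out_ch, n_convs=6, pool_at_rf=5):
    """Stack 3x3 convs and insert a MaxPool2d once the receptive field
    passes pool_at_rf. Uses the simplified stride-1 bookkeeping from this
    article; the effect of stride/jump on the receptive field is ignored.
    """
    layers, rf, ch = [], 1, in_ch
    for _ in range(n_convs):
        layers += [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
        ch = out_ch
        rf += 3 - 1                      # each 3x3 conv adds 2 to the receptive field
        if rf >= pool_at_rf:
            layers.append(nn.MaxPool2d(2))
            print(f"pooling inserted at receptive field {rf}x{rf}")
            pool_at_rf *= 2              # hypothetical choice: raise the target after each pool
    return nn.Sequential(*layers)

block = build_block(3, 32)   # e.g. for a 32x32 RGB image
print(block)
```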

Okay!!! But what is the reason?

Consider this cute cat. In the above image, we can see one thing: within a receptive field of 9x9 the network is holding the edges and gradients, and after that it will start holding the pattern of the cat's forehead.

This image tells us how the receptive field is helping us to design a network.

Connect with me

  1. linkedin
  2. kaggle
  3. github
