Number of Filters in CNN

Let's say the input layer has 3 channels and the output layer has 64 channels (feature maps).

How are 3 channels converted to 64 channels? How many filters do we need?

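To produce one feature map in the output, you need 3 filters (2D kernels), one for each input channel. Each channel is convolved with its own kernel, and the three results are summed into a single map.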


R - (one filter conv) - |
G - (one filter conv) - | --> (summed after conv)--> | one output
B - (one filter conv) - |

So to generate 64 channels/feature maps in the output, you'll need 3 x 64 = 192 filters/kernels.
Thus, simply, Number of Filters = No. of Input Channels x No. of Output Channels.
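
As a quick check of the arithmetic above, here is a minimal sketch. The function name count_2d_kernels is made up for illustration, and it assumes every output feature map is connected to every input channel.

```python
def count_2d_kernels(in_channels: int, out_channels: int) -> int:
    """Count 2D kernels when each output map uses one kernel per input channel."""
    return in_channels * out_channels


# RGB input (3 channels) mapped to 64 feature maps.
print(count_2d_kernels(3, 64))  # 192
```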
Or,
You can think of the filter as not being 2D at all. Say we have an input of HxWxN, where HxW is the height and width of the image and N is the depth (number of channels). Each filter is then AxBxN, so it is a 3D filter rather than a 2D one, and you need only one such filter per output feature map (64 in the example above). The convolution runs over all N channels at once: at each position, the output value is the dot product of the filter with the AxBxN patch of the input underneath it. [PS: Each filter has a bias, so AxBxN + 1 = number of params for 1 output]
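
The two views describe the same computation. The sketch below checks this on a small random example in plain Python, with no padding and stride 1. All function names are hypothetical, and integer values are used so the comparison is exact. It also computes the parameter count for one output map: A x B x N weights plus one bias.

```python
import random


def conv2d_valid(image, kernel):
    """Slide one 2D kernel over one channel (no padding, stride 1)."""
    H, W = len(image), len(image[0])
    A, B = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(A) for b in range(B))
             for j in range(W - B + 1)]
            for i in range(H - A + 1)]


def per_channel_then_sum(channels, kernels):
    """View 1: one 2D kernel per input channel, results summed elementwise."""
    maps = [conv2d_valid(c, k) for c, k in zip(channels, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(cols)]
            for i in range(rows)]


def single_3d_filter(channels, kernels):
    """View 2: one A x B x N filter, dot product with each A x B x N patch."""
    N = len(channels)
    H, W = len(channels[0]), len(channels[0][0])
    A, B = len(kernels[0]), len(kernels[0][0])
    return [[sum(channels[n][i + a][j + b] * kernels[n][a][b]
                 for n in range(N) for a in range(A) for b in range(B))
             for j in range(W - B + 1)]
            for i in range(H - A + 1)]


def params_per_output(A, B, N):
    """Weights of one A x B x N filter plus its single bias."""
    return A * B * N + 1


rng = random.Random(0)
# A 5x5 image with N = 3 channels and a 3x3x3 filter, integer-valued.
channels = [[[rng.randint(0, 9) for _ in range(5)] for _ in range(5)]
            for _ in range(3)]
kernels = [[[rng.randint(-2, 2) for _ in range(3)] for _ in range(3)]
           for _ in range(3)]

summed = per_channel_then_sum(channels, kernels)
fused = single_3d_filter(channels, kernels)
print(summed == fused)             # True
print(params_per_output(3, 3, 3))  # 28
```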

[Source]



Deep Learning Tutorial by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).

"From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers."
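
To make the grouping idea in the quote concrete, here is a minimal sketch of how splitting channels into groups reduces the kernel count. The function name and the choice of 4 groups are assumptions for illustration only. The quote does not say how the architecture is actually grouped. 4 is simply a value that happens to give 4096.

```python
def count_2d_kernels_grouped(in_channels, out_channels, groups=1):
    """Count 2D kernels when each output map sees only in_channels / groups inputs."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels


# Fully connected case from the quote: every output map sees all 64 inputs.
print(count_2d_kernels_grouped(64, 256))            # 16384
# Hypothetical: 4 groups, so each output map sees only 16 of the 64 inputs.
print(count_2d_kernels_grouped(64, 256, groups=4))  # 4096
```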