What Are Convolutional Neural Networks? An Introduction

A Convolutional Neural Network (CNN) is a multilayered neural network with a special architecture for detecting complex features in data. It is a type of artificial neural network designed to process pixel data, and it is widely used in image recognition and processing.

CNNs are powerful image-processing artificial intelligence (AI) systems that use deep learning to perform both generative and descriptive tasks. In this article we will see how a CNN classifies images.

Convolutional Neural Networks

With a CNN we can build a classifier that determines whether a given image shows a lion or a tiger.

Let's see how a CNN works:

1. Convolution 

There are three important items in this process: the input image, the feature detector, and the feature map. The input is the image being analyzed. The feature detector is a small matrix, usually 3x3, though it could also be larger, such as 7x7. The feature detector is also referred to as a kernel or filter.

The feature detector is slid across the matrix representation of the input image. At each position, the overlapping values are multiplied element-wise and the products are summed to give one entry of the feature map. This step reduces the size of the image, which makes processing faster and easier. Some of the information in the image is lost in this step, but the main features that the detector responds to are retained.
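To make the sliding multiply-and-sum concrete, here is a minimal NumPy sketch. The 5x5 image, the 3x3 detector, and the function name `convolve2d_valid` are made up for illustration. It assumes no padding and a step of one pixel, and like most deep learning libraries it does not flip the detector before sliding it.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    feature_map = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(feature_map.shape[0]):
        for c in range(feature_map.shape[1]):
            patch = image[r:r + kh, c:c + kw]          # region under the detector
            feature_map[r, c] = np.sum(patch * kernel)  # multiply element-wise, then sum
    return feature_map

# Illustrative 5x5 binary "image" (assumed values)
image = np.array([
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
])

# Illustrative 3x3 feature detector that responds to diagonal strokes
kernel = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
print(feature_map)
```

The largest entry of the feature map appears where the image contains a full diagonal of ones, which is the pattern this particular detector looks for.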

2. Applying ReLU (Rectified Linear Unit)

In this step we apply ReLU (Rectified Linear Unit), which replaces every negative value in the feature map with zero, to increase non-linearity in the CNN. Without this function the image classification would be treated as a linear problem, while it is actually a non-linear one.
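As a small illustration, ReLU can be written in one line with NumPy. The feature map values below are invented for the example.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with zero."""
    return np.maximum(0, x)

# Illustrative feature map containing negative values (assumed numbers)
feature_map = np.array([
    [-2.0, 1.5],
    [0.0, -0.5],
])

activated = relu(feature_map)
print(activated)
```

Only the positive entry survives unchanged; every other entry becomes zero.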

3. Pooling

Pooling helps the CNN detect features in various images despite differences in lighting and in the angle from which the pictures were taken. There are different types of pooling, such as max pooling and min pooling.

Max pooling works by placing a 2x2 window on the feature map and picking the largest value in that window. The window is moved from left to right and top to bottom across the entire feature map, picking the largest value at each position. These values then form a new matrix called a pooled feature map.

Max pooling preserves the main features while reducing the size of the image. This helps reduce overfitting.
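The max pooling step can be sketched as follows. This is a minimal version that assumes the 2x2 window moves two cells at a time so that windows do not overlap. The function name `max_pool2d` and the 4x4 feature map are hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of the feature map."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = feature_map[r * stride:r * stride + size,
                                 c * stride:c * stride + size]
            pooled[r, c] = window.max()
    return pooled

# Illustrative 4x4 feature map (assumed values)
fm = np.array([
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
])

pooled = max_pool2d(fm)
print(pooled)  # 2x2 pooled feature map
```

The 4x4 input shrinks to 2x2, and each output value is the strongest response in its window.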

4. Flattening 

Flattening involves transforming the entire pooled feature map matrix into a single column which is then fed to the neural network for processing.
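In NumPy terms, flattening is a simple reshape. The pooled feature map below is a made-up example.

```python
import numpy as np

# Illustrative 2x2 pooled feature map (assumed values)
pooled = np.array([
    [6, 8],
    [3, 4],
])

flat = pooled.flatten()       # read row by row into a 1-D array
column = flat.reshape(-1, 1)  # the same values as a single column

print(flat)          # [6 8 3 4]
print(column.shape)  # (4, 1)
```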

5. Full Connection 

The flattened feature map is passed through a neural network. This step is made up of an input layer, a fully connected layer, and an output layer. The fully connected layer plays the role of the hidden layer in an ordinary artificial neural network (ANN), with every neuron connected to every neuron in the adjacent layers. The output layer is where we get the predicted classes.
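A forward pass through these layers might look like the sketch below. The layer sizes (4 inputs, 3 hidden units, 2 output classes), the random untrained weights, and all variable names are assumptions chosen only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Input layer: the flattened pooled feature map (assumed values)
x = np.array([6.0, 8.0, 3.0, 4.0])

# Hypothetical, untrained weights: 4 inputs -> 3 hidden units -> 2 classes
W_hidden = rng.normal(size=(3, 4))
b_hidden = np.zeros(3)
W_out = rng.normal(size=(2, 3))
b_out = np.zeros(2)

# Fully connected layer: every input feeds every hidden unit, followed by ReLU
hidden = np.maximum(0, W_hidden @ x + b_hidden)

# Output layer: one raw score per class (lion, tiger)
scores = W_out @ hidden + b_out

print(scores.shape)  # (2,)
```

In a real network the weights would be learned during training rather than drawn at random.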

The information is passed through the network and the prediction error is calculated. The error is then backpropagated through the network to improve the prediction.

The final figures produced by the output layer do not usually add up to one. It is important that these figures are rescaled to numbers between zero and one that sum to one, so that each represents the probability of a class. This is done by the softmax function.

So finally, the class with the highest probability is the prediction: if that is class 0, the image is a lion, and if it is class 1, it is a tiger.
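The last two steps, turning raw output scores into probabilities and picking a class, can be sketched like this. The scores are invented, and the label order (0 for lion, 1 for tiger) follows the convention used in this article.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that lie between 0 and 1 and sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Illustrative raw scores from the output layer (assumed values)
scores = np.array([2.0, 0.5])
labels = ["lion", "tiger"]  # class 0 is lion, class 1 is tiger

probs = softmax(scores)
predicted = labels[int(np.argmax(probs))]  # class with the highest probability

print(probs)
print(predicted)  # lion
```

Because the first score is larger, class 0 receives the higher probability and the image is labeled a lion.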