
What is computer vision in AI? Computer vision definition

So, what is computer vision in AI?

Computer vision in AI is the computational processing of visual content: images, videos, icons and anything else made of pixels. That is the computer vision definition. Let's understand it through computer vision examples and applications.

Suppose you want to find out what is in an image. In object classification, you train a model on a data set of specific objects, and the model classifies new objects as belonging to one or more of your training categories. In object identification, your model recognizes a specific instance of an object, for example picking out two faces in an image and tagging them with names. A common example is Facebook, where we tag our friends: once we tag someone, the algorithm can tag them automatically the next time.

Any application that requires software to understand pixels can be described as computer vision.


Handwriting recognition is a great example of computer vision. Other tasks include image restoration, scene reconstruction, motion analysis and image segmentation.

How do computer vision algorithms work?

A machine learning model simply interprets every image as a series of pixels. Each pixel has its own set of color values. The simplest case is a greyscale image, where each pixel is represented by a single number, usually from 0 to 255. The greyscale values are stored as a simple array of numbers. The software sees this series of numbers; it does not see the shades of grey.

Suppose our image has 6 columns and 3 rows. That gives 6 x 3 = 18 input values for the image. This is for the greyscale image.
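To make the idea concrete, here is a minimal sketch of such an image in plain Python. The pixel values are made up for illustration, and nested lists are used only to keep the example dependency-free; a real project would more likely use an array library.

```python
# A hypothetical greyscale image with 3 rows and 6 columns.
# Each number is a brightness from 0 (black) to 255 (white).
grey_image = [
    [0, 50, 100, 150, 200, 255],
    [10, 60, 110, 160, 210, 250],
    [20, 70, 120, 170, 220, 240],
]

rows = len(grey_image)
cols = len(grey_image[0])

# Flatten the grid into the single series of numbers the software sees.
inputs = [value for row in grey_image for value in row]

print(rows, cols, len(inputs))  # 3 6 18
```

The model never receives "shades of grey", only this list of 18 integers.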

What if the image is in color? Computers usually read color as a series of 3 values, red, green and blue (RGB), each on the same scale of 0 to 255. Now each pixel has 3 values for the computer to store in addition to its position. So the number of input values becomes 6 x 3 x 3, or 54 numbers.
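The same count can be sketched for a color image. In this illustrative example every pixel is an (R, G, B) triple; the orange color is an arbitrary choice, not something from a real picture.

```python
ROWS, COLS = 3, 6

# Every pixel holds three values: red, green and blue, each from 0 to 255.
# All pixels here share one made-up color purely for illustration.
rgb_image = [[(255, 128, 0) for _ in range(COLS)] for _ in range(ROWS)]

# Flatten rows, then pixels, then channels into one series of numbers.
inputs = [channel for row in rgb_image for pixel in row for channel in pixel]

print(len(inputs))  # 54
```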

Let's check how computationally expensive this is.

Each color value is stored in 8 bits.

8 bits x 3 colors per pixel = 24 bits per pixel

A normal-sized 1024 x 768 image x 24 bits per pixel = almost 19 million bits, or about 2.36 megabytes.
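The arithmetic can be checked with a few lines of Python. The helper name image_size_bits is hypothetical, and the figure is for raw, uncompressed pixel data only.

```python
def image_size_bits(width, height, channels=3, bits_per_channel=8):
    """Return the raw (uncompressed) size of an image in bits."""
    return width * height * channels * bits_per_channel

bits = image_size_bits(1024, 768)
size_bytes = bits // 8

print(bits)                    # 18874368 bits, almost 19 million
print(size_bytes / 1_000_000)  # 2.359296 megabytes (1 MB = 1,000,000 bytes)
print(size_bytes / 2**20)      # 2.25 mebibytes (1 MiB = 1,048,576 bytes)
```

Whether the result reads as 2.36 or 2.25 depends only on which definition of a megabyte is used.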

That's a lot of memory for one image and a lot of pixels for an algorithm to iterate over. Yet training generally benefits from more images: the more images the model sees, the more accurate it tends to be.

Business use of computer vision

As mentioned earlier, Facebook uses computer vision for face recognition and tagging people. Google uses it for maps: locating roads, highways, office buildings, restaurants and so on. It is being used for self-driving cars and other vehicles. In the medical field it is being used to read X-rays, MRI scans and other types of diagnostic images.

Use of Convolutional Neural Networks (CNN)

During the convolution process, the input image pixels are transformed by a filter. The filter is just a matrix, smaller than the image matrix, that is multiplied element by element against successive patches of the input image, with the products summed at each position. The output is called the feature map; it is usually smaller than the original image and, ideally, more informative.
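Here is a minimal sketch of that sliding-filter step, assuming no padding and a step of one pixel. The function name convolve2d, the tiny image and the vertical-edge filter are all illustrative choices; in a real CNN the filter values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position the kernel is multiplied element by element with
    the patch underneath it and the products are summed.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Made-up 4 x 6 image: bright on the left, dark on the right.
image = [
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
]

# A hand-picked filter that responds to vertical edges.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 30, 30, 0], [0, 30, 30, 0]]
```

The 4 x 6 input shrinks to a 2 x 4 feature map, and the non-zero values mark where the bright region meets the dark one.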


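ReLU stands for Rectified Linear Unit. ReLU is a simple function that introduces non-linearity into the feature map. All negative values are simply changed to zero, while positive values pass through unchanged. If the feature map is drawn as an image with negative values shown as black, this removes the black regions.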
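ReLU is short enough to sketch in one line. The feature map values below are invented for illustration.

```python
def relu(feature_map):
    """Replace every negative value with zero; keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]

feature_map = [
    [-3, 5, 0],
    [2, -1, 7],
]

print(relu(feature_map))  # [[0, 5, 0], [2, 0, 7]]
```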


In pooling, a window of a set width is slid across the feature map, and the sum, maximum or average of the values in each window is taken to represent that portion of the image. This further reduces the size of the feature map by a factor of the pooling window size.
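The three pooling variants can be sketched with one helper. The name pool2d, the 2 x 2 window and the sample values are assumptions made for this example; the windows here do not overlap.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Summarise each non-overlapping size x size window with one number."""
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

print(pool2d(feature_map, mode="max"))      # [[6, 2], [2, 9]]
print(pool2d(feature_map, mode="sum"))      # [[14, 4], [4, 24]]
print(pool2d(feature_map, mode="average"))  # [[3.5, 1.0], [1.0, 6.0]]
```

With a 2 x 2 window the 4 x 4 feature map becomes 2 x 2, halving each dimension.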

The convolution, ReLU and pooling operations above are often applied twice in a row before the feature extraction process is complete.

The outputs of the whole process are then passed into a neural net for classification.
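Putting the pieces together, the sketch below runs convolution, ReLU and max pooling twice and flattens the result into the vector a classifier would receive. Everything here is illustrative: the 16 x 16 image is synthetic, the two filters are hand-picked rather than learned, and the classifier itself is left out.

```python
def conv(image, kernel):
    """Convolution with a square kernel, no padding, stride 1."""
    k = len(kernel)
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(len(image[0]) - k + 1)]
            for r in range(len(image) - k + 1)]

def relu(fm):
    """Set negative values to zero."""
    return [[max(0, v) for v in row] for row in fm]

def max_pool(fm, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]

def extract_features(image, kernels):
    """Apply convolution, ReLU and pooling once per kernel, then flatten."""
    fm = image
    for kernel in kernels:
        fm = max_pool(relu(conv(fm, kernel)))
        print(len(fm), "x", len(fm[0]))  # show how the feature map shrinks
    return [v for row in fm for v in row]

# Synthetic 16 x 16 image: left half white (255), right half black (0).
image = [[255 if c < 8 else 0 for c in range(16)] for _ in range(16)]

edge = [[1, 0, -1]] * 3  # hand-picked vertical-edge filter
blur = [[1, 1, 1]] * 3   # hand-picked summing filter

features = extract_features(image, [edge, blur])
print(len(features))  # 4 numbers left out of the original 256 pixels
```

The feature map goes from 16 x 16 to 7 x 7 after the first round and 2 x 2 after the second, so the classifier works with 4 inputs instead of 256.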

Some of the algorithms used in computer vision are as follows:

1. Nudity Detection detects nudity in pictures.

2. Emotion Recognition parses emotions exhibited in images.

3. SalNet automatically identifies the most important parts of an image.

4. DeepStyle transfers next-level filters onto your image.

5. Face Recognition recognizes faces.

6. Image Memorability judges how memorable an image is.

Let's see some examples of where these algorithms are used. Images from security cameras can be passed into Emotion Recognition, which can tell whether the people are happy or sad. Nudity Detection can be used to block inappropriate pictures, and so on.

In this blog article, we have covered what computer vision in AI is, the computer vision definition, computer vision examples and applications, and computer vision algorithms. Do not be shy about sharing it with your friends.

