What is the Scale-Invariant Feature Transform (SIFT)?

The Scale-Invariant Feature Transform (SIFT) is a feature detection algorithm in computer vision. Let's say you visited the Taj Mahal and took pictures of it from different angles and against different backgrounds. From some angles the whole of the Taj Mahal was visible, while from others only a part of it was. The backgrounds varied too: blue sky, sun, moon, fog and so on.

We humans can easily tell that the Taj Mahal is common to every image, because we have seen it before and can recognize it even from a part of it. A machine, however, will struggle to work out that each image shows the Taj Mahal.


There is a solution to this. Machines can be taught to recognize what an image shows at close to human level. This is the power of computer vision.

SIFT helps us locate the local features in an image, commonly known as its 'key points'. These key points are scale and rotation invariant, so they can be used for various computer vision applications such as image matching, object detection and scene detection.

Let's see how these key points are identified and which techniques are used to ensure scale and rotation invariance. The entire process has four parts.

1. Constructing a Scale Space: To make sure the features are scale independent.

2. Key point Localization: Identifying the suitable features or key points.

3. Orientation Assignment: Ensure the key points are rotation invariant.

4. Key point Descriptor: Assign a unique fingerprint to each key point.
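
To give a feel for step 1, here is a minimal sketch of a scale space in one dimension. It is a toy under stated assumptions, not the full SIFT procedure: the "image" is a single Gaussian bump, the response is evaluated only at the bump's center, and there is no octave downsampling or search for extrema across positions. The helper names (make_blob, blur_at_center, strongest_dog_level) are made up for this illustration. The starting sigma of 1.6 and 3 intervals per octave follow the values commonly quoted for SIFT. The idea it shows is that the blur level with the strongest difference-of-Gaussians (DoG) response follows the size of the structure.

```python
import math

def make_blob(width, length=401):
    """Toy 1D 'image': a single Gaussian bump of the given width."""
    center = length // 2
    return [math.exp(-((x - center) ** 2) / (2.0 * width * width))
            for x in range(length)]

def blur_at_center(signal, sigma):
    """Gaussian-blur the signal and return the value at its middle sample."""
    radius = int(math.ceil(4 * sigma))          # kernel truncated at 4 sigma
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    norm = sum(weights)                          # normalize so weights sum to 1
    center = len(signal) // 2
    total = 0.0
    for d, w in zip(offsets, weights):
        idx = center + d
        if 0 <= idx < len(signal):               # samples outside count as zero
            total += (w / norm) * signal[idx]
    return total

def strongest_dog_level(signal, sigma0=1.6, intervals=3, levels=12):
    """Return (level, sigma) of the strongest DoG response at the center.

    Blur levels grow geometrically: sigma_i = sigma0 * k**i with
    k = 2 ** (1 / intervals), so sigma doubles every `intervals` levels.
    """
    k = 2.0 ** (1.0 / intervals)
    sigmas = [sigma0 * k ** i for i in range(levels + 1)]
    blurred = [blur_at_center(signal, s) for s in sigmas]
    # Difference of Gaussians between each pair of adjacent blur levels.
    dog = [blurred[i + 1] - blurred[i] for i in range(levels)]
    best = max(range(levels), key=lambda i: abs(dog[i]))
    return best, sigmas[best]

if __name__ == "__main__":
    for width in (4, 8):
        level, sigma = strongest_dog_level(make_blob(width))
        print(f"blob width {width}: strongest DoG at level {level}, sigma {sigma:.2f}")
```

With these settings, doubling the width of the bump should move the strongest response up by one octave (three levels), which is the sense in which the selected scale tracks the size of the feature.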

After working through each of these steps in depth, we will use SIFT features for feature matching. Another popular algorithm used for feature matching is SURF (Speeded-Up Robust Features), which is partly inspired by SIFT and designed to be faster to compute.
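
As a rough idea of what feature matching with descriptors can look like, here is a minimal sketch of nearest-neighbor matching with a ratio test, one common way to filter SIFT matches. Everything in it is an assumption for illustration: the function names are made up, the descriptors are tiny 4-value toy vectors (real SIFT descriptors have 128 values), and the 0.75 threshold is just a frequently used choice, not a fixed part of the algorithm.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, train, ratio=0.75):
    """Match each query descriptor to its nearest train descriptor.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (best < ratio * second_best). Ambiguous key points,
    which look similar to several candidates, are dropped.
    Returns a list of (query_index, train_index, distance).
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        if best_d < ratio * second_d:
            matches.append((qi, best_i, best_d))
    return matches

# Toy descriptors standing in for key points from two images.
query = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],   # roughly equally far from every train descriptor
]
train = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.95, 0.05, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    for qi, ti, dist in match_descriptors(query, train):
        print(f"query {qi} -> train {ti} (distance {dist:.3f})")
```

In this toy data the first two query descriptors each have one clearly closest partner and are kept, while the third has no clear winner and is rejected by the ratio test.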