Machine Learning Algorithms You Should Know with Python


What is Machine Learning?

Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead.

Machine Learning algorithms

There are three types of machine learning algorithms:

1. Supervised Learning

In supervised learning, we predict a target (dependent) variable from a set of independent variables. Using the independent variables, we fit a function that maps inputs to the desired outputs. Training continues until the model achieves the desired level of accuracy.

2. Unsupervised Learning

In unsupervised learning, there is no target or output variable to predict. It is used for clustering a population into groups, for example segmenting customers into groups for specific interventions.

3. Reinforcement Learning

In reinforcement learning, the machine is trained to make specific decisions. It trains itself continually by trial and error, learning from past experience and trying to capture the best possible knowledge to make accurate business decisions.
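
To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent on a toy "multi-armed bandit" problem. This setting, the function name run_bandit, and the payout probabilities are all assumptions chosen for illustration; real reinforcement learning problems are usually much richer.

```python
import random

def run_bandit(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known arm, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)     # number of pulls per arm
    values = [0.0] * len(payout_probs)   # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payout_probs))   # explore: try a random arm
        else:
            arm = values.index(max(values))          # exploit: best estimate so far
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
    return values, counts

# Hypothetical payout probabilities for three choices (unknown to the agent).
values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = values.index(max(values))
print("estimated payout per arm:", [round(v, 2) for v in values])
print("arm the agent prefers:", best_arm)
```

The agent never sees the true probabilities. It only learns from the rewards of its own past choices.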

Let's look at some of the most commonly used machine learning algorithms:

1. Linear Regression

It is used to estimate real (continuous) values from one or more independent variables.
Example: Predicting a house price from a given set of features.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
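
A minimal sketch of how these imports might be used for the house price example. The data here is synthetic and the feature names (area, rooms) and price formula are made up for illustration; with a real dataset you would load your own features instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical housing data: price depends linearly on area and rooms.
rng = np.random.default_rng(0)
area = rng.uniform(50, 250, size=200)
rooms = rng.integers(1, 7, size=200)
noise = rng.normal(0, 10_000, size=200)
price = 50_000 + 3_000 * area + 10_000 * rooms + noise

X = np.column_stack([area, rooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
mse = mean_squared_error(y_test, model.predict(X_test))
print("learned coefficients (area, rooms):", model.coef_)
print("test RMSE:", mse ** 0.5)
```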

2. Logistic Regression

It is used to estimate discrete values (such as 0/1 or yes/no) from a given set of independent variables.
Example: Predicting whether a tumor is benign or malignant.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
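
As a sketch of the tumor example, the snippet below uses the breast cancer dataset that ships with scikit-learn. The dataset choice, the scaling step, and the split settings are assumptions, not part of the original article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # y: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling the features first helps the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

logreg_accuracy = accuracy_score(y_test, model.predict(X_test))
probabilities = model.predict_proba(X_test[:3])   # one row per sample, one column per class
print("test accuracy:", logreg_accuracy)
print("class probabilities for 3 samples:\n", probabilities)
```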

3. Decision Tree

It is a type of supervised learning algorithm that is mostly used for classification problems. It works for both categorical and continuous dependent variables.  

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
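
Here is a small sketch of a decision tree classifier on the iris dataset bundled with scikit-learn (the dataset and the max_depth value are assumptions chosen for illustration). Printing the tree shows the if/else rules it learned.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# A shallow tree is easier to read and less prone to overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

tree_accuracy = accuracy_score(y_test, tree.predict(X_test))
print("test accuracy:", tree_accuracy)
print(export_text(tree, feature_names=list(iris.feature_names)))
```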

4. Support Vector Machine (SVM)

It is a supervised learning model used for classification. The algorithm outputs an optimal 
hyperplane that categorizes new examples. In two-dimensional space, this hyperplane is a line 
dividing the plane into two parts, with one class on each side.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
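
The sketch below illustrates the two-dimensional case with a linear kernel. The synthetic blobs and the parameter values are assumptions. For a linear kernel, the line is described by a weight vector w and an offset b, and the side of the line a point falls on decides its class.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Two synthetic clusters of 2-D points, labeled 0 and 1.
X, y = make_blobs(n_samples=50, centers=2, cluster_std=0.60, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The separating line is w[0] * x1 + w[1] * x2 + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
side = (X @ w + b > 0).astype(int)   # which side of the line each point lies on

svm_train_accuracy = accuracy_score(y, clf.predict(X))
print("w:", w, "b:", b)
print("training accuracy:", svm_train_accuracy)
```

Note that the accuracy here is measured on the training points only, to keep the example short.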

5. Naive Bayes

Naive Bayes classifiers are a family of simple probabilistic classifiers based on 
applying Bayes' theorem with strong independence assumptions between the features. 

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
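
A minimal sketch using Gaussian Naive Bayes on the iris dataset (an assumed dataset, chosen because its features are continuous). Because the classifier is probabilistic, it can also report a probability for each class.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

nb = GaussianNB()
nb.fit(X_train, y_train)

nb_accuracy = accuracy_score(y_test, nb.predict(X_test))
class_probs = nb.predict_proba(X_test[:1])[0]   # probabilities for the first test sample
print("test accuracy:", nb_accuracy)
print("class probabilities:", class_probs)
```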

6. K-Nearest Neighbors (KNN)

K-Nearest Neighbors is a simple algorithm that stores all available cases and classifies new cases 
based on a similarity measure (for example, a distance function). It is used for both classification and regression problems. 

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
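
As a sketch, the snippet below classifies the wine dataset bundled with scikit-learn using the 5 nearest neighbors. The dataset, the value of n_neighbors, and the scaling step are assumptions. Scaling matters here because KNN compares distances, so features with large ranges would otherwise dominate.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each new sample gets the majority class of its 5 closest training samples.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

knn_accuracy = accuracy_score(y_test, knn.predict(X_test))
print("test accuracy:", knn_accuracy)
```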

7. K-Means 

It is an unsupervised algorithm that solves the clustering problem. The K-Means 
algorithm identifies k centroids and then allocates every data point to the cluster of its 
nearest centroid, while keeping the distances between points and their centroid as small as possible.

from sklearn.cluster import KMeans
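
A short sketch of K-Means on synthetic data with three well-separated groups. The blob centers and parameter values are assumptions for illustration. No labels are passed to the model; it only sees the points.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points scattered around three hypothetical group centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 0), (5, 5)],
    cluster_std=0.8,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index assigned to each point

# Sort the learned centroids by their first coordinate for easy reading.
centers_sorted = sorted(kmeans.cluster_centers_.tolist())
print("centroids:", centers_sorted)
print("sum of squared distances to centroids:", kmeans.inertia_)
```

The inertia_ value is the quantity K-Means tries to minimize: the total squared distance from each point to its assigned centroid.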

8. Random Forest 

Random Forest is an ensemble method for classification, regression and other tasks that 
operates by constructing a multitude of decision trees at training time and outputting 
the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. 

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
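
Here is a minimal sketch of a random forest on the handwritten digits dataset bundled with scikit-learn. The dataset and the number of trees are assumptions chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 100 trees, each trained on a random resample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, forest.predict(X_test))
n_trees = len(forest.estimators_)
print("number of trees:", n_trees)
print("test accuracy:", rf_accuracy)
```

One detail worth knowing: scikit-learn's classifier combines the trees by averaging their predicted class probabilities rather than by a strict majority vote, which usually gives the same answer.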

9. Dimensionality Reduction Algorithm

In real-world classification problems, we often have a large number of features, 
which makes the training set harder to work with. In many cases, most of these features 
are correlated and hence redundant. This is where dimensionality reduction comes into play.

Dimensionality reduction is the process of reducing the number of random variables 
under consideration, by obtaining a set of principal variables. It can be divided into
feature selection and feature extraction. 

Some dimensionality reduction methods are Principal Component Analysis (PCA), 
Linear Discriminant Analysis (LDA), and Generalized Discriminant Analysis (GDA).

from sklearn.decomposition import PCA
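
As a sketch of feature extraction with PCA, the snippet below reduces the four iris features to two components. The dataset and the choice of two components are assumptions. Standardizing first keeps features with large units from dominating the components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)            # 150 samples, 4 correlated features
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)           # 150 samples, 2 new features

explained = pca.explained_variance_ratio_.sum()
print("shape before:", X.shape, "shape after:", X_2d.shape)
print("share of variance kept by 2 components:", explained)
```

The explained variance ratio tells you how much of the original variation survives the reduction, which helps when deciding how many components to keep.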

10. Gradient Boosting Algorithms 

What is a Gradient Boosting Algorithm?

Gradient Boosting is a machine learning technique for regression and classification 
problems, which produces a prediction model in the form of an ensemble of weak prediction
models, typically decision trees.

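The objective of any supervised learning algorithm is to define a loss function and
minimize it. Gradient boosting does this by adding weak models one at a time, each fitted
to reduce the loss left by the models before it.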
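
To show the idea, here is a hand-rolled sketch of gradient boosting for squared-error loss, where the negative gradient is simply the residual (actual minus predicted). The synthetic data, learning rate, and tree depth are assumptions, and real libraries add many refinements on top of this loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem: a noisy sine wave.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())       # start from a constant model
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(100):
    residuals = y - prediction               # negative gradient of 0.5 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                   # weak learner fits what is still wrong
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print("training MSE before boosting:", initial_mse)
print("training MSE after boosting:", final_mse)
```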

Let's look at some of them:

i) Gradient Boosting Machine (GBM)

It is also known as Multiple Additive Regression Trees (MART) and Gradient Boosted
Regression Trees (GBRT). GBM is a boosting algorithm used when we have plenty 
of data and want high predictive power.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
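
A minimal sketch of scikit-learn's gradient boosting classifier on the bundled breast cancer dataset. The dataset and the hyperparameter values are assumptions, not tuned recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 100 shallow trees added one after another, each scaled by the learning rate.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)

gbm_accuracy = accuracy_score(y_test, gbm.predict(X_test))
print("test accuracy:", gbm_accuracy)
```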

ii) XGBoost

XGBoost is an implementation of gradient boosted decision trees designed for speed and 
performance. It is also called a regularized boosting technique.

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

iii) LightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms.
Its advantages include faster training speed and higher efficiency, lower memory use, better accuracy, 
support for parallel and GPU learning, and the ability to handle large-scale data.

import lightgbm as lgb
from sklearn.metrics import accuracy_score

iv) CatBoost

It is a machine learning library that handles categorical data automatically. 
It yields state-of-the-art results without the extensive data training typically required by other
machine learning methods, and provides out-of-the-box support for the more descriptive 
data formats that accompany many business problems.

from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

I hope this has helped you get to know these machine learning algorithms.
