## What is Machine Learning?

Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead. Machine learning algorithms are commonly divided into three broad categories: supervised, unsupervised, and reinforcement learning.

## 1. Supervised Learning

In supervised learning, we predict a target (dependent) variable from a set of independent variables. Using the independent variables, we generate a function that maps inputs to the desired outputs. The training process continues until the model achieves the desired level of accuracy.

## 2. Unsupervised Learning

In this family of algorithms, we do not have any target or output variable to predict. It is used for clustering a population into different groups, and is widely applied to segmenting customers into groups for specific interventions.

## 3. Reinforcement Learning

Using this algorithm, the machine is trained to make specific decisions. The machine trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.

## Commonly Used Machine Learning Algorithms

## 1. Linear Regression

It is used to estimate real values based on continuous variables.
Example: predicting a house's price from a given set of features.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
```
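As a minimal sketch of how this looks end to end in scikit-learn (the synthetic data, sample sizes, and split below are illustrative assumptions, not part of the original example):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for house features and prices
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a linear function mapping features to a real-valued target
model = LinearRegression()
model.fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
```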

## 2. Logistic Regression

It is used to estimate discrete values based on a given set of independent variables.
Example: predicting whether a tumor is benign or malignant.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
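A minimal sketch on scikit-learn's built-in breast-cancer dataset, which matches the benign/malignant example above (the `max_iter` setting is an illustrative choice to let the solver converge):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Each sample is a tumor labeled malignant (0) or benign (1)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # extra iterations so the solver converges
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```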

## 3. Decision Tree

It is a type of supervised learning algorithm that is mostly used for classification problems. It works for both categorical and continuous dependent variables.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
```
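A minimal sketch on synthetic data (the dataset and the `max_depth` choice are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limit depth to keep the tree small and reduce overfitting
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
acc = accuracy_score(y_test, tree.predict(X_test))
```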

## 4. Support Vector Machine (SVM)

It is a supervised learning model used for classification. The algorithm outputs an optimal
hyperplane that categorizes new examples. In two-dimensional space this hyperplane is a line
dividing the plane into two parts, with each class lying on either side.

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
```
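A minimal sketch using two synthetic 2-D clusters, so the optimal hyperplane really is a line as described above (the data and kernel choice are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated 2-D blobs, one per class
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel finds a separating line (hyperplane in 2-D)
svm = SVC(kernel="linear")
svm.fit(X_train, y_train)
acc = accuracy_score(y_test, svm.predict(X_test))
```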

## 5. Naive Bayes

Naive Bayes classifiers are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong independence assumptions between the features.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
```
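A minimal sketch on the built-in iris dataset (an illustrative choice; GaussianNB models each feature as an independent Gaussian per class, which is exactly the "strong independence" assumption above):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fits per-class Gaussians for each feature, treated as independent
nb = GaussianNB()
nb.fit(X_train, y_train)
acc = accuracy_score(y_test, nb.predict(X_test))
```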

## 6. K-Nearest Neighbors (KNN)

K-Nearest Neighbors is a simple algorithm that stores all available cases and classifies new cases
based on a similarity measure. It is used for both classification and regression problems.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
```
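A minimal sketch on synthetic data (the dataset and `n_neighbors=5` are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new point is labeled by a majority vote of its 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
```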

## 7. K-Means

It is a type of unsupervised algorithm that solves the clustering problem. The K-Means
algorithm identifies k centroids and then allocates every data point to the nearest
centroid, keeping the resulting clusters as compact as possible.

```python
from sklearn.cluster import KMeans
```
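A minimal sketch on three synthetic blobs (the data and k=3 are illustrative; `n_init` is set explicitly because its default differs across scikit-learn versions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: three 2-D blobs, labels discarded
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Find 3 centroids and assign each point to the nearest one
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```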

## 8. Random Forest

Random Forest is an ensemble method for classification, regression, and other tasks that
operates by constructing a multitude of decision trees at training time and outputting
the class that is the mode of the classes (classification) or the mean prediction
(regression) of the individual trees.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
```
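A minimal sketch on synthetic data (the dataset and `n_estimators=100` are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build 100 trees on bootstrapped samples; predict by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
acc = accuracy_score(y_test, forest.predict(X_test))
```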

## 9. Dimensionality Reduction Algorithm

In real-world classification problems, we often have a large number of features,
which makes it harder to work with the training set. In many cases, most of these features
are correlated and hence redundant. This is where dimensionality reduction comes into play.

Dimensionality reduction is the process of reducing the number of random variables
under consideration, by obtaining a set of principal variables. It can be divided into
feature selection and feature extraction.

Some of the dimensionality reduction methods are Principal Component Analysis (PCA),
Linear Discriminant Analysis (LDA), Generalized Discriminant Analysis (GDA).

```python
from sklearn.decomposition import PCA
```
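A minimal PCA sketch: the data below is deliberately constructed so that the features are highly correlated (and hence redundant), which is the situation described above; the sizes and noise level are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three features that are noisy copies of one underlying signal
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               base + 0.01 * rng.normal(size=(200, 1)),
               base + 0.01 * rng.normal(size=(200, 1))])

# Project onto the 2 principal components with the most variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
```

Because the features are near-duplicates, almost all of the variance is captured by the first principal component.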

## 10. Gradient Boosting Algorithms

Gradient Boosting is a machine learning technique for regression and classification
problems, which produces a prediction model in the form of an ensemble of weak prediction
models, typically decision trees.

The objective of any supervised learning algorithm is to define a loss function and
minimize it.

## i) Gradient Boosting Machine (GBM)

It is also known as Multiple Additive Regression Trees (MART) and Gradient Boosted
Regression Trees (GBRT). GBM is a boosting algorithm used when we have plenty of
data and want a prediction with high predictive power.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
```
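A minimal sketch using scikit-learn's `GradientBoostingClassifier`, one widely available GBM implementation (the synthetic data and hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit 100 shallow trees sequentially, each correcting the previous ensemble's errors
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gbm.fit(X_train, y_train)
acc = accuracy_score(y_test, gbm.predict(X_test))
```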

## ii) XGBoost

XGBoost is an implementation of gradient boosted decision trees designed for speed and
performance. It is also called a regularized boosting technique.

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
```

## iii) Light GBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms.
It offers faster training speed and higher efficiency, lower memory use, better accuracy,
support for parallel and GPU learning, and the capability to handle large-scale data.

```python
import lightgbm as lgb
from sklearn.metrics import accuracy_score
```

## iv) CatBoost

It is a machine learning library that handles categorical data automatically.
It yields state-of-the-art results without the extensive data training required by other
machine learning methods, and provides out-of-the-box support for the more descriptive
data formats that accompany many business problems.

```python
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score
```

Hopefully this has helped you get to know these machine learning algorithms.