Evaluation of Model - Simple, Multiple and Polynomial Regression


Once we have finished modelling, we not only want to check the results but also want a quantitative measure of how accurate the model is.



To do so, we use two important measures, R-squared (R^2) and Mean Squared Error (MSE), to determine the accuracy of the model.

R-Squared 

R-squared, also known as the coefficient of determination, is a measure that indicates how close the data are to the fitted regression line.

So what should the value of R-squared be?

R-squared usually lies between 0 and 1: the closer it is to 1, the more of the variation in the data the model explains. When comparing models, the model with the higher R-squared value is a better fit to the data.

To understand the terms below, you can visit our previous blog.

For simple regression it is as follows: lm.score(X,Y)

For multiple regression it is as follows: lm.score(Z, df['price'])
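As a quick sketch of the multiple-regression case: the score() method takes the matrix of predictors and the target column and returns R-squared directly. The DataFrame, its column names, and the coefficients below are hypothetical, made up purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictors and a 'price' target
rng = np.random.default_rng(4)
df = pd.DataFrame({
    'horsepower': rng.uniform(50, 200, 30),
    'curb-weight': rng.uniform(1500, 4000, 30),
})
df['price'] = 80 * df['horsepower'] + 5 * df['curb-weight'] + rng.normal(0, 500, 30)

# Z holds the predictor columns, just as in the snippet above
Z = df[['horsepower', 'curb-weight']]
lm = LinearRegression().fit(Z, df['price'])

# R-squared for the multiple regression fit
print(lm.score(Z, df['price']))
```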

For Polynomial regression it is as follows:

from sklearn.metrics import r2_score

r_square_value=r2_score(y, p(x))
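Putting the simple and polynomial cases together, here is a minimal runnable sketch on synthetic data (the data and degree are assumptions for illustration): score() handles the linear model, while r2_score() compares actual values against the polynomial's predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic, nearly linear data for illustration
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=50)

# Simple regression: score() returns R-squared directly
lm = LinearRegression().fit(X, y)
print(lm.score(X, y))

# Polynomial regression with numpy: fit a degree-2 polynomial,
# then compare actual y against the predictions p(x)
coeffs = np.polyfit(X[:, 0], y, 2)
p = np.poly1d(coeffs)
r_square_value = r2_score(y, p(X[:, 0]))
print(r_square_value)
```

Both values come out close to 1 here because the synthetic data are nearly linear.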

Mean Squared Error (MSE) 

The mean squared error measures the average of the squared errors, that is, the squared differences between the actual values and the predicted values.

First, import it: from sklearn.metrics import mean_squared_error

mse=mean_squared_error(df['price'], Ypre)
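A small end-to-end sketch of the same call, on made-up data (the variable names mirror the snippet above; Ypre holds the model's predictions). The manual average of squared differences matches what mean_squared_error computes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 1))
y = 5 * X[:, 0] + rng.normal(0, 2, size=40)

lm = LinearRegression().fit(X, y)
Ypre = lm.predict(X)

# MSE: average squared difference between actual and predicted values
mse = mean_squared_error(y, Ypre)
print(mse)

# Equivalent manual computation
print(np.mean((y - Ypre) ** 2))
```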

When comparing the models, the model with the smallest MSE value is a better fit for the data.

Above we saw the evaluation of our model, but should we use the whole data for training?

In-sample evaluation tells us how well our model fits the data used to train it, but it does not tell us how well the trained model can predict new data.

To address this, we split our data, for example into 70% for training and 30% for testing.

We build and train the model using the training set and use the test set to assess the performance of the predictive model. Once done, we use the entire dataset to train the final model to get the best performance.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.3, random_state=1)
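The split above can be sketched end to end on synthetic data (the data here are an assumption for illustration): test_size=0.3 holds out 30% of the rows, random_state makes the split reproducible, and scoring on the held-out set gives an out-of-sample R-squared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration
rng = np.random.default_rng(2)
x_data = rng.uniform(0, 10, size=(100, 1))
y_data = 2 * x_data[:, 0] + 1 + rng.normal(0, 1, size=100)

# 70% training, 30% testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.3, random_state=1)

# Train on the training set only
lm = LinearRegression().fit(x_train, y_train)

# Out-of-sample R-squared on the held-out test set
print(lm.score(x_test, y_test))
```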

Next we move on to Cross Validation.

What is Cross-Validation Score?

It is the most common out-of-sample evaluation metric, and it makes more effective use of the data (each observation is used for both training and testing).

from sklearn.model_selection import cross_val_score

Rcross = cross_val_score(lre, x_data, y_data, cv=3)

here lre is the model used, for example linear regression, and cv is the number of partitions (folds) we want to split the data into.

At the end, we get the mean and standard deviation of the out-of-sample scores.
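A runnable sketch of the call above, on made-up data: with cv=3 the data are split into 3 folds, each fold serves once as the test set, and cross_val_score returns one R-squared score per fold, which we then summarise with the mean and standard deviation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data for illustration
rng = np.random.default_rng(3)
x_data = rng.uniform(0, 10, size=(60, 1))
y_data = 4 * x_data[:, 0] - 2 + rng.normal(0, 1, size=60)

lre = LinearRegression()

# cv=3: three folds, each used once as the held-out test set
Rcross = cross_val_score(lre, x_data, y_data, cv=3)

print(Rcross)          # one R-squared score per fold
print(Rcross.mean())   # mean out-of-sample score
print(Rcross.std())    # standard deviation across folds
```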
