Simple Linear Regression in Python

After separating the training set from the test set in Python, we obtain the following arrays.

Here is the training set:

TRAINING SET - Car Speed VS Stop Distance

And here is the test set:

TEST SET - Car Speed VS Stop Distance

Then, in the code below, we fit a linear regression model to the training set:

# -*- coding: utf-8 -*-

#importing libraries
import matplotlib.pyplot as m
import pandas as p

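#loading the dataset
dataset = p.read_csv("dataset.csv")
X = dataset.iloc[:, :-1].values  #all columns except the last one (features)
y = dataset.iloc[:, -1].values   #last column (target)

#separating the training set from the test set
#(train_test_split lives in sklearn.model_selection; the old sklearn.cross_validation module was removed from scikit-learn)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)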

#fitting the linear regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
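
With a single feature, fitting by ordinary least squares comes down to computing one slope and one intercept. The sketch below shows that calculation in plain Python on made-up points; `fit_line` is a hypothetical helper written for illustration, not part of scikit-learn, and the data is not the article's dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y) and sum of squared deviations (x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 3x + 2 (illustration only)
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0])
print(slope, intercept)
```

On the fitted scikit-learn model, the corresponding values should be available as `regressor.coef_[0]` and `regressor.intercept_`.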

Then, with the following lines, we can plot the training set points on a graph and draw the fitted model as a line:

#visualising the training set result and linear model
m.scatter(X_train, y_train, color='green')
m.plot(X_train, regressor.predict(X_train), color='red')
m.title('Stop-Distance depending on Speed : with the training set in green')
m.xlabel('Speed')
m.ylabel('Stop-Distance')
m.show()

We obtain this graph:

REGRESSION RESULT - TRAINING SET

Finally, with the following lines, we can plot the test set points on the graph and compare them to the values predicted by the model:

#visualising the new predictions on the test set result
m.scatter(X_test, y_test, color='green')
m.plot(X_train, regressor.predict(X_train), color='red')
m.title('Stop-Distance depending on Speed : with the test set in green')
m.xlabel('Speed')
m.ylabel('Stop-Distance')
m.show()

This leads to the following graph:

REGRESSION RESULT - TEST SET

We can see that, for most test points, the vertical distance between the green point and the red line is small. This suggests that the model gives acceptable predictions, even if there is room for improvement.
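
The visual impression can also be expressed as numbers. The sketch below computes two common summaries of the vertical distances between actual and predicted values: the mean absolute error and the coefficient of determination R². The helper names and the sample values are assumptions made for illustration; they are not results from the article's dataset.

```python
def mean_abs_error(actual, predicted):
    """Average vertical distance between the points and the line."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up values standing in for y_test and regressor.predict(X_test)
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae = mean_abs_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, r2)
```

With the real model, `regressor.score(X_test, y_test)` returns R² directly, and `sklearn.metrics` provides `mean_absolute_error` and `r2_score`.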

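To get the predicted stopping distance for a car running at 62.138 mph (100 km/h), we run the code below:

#predict expects a 2D array of shape (n_samples, n_features), hence the double brackets
regressor.predict([[62.138]])

which returns an array whose single value is about 235.38 feet (71.74 meters).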
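
The unit conversions quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the helper names are hypothetical, and the conversion factors are the standard definitions (1 mile = 1.609344 km, 1 foot = 0.3048 m).

```python
KM_PER_MILE = 1.609344
METERS_PER_FOOT = 0.3048

def kmh_to_mph(kmh):
    """Convert a speed from km/h to mph."""
    return kmh / KM_PER_MILE

def feet_to_meters(feet):
    """Convert a length from feet to meters."""
    return feet * METERS_PER_FOOT

speed_mph = kmh_to_mph(100.0)         # roughly 62.14 mph
distance_m = feet_to_meters(235.38)   # roughly 71.74 m
print(round(speed_mph, 3), round(distance_m, 2))
```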