
To simplify the next data preprocessing steps, we separate the source variables (independent variables) from the target variable to be predicted (dependent variable) by adding these lines:
#splitting the dataset into the source variables (independent variables) and the target variable (dependent variable)
sourcevars = dataset[:,:-1] #all columns except the last one
targetvar = dataset[:,len(dataset[0])-1] #only the last column
The aforementioned ‘dataset’ array:

Initial Dataset
… is split into the following ‘sourcevars’ and ‘targetvar’ arrays:

Source Variables

Target Variable
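To see the slicing in isolation, here is a minimal sketch on a small made-up array. The values are hypothetical and only stand in for the encoded ‘dataset’; the variable names are the ones used above. It also shows that the index -1 selects the same column as len(dataset[0])-1.

```python
import numpy as np

# Hypothetical 3x4 array standing in for the encoded 'dataset'
dataset = np.array([
    [1.0, 0.0, 44.0, 0.0],
    [0.0, 1.0, 27.0, 1.0],
    [1.0, 0.0, 30.0, 0.0],
])

sourcevars = dataset[:, :-1]  # all rows, every column except the last one
targetvar = dataset[:, -1]    # all rows, last column only

# -1 and len(dataset[0])-1 point at the same column
same_column = np.array_equal(targetvar, dataset[:, len(dataset[0]) - 1])

print(sourcevars.shape)  # (3, 3)
print(targetvar.shape)   # (3,)
print(same_column)       # True
```

Note that ‘sourcevars’ stays two-dimensional (rows by columns) while ‘targetvar’ becomes a one-dimensional array with one value per row.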
The whole Python script becomes:
#importing libraries
import numpy as n
import matplotlib.pyplot as m
import pandas as p
#loading the dataset
dataset = p.read_csv('dataset.csv', sep=',').values
#dataset = p.DataFrame(dataset)
#dataset = dataset.values
#filling blank cells
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis = 0)
imputer = imputer.fit(dataset[:, 2:6])
dataset[:, 2:6] = imputer.transform(dataset[:, 2:6])
#dataset = p.DataFrame(dataset)
#dataset = dataset.values
#turning textual data to numerical
from sklearn.preprocessing import LabelEncoder
labelencoder_0 = LabelEncoder() #independent variable encoder
dataset[:,0] = labelencoder_0.fit_transform(dataset[:,0])
labelencoder_1 = LabelEncoder() #independent variable encoder
dataset[:,1] = labelencoder_1.fit_transform(dataset[:,1])
labelencoder_6 = LabelEncoder() #dependent (target) variable encoder
dataset[:,6] = labelencoder_6.fit_transform(dataset[:,6])
#dataset = p.DataFrame(dataset)
#dataset = dataset.values
#taking care of wrong order relationships
from sklearn.preprocessing import OneHotEncoder
onehotencoder_01 = OneHotEncoder(categorical_features = [0, 1])
dataset = onehotencoder_01.fit_transform(dataset).toarray()
#dataset = p.DataFrame(dataset)
#dataset = dataset.values
#splitting the dataset into the source variables (independent variables) and the target variable (dependent variable)
sourcevars = dataset[:,:-1] #all columns except the last one
targetvar = dataset[:,len(dataset[0])-1] #only the last column
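A note for readers on recent scikit-learn releases: the Imputer class and the categorical_features argument of OneHotEncoder were removed in scikit-learn 0.22, so the script above only runs on older versions. The following is a hedged sketch of one possible equivalent using SimpleImputer and ColumnTransformer. It is not the script used in this article, and the sample rows (countries, genders, numbers) are invented purely so the example is self-contained; only the column layout (two text columns, four numeric columns, one text target) mirrors the script above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical rows standing in for dataset.csv (values are made up)
dataset = np.array([
    ['France', 'Male', 44.0, 72000.0, 3.0, 1.0, 'No'],
    ['Spain', 'Female', 27.0, np.nan, 1.0, 0.0, 'Yes'],
    ['Germany', 'Male', np.nan, 54000.0, 2.0, 1.0, 'No'],
    ['Spain', 'Female', 38.0, 61000.0, np.nan, 0.0, 'Yes'],
], dtype=object)

# filling blank cells in columns 2 to 5 with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
dataset[:, 2:6] = imputer.fit_transform(dataset[:, 2:6].astype(float))

# turning the textual target (column 6) into integers
labelencoder_6 = LabelEncoder()
dataset[:, 6] = labelencoder_6.fit_transform(dataset[:, 6].astype(str))

# one-hot encoding columns 0 and 1; the remaining columns are passed through
# unchanged and placed to the right of the encoded ones
encoder = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(), [0, 1])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
dataset = encoder.fit_transform(dataset).astype(float)

# splitting into source variables and target variable
sourcevars = dataset[:, :-1]
targetvar = dataset[:, -1]

print(dataset.shape)
print(targetvar)
```

In this sketch OneHotEncoder works directly on the text columns, so the separate LabelEncoder pass on columns 0 and 1 is not needed; the target column still goes through LabelEncoder as before.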
Next step: scaling the values in the source variables array, so that no variable dominates the others.