Since learning algorithms usually take numerical values as input, it is recommended to encode each textual label as a number.
To do so, we use a label encoder and apply it to the columns of the dataset that contain textual values.
#turning textual data to numerical
from sklearn.preprocessing import LabelEncoder
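#LabelEncoder expects a single 1-D column, so each textual column gets its own encoder
labelencoder_0 = LabelEncoder() #independent variable encoder
dataset[:,0] = labelencoder_0.fit_transform(dataset[:,0])
labelencoder_1 = LabelEncoder() #independent variable encoder
dataset[:,1] = labelencoder_1.fit_transform(dataset[:,1])
labelencoder_6 = LabelEncoder() #dependent (target) variable encoder
dataset[:,6] = labelencoder_6.fit_transform(dataset[:,6])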
After running this script, the dataset looks like this:
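To see the effect on a small scale, here is a minimal sketch on a made-up array. The values and column layout below are hypothetical and are not the tutorial's dataset; the point is only that every textual cell ends up replaced by an integer code while numeric cells are left alone.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: two textual columns, one numeric column, one textual target
toy = np.array([
    ['France', 'Male', 44.0, 'No'],
    ['Spain', 'Female', 27.0, 'Yes'],
    ['Germany', 'Female', 30.0, 'No'],
    ['Spain', 'Male', 38.0, 'Yes'],
], dtype=object)

# LabelEncoder works on one 1-D column at a time,
# so a fresh encoder is fitted for each textual column
for col in (0, 1, 3):
    toy[:, col] = LabelEncoder().fit_transform(toy[:, col])

print(toy)
```

Each textual column is now a column of small integers, and the numeric column in position 2 is unchanged.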
The whole Python script becomes:
#importing libraries
import numpy as n
import matplotlib.pyplot as m
import pandas as p
#importing the dataset
dataset = p.read_csv('dataset.csv', sep=',').values
#filling blank cells
#SimpleImputer supersedes the older Imputer class and always works column by column
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=n.nan, strategy='mean')
imputer = imputer.fit(dataset[:, 2:6])
dataset[:, 2:6] = imputer.transform(dataset[:, 2:6])
#turning textual data to numerical
from sklearn.preprocessing import LabelEncoder
labelencoder_0 = LabelEncoder() #independent variable encoder
dataset[:,0] = labelencoder_0.fit_transform(dataset[:,0])
labelencoder_1 = LabelEncoder() #independent variable encoder
dataset[:,1] = labelencoder_1.fit_transform(dataset[:,1])
labelencoder_6 = LabelEncoder() #dependent (target) variable encoder
dataset[:,6] = labelencoder_6.fit_transform(dataset[:,6])
Next step: correcting the irrelevant ordering between encoded values.
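To illustrate where this ordering comes from, here is a minimal sketch with hypothetical country names (not taken from the dataset). LabelEncoder sorts the distinct labels and uses each label's position in that sorted list as its code, so the resulting integers suggest an order between categories that has no real meaning.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical country names, used only for illustration
countries = ['Spain', 'France', 'Germany', 'France']

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the distinct labels in sorted order;
# the code of a label is its index in that list
print(encoder.classes_.tolist())  # ['France', 'Germany', 'Spain']
print(codes.tolist())             # [2, 0, 1, 0]
```

Here the codes imply France < Germany < Spain, which a learning algorithm may treat as a genuine ranking even though it is only alphabetical.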