Sem Spirit

Numerical relabeling

Since most learning algorithms take numerical values as input, it is recommended to encode each textual label as a number.
To do so, we apply R's factor() function to each column of the dataset that contains textual values, listing the possible text values in levels and the numbers that should replace them in labels.
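As a minimal illustration on a made-up vector (the values below are invented, not taken from the dataset), factor() pairs each entry of levels with the label at the same position. Any value that does not match an entry of levels exactly, such as the lowercase 'male' here, becomes NA, so the spelling and case of the levels must match the data.

```r
# Made-up textual values; the last one is deliberately lowercase
gender = c('Male', 'Female', 'Female', 'male')

# 'Female' -> 0, 'Male' -> 1; anything not listed in levels becomes NA
encoded = factor(gender, levels = c('Female', 'Male'), labels = c(0, 1))

print(encoded)
# Count the values that did not match any level
print(sum(is.na(encoded)))
```

Checking for NA right after relabeling is a cheap way to catch unexpected spellings before they silently turn into missing values.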


#Numerical relabeling: factor() maps each value in levels to the label at the same position
dataset$OCCUPATION = factor(dataset$OCCUPATION, levels = c('Management', 'Manual', 'Specialty'), labels = c(0, 1, 2))
dataset$GENDER = factor(dataset$GENDER, levels = c('Female', 'Male'), labels = c(0, 1))
dataset$SALARY = factor(dataset$SALARY, levels = c('HIGH', 'LOW'), labels = c(0, 1))
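One detail worth checking: with labels = c(0, 1, 2), the columns are still factors whose levels are the character strings "0", "1" and "2", not numeric vectors. Many R modelling functions accept factors directly, but if a truly numeric column is needed, the sketch below (on a hypothetical three-value stand-in for the OCCUPATION column) shows the usual conversion. Calling as.numeric() directly on a factor returns its internal codes, which start at 1, so the factor is converted to character first.

```r
# Hypothetical stand-in for dataset$OCCUPATION after relabeling
occupation = factor(c('Manual', 'Specialty', 'Management'),
                    levels = c('Management', 'Manual', 'Specialty'),
                    labels = c(0, 1, 2))

# Still a factor, not a numeric vector
print(class(occupation))

# Internal integer codes: position of each value in the levels, starting at 1
codes = as.numeric(occupation)

# Converting through character recovers the labels 0, 1, 2 as numbers
values = as.numeric(as.character(occupation))

print(codes)
print(values)
```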

After applying these three relabeling lines, the dataset looks like this:

[Figure: R dataset after numerical relabeling]

The whole R script now reads:

#setting the working folder (put the path to the folder containing dataset.csv between the quotes)
setwd("")

#loading the dataset
dataset = read.csv('dataset.csv')

#Numerical relabeling: factor() maps each value in levels to the label at the same position
dataset$OCCUPATION = factor(dataset$OCCUPATION, levels = c('Management', 'Manual', 'Specialty'), labels = c(0, 1, 2))
dataset$GENDER = factor(dataset$GENDER, levels = c('Female', 'Male'), labels = c(0, 1))
dataset$SALARY = factor(dataset$SALARY, levels = c('HIGH', 'LOW'), labels = c(0, 1))
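When more columns need the same treatment, the three factor() calls can be folded into a loop. The sketch below uses a hypothetical helper, relabel_columns(), which is not part of the original script or of base R; it labels each level with its 0-based position, reproducing the labels used above. A small in-memory data frame with invented rows replaces dataset.csv so that the example runs on its own.

```r
# Hypothetical helper (not from the original script): relabel several textual
# columns at once. level_map maps each column name to its levels, in order.
relabel_columns = function(df, level_map) {
  for (col in names(level_map)) {
    lv = level_map[[col]]
    # Label the levels 0, 1, 2, ... in the order given, as in the script above
    df[[col]] = factor(df[[col]], levels = lv, labels = seq_along(lv) - 1)
  }
  df
}

# Invented rows standing in for dataset.csv
dataset = data.frame(
  OCCUPATION = c('Management', 'Manual', 'Specialty'),
  GENDER = c('Female', 'Male', 'Male'),
  SALARY = c('HIGH', 'LOW', 'LOW'),
  stringsAsFactors = FALSE
)

dataset = relabel_columns(dataset, list(
  OCCUPATION = c('Management', 'Manual', 'Specialty'),
  GENDER = c('Female', 'Male'),
  SALARY = c('HIGH', 'LOW')
))

str(dataset)
```

Listing the levels explicitly, rather than letting factor() sort the distinct values alphabetically, keeps the mapping from text to number under your control.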

Next step: filling the blanks.