Sem Spirit

Support Vector Machine classification in R

In this use case, we build an SVM classifier in R (whose predictions are shown in the 3D graph below) to detect whether or not a human is present inside a room according to the room's temperature, humidity and CO2 levels.

Test set observations (dots : black=occupancy and red=empty) and the model predictions (green transparent 3D shape = occupancy predictions) of the room occupancy according to temperature, CO2 and humidity levels.

We first set the working directory and load the dataset:

setwd("[WORKING DIRECTORY]")
dataset = read.csv('dataset.csv')

Here are the first 40 rows out of a total of 9753 in the dataset:

Dataset in R of individual presence in a room according to levels of temperature, CO2 and humidity.

Then we split the data into a training set and a test set:

library(caTools)
set.seed(123)
split = sample.split(dataset$Occupancy, SplitRatio=0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

And we scale the values of the temperature, CO2 and humidity variables:

training_set[,1:3] = scale(training_set[,1:3])
test_set[,1:3] = scale(test_set[,1:3])
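Note that the snippet above standardises the test set with its own mean and standard deviation. A common alternative, sketched below on made-up stand-in data, is to reuse the training set's statistics (which `scale()` stores as attributes of its result) so that both sets are expressed on the same scale:

```r
# Made-up stand-in data for the article's training_set / test_set
training_set = data.frame(Temperature   = c(20, 22, 24),
                          CO2           = c(400, 600, 800),
                          HumidityRatio = c(0.003, 0.004, 0.005))
test_set     = data.frame(Temperature   = c(21, 23),
                          CO2           = c(500, 700),
                          HumidityRatio = c(0.0035, 0.0045))

# scale() records the centering and scaling values it used as
# "scaled:center" and "scaled:scale" attributes of its result
train_scaled = scale(training_set[, 1:3])
test_scaled  = scale(test_set[, 1:3],
                     center = attr(train_scaled, "scaled:center"),
                     scale  = attr(train_scaled, "scaled:scale"))
```

This way a test observation equal to the training mean always maps to 0, regardless of how the test set itself is distributed.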

At this step we are ready to fit the classifier to the training set data, which is done with the following code:

#install.packages('e1071')
library(e1071)
classifier = svm(formula = Occupancy ~ .,
                 data = training_set,
                 type = 'C-classification',
                 gamma = 100)
#summary(classifier)

Once fitted, we can run the classifier on both the training set and the test set in order to get the predictions.

#predicting the training and test set results
y_train_pred = predict(classifier, newdata=training_set[-4])
y_test_pred = predict(classifier, newdata=test_set[-4])

To evaluate the quality of the classifier, the following code computes the two confusion matrices of the predictions made on the training and test sets and, from them, the success ratio of the predictions:

#Building the confusion matrices
cm_train = table(training_set[, 4], y_train_pred)
cm_test = table(test_set[, 4], y_test_pred)

cm_train_str = capture.output(show(cm_train))
writeLines(c(
"Training set confusion matrix : ",
cm_train_str,
paste("Success ratio on training set : ", toString(success_ratio(cm=cm_train)*100), "%")
))

cm_test_str = capture.output(show(cm_test))
writeLines(c(
"Test set confusion matrix : ",
cm_test_str,
paste("Success ratio on test set : ", toString(success_ratio(cm=cm_test)*100), "%")
))
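As a side note on how these matrices are laid out (a self-contained sketch on made-up labels): `table(actual, predicted)` puts the actual classes in rows and the predictions in columns, so the diagonal counts the correct predictions:

```r
# Made-up labels: 4 of the 5 predictions are correct
actual    = c(0, 0, 1, 1, 1)
predicted = c(0, 1, 1, 1, 1)
cm = table(actual, predicted)
print(cm)
# success ratio = diagonal / total number of observations
print(sum(diag(cm)) / sum(cm))  # 0.8
```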

The console shows the following confusion matrices and success ratios for the training and test sets:

Confusion Matrix for training and test set predictions in R

Finally, we display in a 3D graph the test set observations (dots : black=occupancy and red=empty) and the model predictions (green transparent 3D shape = occupancy predictions) of the room occupancy according to temperature, CO2 and humidity levels, with the following code:

show3D(title="Test set observations (dots) and predictions (green 3D shape) of room occupancy according to temperature, CO2 and humidity levels.",
x_colname = 'Temperature', y_colname = 'CO2', z_colname = 'HumidityRatio', c_colname = 'Occupancy',
x_train = training_set[,1], y_train=training_set[,2], z_train=training_set[,3], c_train=training_set[,4],
x_test = test_set[,1], y_test = test_set[,2], z_test = test_set[,3], c_test=test_set[,4],
mesh_nb_pts = 50**3,
classifier = classifier
)

Here is the resulting 3D graph for the test set:

Test set observations (dots : black=occupancy and red=empty) and the model predictions (green transparent 3D shape = occupancy predictions) of the room occupancy according to temperature, CO2 and humidity levels.

We can also display a 3D graph containing the training set observations with the following function call:

show3D(title="Training set observations (dots) and predictions (green 3D shape) of room occupancy according to temperature, CO2 and humidity levels.",
x_colname = 'Temperature', y_colname = 'CO2', z_colname = 'HumidityRatio', c_colname = 'Occupancy',
x_train = training_set[,1], y_train=training_set[,2], z_train=training_set[,3], c_train=training_set[,4],
x_test = training_set[,1], y_test = training_set[,2], z_test = training_set[,3], c_test=training_set[,4],
mesh_nb_pts = 50**3,
classifier = classifier
)

which leads to the following 3D graph:

Training set observations (dots : black=occupancy and red=empty) and the model predictions (green transparent 3D shape = occupancy predictions) of the room occupancy according to temperature, CO2 and humidity levels.

In both graphs, we notice that the model's occupancy predictions (the green transparent 3D shape) fit the occupancy observations (black dots) remarkably well, while most of the red dots (empty-room observations) remain outside the green shape.

This script uses two helper functions, success_ratio and show3D, which are defined below:

#FUNCTIONS

success_ratio <- function(cm) {
  #proportion of correct predictions: the diagonal of the
  #confusion matrix divided by the total number of observations
  sum(diag(cm)) / sum(cm)
}

#install.packages('plotly')
library(plotly)
show3D <- function(title, x_colname, y_colname, z_colname, c_colname,
                   x_train, y_train, z_train, c_train,
                   x_test, y_test, z_test, c_test,
                   mesh_nb_pts, classifier) {
  #number of grid points per axis: cubic root of mesh_nb_pts
  #(round() avoids floating-point truncation, e.g. 125000^(1/3) -> 49)
  n = round(mesh_nb_pts ^ (1/3))
  min_x = min(x_train)
  min_y = min(y_train)
  min_z = min(z_train)
  x_size = max(x_train) - min_x
  y_size = max(y_train) - min_y
  z_size = max(z_train) - min_z
  x_step = x_size / n
  y_step = y_size / n
  z_step = z_size / n

  #POSITIVE (1) PREDICTIONS AS A 3D MESH
  #building a regular grid of n*n*n vertices over the training ranges
  i = 1 #R vectors are 1-indexed
  x = array(dim = n*n*n)
  y = array(dim = n*n*n)
  z = array(dim = n*n*n)
  for (xi in 0:(n-1)) {
    for (yi in 0:(n-1)) {
      for (zi in 0:(n-1)) {
        x[i] = min_x + xi * x_step
        y[i] = min_y + yi * y_step
        z[i] = min_z + zi * z_step
        i = i + 1
      }
    }
  }
  print(paste("Grid of size ", n, "x", n, "x", n, " generated (nb vertices = ", (n*n*n), ")."))

  #computing the predictions on the grid
  datagrid = data.frame(x, y, z)
  colnames(datagrid) = c(x_colname, y_colname, z_colname)
  grid_pred = predict(classifier, newdata = datagrid)

  #keeping only the grid vertices predicted as positive (class 1)
  xx = x[grid_pred == 1]
  yy = y[grid_pred == 1]
  zz = z[grid_pred == 1]

  #splitting the test observations according to their actual class
  x_pts_0 = x_test[c_test == 0]
  y_pts_0 = y_test[c_test == 0]
  z_pts_0 = z_test[c_test == 0]
  x_pts_1 = x_test[c_test == 1]
  y_pts_1 = y_test[c_test == 1]
  z_pts_1 = z_test[c_test == 1]

  plot_ly() %>%
    add_trace(x=as.vector(x_pts_0), y=as.vector(y_pts_0), z=z_pts_0, type = "scatter3d", mode="markers", name = "[red] Observations of class 0", marker = list(size = 3, color = 'red')) %>%
    add_trace(x=as.vector(x_pts_1), y=as.vector(y_pts_1), z=z_pts_1, type = "scatter3d", mode="markers", name = "[black] Observations of class 1", marker = list(size = 3, color = 'black')) %>%
    add_trace(x=as.vector(xx), y=as.vector(yy), z=zz, type = "mesh3d", name = "Predictions", alphahull=5, opacity=0.2, colors=c('#00FF00')) %>%
    layout(
      title = title,
      scene = list(
        xaxis = list(title = x_colname),
        yaxis = list(title = y_colname),
        zaxis = list(title = z_colname)
      )
    )
}