
The following shows how to build a regression model in Python using random forests with the Los-Angeles 2016 Crime Dataset.
Our goal is to answer the following specific questions:
- Considering night sex crimes targeting 14-year-old females, compare their number depending on whether they occurred at home or in the street.
- Considering street robberies targeting 24-year-old males, compare their number depending on whether they occurred in the afternoon or at night.
- Considering night street violence acts on 29-year-old individuals, compare their number depending on whether they target a female or a male.
And more generally, to display the following three graphs:
- Number of night sex crimes in 2016 occurring at home (red curve) or in the street (green curve) as a function of the female victim's age.
- Number of street robberies in 2016 occurring in the afternoon (red curve) or at night (green curve) as a function of the male victim's age.
- Number of night street violence acts in 2016 targeting a female (red curve) or a male (green curve) as a function of the victim's age.
We start by importing the needed libraries and loading the dataset:
#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#importing the dataset
dataset = pd.read_csv('LOS-ANGELES-2016-CRIMES-DATASET.CSV')
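#features: the first five columns (four categorical columns, then the victim age)
X = dataset.iloc[:, 0:5].values
#target: the last column (the number of crimes)
y = dataset.iloc[:, -1].values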
The dataset variable contains the following table:
Figure: Los-Angeles 2016 Crime Dataset in Python
Since the first four columns of the dataset contain categorical variables, we encode their labels as integers:
#categories encoding
from sklearn.preprocessing import LabelEncoder
labelencoder_X0 = LabelEncoder()
labelencoder_X1 = LabelEncoder()
labelencoder_X2 = LabelEncoder()
labelencoder_X3 = LabelEncoder()
X[:, 0] = labelencoder_X0.fit_transform(X[:,0])
X[:, 1] = labelencoder_X1.fit_transform(X[:,1])
X[:, 2] = labelencoder_X2.fit_transform(X[:,2])
X[:, 3] = labelencoder_X3.fit_transform(X[:,3])
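To check which integer each label was mapped to, one can inspect the encoder's classes_ attribute. The sketch below is only an illustration: the list of MOMENT values is made up (in particular, MORNING is an assumption and may not exist in the dataset), and the names moments, encoder and mapping are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical sample of MOMENT labels, for illustration only
moments = ["NIGHT", "AFTERNOON", "MORNING", "NIGHT", "AFTERNOON"]

encoder = LabelEncoder()
codes = encoder.fit_transform(moments)

# classes_ holds the distinct labels in sorted order; a label's code is its index there
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'AFTERNOON': 0, 'MORNING': 1, 'NIGHT': 2}

# inverse_transform recovers the original labels from the integer codes
decoded = [str(label) for label in encoder.inverse_transform(codes)]
```

Note that the codes follow alphabetical order and carry no meaning of their own; the model simply receives them as numbers.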
Then we fit the Random Forest regression model to the dataset:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 250, random_state=0)
regressor.fit(X, y)
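Since the model is fitted on the whole dataset, there is no held-out data to judge its quality. One possible check, not used in this article, is scikit-learn's out-of-bag estimate, enabled with oob_score=True. The sketch below runs on synthetic data (X_demo and y_demo are made up and unrelated to the crime dataset), so the printed value says nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# synthetic data, for illustration only: 200 rows, 5 integer-valued features
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 5, size=(200, 5)).astype(float)
y_demo = 10 * X_demo[:, 0] + X_demo[:, 4] + rng.normal(0, 0.5, size=200)

# oob_score=True scores each sample using only the trees that did not see it during training
demo_regressor = RandomForestRegressor(n_estimators=250, oob_score=True, random_state=0)
demo_regressor.fit(X_demo, y_demo)

# for a regressor, oob_score_ is the R^2 computed on the out-of-bag predictions
oob_r2 = demo_regressor.oob_score_
print(round(oob_r2, 3))
```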
THE 3 SPECIFIC QUESTIONS
We can now answer the three specific questions asked above by estimating the number of crimes for each row. Each number is a prediction, since the rows passed to the predict() function below do not exist in the dataset.
QUESTION 1: The following script computes the estimated number of NIGHT SEX_CRIME on 14-year-old females occurring at HOME compared to the number occurring in the STREET. x1a and x1b are the label-encoded rows for [NIGHT, HOME, SEX_CRIME, F, 14] and [NIGHT, STREET, SEX_CRIME, F, 14], respectively.
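#transform() expects an array-like, so each label is wrapped in a list and the single encoded value is extracted
x1a=[labelencoder_X0.transform(["NIGHT"])[0], labelencoder_X1.transform(["HOME"])[0], labelencoder_X2.transform(["SEX_CRIME"])[0], labelencoder_X3.transform(["F"])[0], 14]
x1b=[labelencoder_X0.transform(["NIGHT"])[0], labelencoder_X1.transform(["STREET"])[0], labelencoder_X2.transform(["SEX_CRIME"])[0], labelencoder_X3.transform(["F"])[0], 14]
#predict() expects a 2D array (one row per sample), so each row is wrapped in a list
y1a_pred = regressor.predict([x1a])[0]
y1b_pred = regressor.predict([x1b])[0]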
After execution: y1a_pred=37.54 is the estimated number of NIGHT SEX_CRIME on 14-year-old females occurring at HOME and y1b_pred=10.684 is the number occurring in the STREET.
QUESTION 2: This second script computes the estimated number of STREET ROBBERIES on 24-year-old males occurring in the AFTERNOON compared to the number occurring at NIGHT. x2a and x2b are the label-encoded rows for [AFTERNOON, STREET, ROBBERY, M, 24] and [NIGHT, STREET, ROBBERY, M, 24], respectively.
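#transform() expects an array-like, so each label is wrapped in a list and the single encoded value is extracted
x2a=[labelencoder_X0.transform(["AFTERNOON"])[0], labelencoder_X1.transform(["STREET"])[0], labelencoder_X2.transform(["ROBBERY"])[0], labelencoder_X3.transform(["M"])[0], 24]
x2b=[labelencoder_X0.transform(["NIGHT"])[0], labelencoder_X1.transform(["STREET"])[0], labelencoder_X2.transform(["ROBBERY"])[0], labelencoder_X3.transform(["M"])[0], 24]
#predict() expects a 2D array (one row per sample), so each row is wrapped in a list
y2a_pred = regressor.predict([x2a])[0]
y2b_pred = regressor.predict([x2b])[0]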
After execution: y2a_pred=132.284 is the estimated number of STREET ROBBERIES on 24-year-old males occurring in the AFTERNOON and y2b_pred=251.96 is the number occurring at NIGHT.
QUESTION 3: This third script computes the estimated number of NIGHT STREET VIOLENCE ACTS on 29-year-old individuals depending on whether the individual is a female or a male.
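#transform() expects an array-like, so each label is wrapped in a list and the single encoded value is extracted
x3a=[labelencoder_X0.transform(["NIGHT"])[0], labelencoder_X1.transform(["STREET"])[0], labelencoder_X2.transform(["VIOLENCE"])[0], labelencoder_X3.transform(["F"])[0], 29]
x3b=[labelencoder_X0.transform(["NIGHT"])[0], labelencoder_X1.transform(["STREET"])[0], labelencoder_X2.transform(["VIOLENCE"])[0], labelencoder_X3.transform(["M"])[0], 29]
#predict() expects a 2D array (one row per sample), so each row is wrapped in a list
y3a_pred = regressor.predict([x3a])[0]
y3b_pred = regressor.predict([x3b])[0]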
After execution: y3a_pred=95.56 is the estimated number of NIGHT STREET VIOLENCE ACTS on 29-year-old females and y3b_pred=127.724 is the number for males.
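The three questions follow the same pattern: encode four labels, append an age, and call predict() on a single row. That pattern could be wrapped in a small helper. The sketch below is one possible way to do it and is not part of the original scripts; the function estimate_count and the tiny demo dataset are hypothetical, and the counts in it are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder

def estimate_count(regressor, encoders, labels, age):
    """Encode the categorical labels, append the age and predict a single count."""
    # transform() expects an array-like, hence the one-element lists
    row = [encoder.transform([label])[0] for encoder, label in zip(encoders, labels)]
    row.append(age)
    # predict() expects a 2D input: one row per sample
    return float(regressor.predict([row])[0])

# made-up rows, for illustration only: MOMENT, LOCATION, CRIME, VICTIM_SEX, age, count
demo = np.array([
    ["NIGHT", "HOME", "SEX_CRIME", "F", 14, 40],
    ["NIGHT", "STREET", "SEX_CRIME", "F", 14, 10],
    ["AFTERNOON", "STREET", "ROBBERY", "M", 24, 130],
    ["NIGHT", "STREET", "ROBBERY", "M", 24, 250],
], dtype=object)
X_demo = demo[:, 0:5].copy()
y_demo = demo[:, 5].astype(float)

# one encoder per categorical column, as in the article
demo_encoders = []
for column in range(4):
    encoder = LabelEncoder()
    X_demo[:, column] = encoder.fit_transform(X_demo[:, column])
    demo_encoders.append(encoder)

demo_regressor = RandomForestRegressor(n_estimators=50, random_state=0)
demo_regressor.fit(X_demo, y_demo)

estimate = estimate_count(demo_regressor, demo_encoders, ["NIGHT", "HOME", "SEX_CRIME", "F"], 14)
print(estimate)
```

With the article's own objects, the equivalent call would pass regressor and the list [labelencoder_X0, labelencoder_X1, labelencoder_X2, labelencoder_X3].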
THE 3 GRAPHS
GRAPH 1: The following script displays the graph showing the real (dots) and estimated (curve) number of NIGHT SEX CRIMES in 2016 at HOME (red) and in the STREET (green) as a function of the FEMALE victim's age.
#real data: NIGHT SEX_CRIME at HOME on female victims
dataset1a = dataset[(dataset.MOMENT == "NIGHT") & (dataset.LOCATION == "HOME") & (dataset.CRIME == "SEX_CRIME") & (dataset.VICTIM_SEX == "F")]
X1a = dataset1a.iloc[:,0:5].values
y1a = dataset1a.iloc[:,-1].values
X1a[:, 0] = labelencoder_X0.transform(X1a[:,0])
X1a[:, 1] = labelencoder_X1.transform(X1a[:,1])
X1a[:, 2] = labelencoder_X2.transform(X1a[:,2])
X1a[:, 3] = labelencoder_X3.transform(X1a[:,3])
#real data: NIGHT SEX_CRIME in the STREET on female victims
dataset1b = dataset[(dataset.MOMENT == "NIGHT") & (dataset.LOCATION == "STREET") & (dataset.CRIME == "SEX_CRIME") & (dataset.VICTIM_SEX == "F")]
X1b = dataset1b.iloc[:,0:5].values
y1b = dataset1b.iloc[:,-1].values
X1b[:, 0] = labelencoder_X0.transform(X1b[:,0])
X1b[:, 1] = labelencoder_X1.transform(X1b[:,1])
X1b[:, 2] = labelencoder_X2.transform(X1b[:,2])
X1b[:, 3] = labelencoder_X3.transform(X1b[:,3])
#visualising the predictions (for higher resolution and smoother curve)
minAge = min(X[:,4])
maxAge = max(X[:,4])
res=1
#ages at which the model is evaluated (maxAge itself is excluded by np.arange)
ages = np.arange(minAge, maxAge, 1/res)
nb_rows = len(ages)
#transform() expects an array-like, so each label is wrapped in a list
a1a = np.zeros((nb_rows, 5))
a1a[:, 0] = np.repeat(labelencoder_X0.transform(["NIGHT"]), nb_rows)
a1a[:, 1] = np.repeat(labelencoder_X1.transform(["HOME"]), nb_rows)
a1a[:, 2] = np.repeat(labelencoder_X2.transform(["SEX_CRIME"]), nb_rows)
a1a[:, 3] = np.repeat(labelencoder_X3.transform(["F"]), nb_rows)
a1a[:, 4] = ages
y_a1a = regressor.predict(a1a)
a1b = np.zeros((nb_rows, 5))
a1b[:, 0] = np.repeat(labelencoder_X0.transform(["NIGHT"]), nb_rows)
a1b[:, 1] = np.repeat(labelencoder_X1.transform(["STREET"]), nb_rows)
a1b[:, 2] = np.repeat(labelencoder_X2.transform(["SEX_CRIME"]), nb_rows)
a1b[:, 3] = np.repeat(labelencoder_X3.transform(["F"]), nb_rows)
a1b[:, 4] = ages
y_a1b = regressor.predict(a1b)
plt.scatter(X1a[:, 4], y1a, color='red')
plt.plot(a1a[:, 4], y_a1a, color='red')
plt.scatter(X1b[:, 4], y1b, color='green')
plt.plot(a1b[:, 4], y_a1b, color='green')
plt.title('Real (dots) and Estimated (curve) Number of NIGHT SEX CRIME in 2016 at HOME (red) and in the STREET (green) according to the FEMALE victim age.')
plt.xlabel('Female Victim Age')
plt.ylabel('Number of Night Sex Crimes')
plt.show()
The graph looks as below:
Figure: Real (dots) and Estimated (curve) Number of NIGHT SEX CRIMES in 2016 at HOME (red) and in the STREET (green) according to the FEMALE victim age.
The graph shows that night sex crimes targeting females are mostly committed at home rather than in the street.
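The prediction grid used for each curve always has the same shape: four constant encoded labels plus a column of ages. A helper such as the one sketched below could build it in one call. The name build_age_grid and the example values are hypothetical, and the encoded labels passed in the usage line are arbitrary integers, not the real encodings.

```python
import numpy as np

def build_age_grid(encoded_labels, min_age, max_age, res=1):
    """Return one row per age, each carrying the same encoded categorical values."""
    ages = np.arange(min_age, max_age, 1 / res)  # max_age itself is excluded
    grid = np.zeros((len(ages), len(encoded_labels) + 1))
    grid[:, :-1] = encoded_labels  # broadcast the encoded labels to every row
    grid[:, -1] = ages
    return grid

# arbitrary encoded labels and age range, for illustration only
grid = build_age_grid([2, 0, 1, 0], min_age=10, max_age=15)
print(grid.shape)  # (5, 5)
```

Taking the number of rows from the ages array itself keeps the grid consistent if res is changed to a value other than 1.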
GRAPH 2: The following script displays the graph showing the real (dots) and estimated (curve) number of STREET ROBBERIES in 2016 occurring in the AFTERNOON (red) and at NIGHT (green) as a function of the MALE victim's age.
#real data: AFTERNOON STREET ROBBERY on male victims
dataset2a = dataset[(dataset.MOMENT == "AFTERNOON") & (dataset.LOCATION == "STREET") & (dataset.CRIME == "ROBBERY") & (dataset.VICTIM_SEX == "M")]
X2a = dataset2a.iloc[:,0:5].values
y2a = dataset2a.iloc[:,-1].values
X2a[:, 0] = labelencoder_X0.transform(X2a[:,0])
X2a[:, 1] = labelencoder_X1.transform(X2a[:,1])
X2a[:, 2] = labelencoder_X2.transform(X2a[:,2])
X2a[:, 3] = labelencoder_X3.transform(X2a[:,3])
#real data: NIGHT STREET ROBBERY on male victims
dataset2b = dataset[(dataset.MOMENT == "NIGHT") & (dataset.LOCATION == "STREET") & (dataset.CRIME == "ROBBERY") & (dataset.VICTIM_SEX == "M")]
X2b = dataset2b.iloc[:,0:5].values
y2b = dataset2b.iloc[:,-1].values
X2b[:, 0] = labelencoder_X0.transform(X2b[:,0])
X2b[:, 1] = labelencoder_X1.transform(X2b[:,1])
X2b[:, 2] = labelencoder_X2.transform(X2b[:,2])
X2b[:, 3] = labelencoder_X3.transform(X2b[:,3])
#visualising the predictions (for higher resolution and smoother curve)
minAge = min(X[:,4])
maxAge = max(X[:,4])
res=1
#ages at which the model is evaluated (maxAge itself is excluded by np.arange)
ages = np.arange(minAge, maxAge, 1/res)
nb_rows = len(ages)
#transform() expects an array-like, so each label is wrapped in a list
a2a = np.zeros((nb_rows, 5))
a2a[:, 0] = np.repeat(labelencoder_X0.transform(["AFTERNOON"]), nb_rows)
a2a[:, 1] = np.repeat(labelencoder_X1.transform(["STREET"]), nb_rows)
a2a[:, 2] = np.repeat(labelencoder_X2.transform(["ROBBERY"]), nb_rows)
a2a[:, 3] = np.repeat(labelencoder_X3.transform(["M"]), nb_rows)
a2a[:, 4] = ages
y_a2a = regressor.predict(a2a)
a2b = np.zeros((nb_rows, 5))
a2b[:, 0] = np.repeat(labelencoder_X0.transform(["NIGHT"]), nb_rows)
a2b[:, 1] = np.repeat(labelencoder_X1.transform(["STREET"]), nb_rows)
a2b[:, 2] = np.repeat(labelencoder_X2.transform(["ROBBERY"]), nb_rows)
a2b[:, 3] = np.repeat(labelencoder_X3.transform(["M"]), nb_rows)
a2b[:, 4] = ages
y_a2b = regressor.predict(a2b)
plt.scatter(X2a[:, 4], y2a, color='red')
plt.plot(a2a[:, 4], y_a2a, color='red')
plt.scatter(X2b[:, 4], y2b, color='green')
plt.plot(a2b[:, 4], y_a2b, color='green')
plt.title('Real (dots) and Estimated (curve) Number of STREET ROBBERIES in 2016 occurring in the AFTERNOON (red) and in the NIGHT (green) according to the MALE victim age.')
plt.xlabel('Male Victim Age')
plt.ylabel('Number of Street Robberies')
plt.show()
The graph looks as below:
Figure: Real (dots) and Estimated (curve) Number of STREET ROBBERIES in 2016 occurring in the AFTERNOON (red) and in the NIGHT (green) according to the MALE victim age.
There is a huge peak of robberies around age 15 that squeezes the rest of the graph. We can get a more detailed view by zooming in on the part squeezed by the peak. We obtain the following graph:
Figure: Zoom on the rest of the graph to avoid the peak.
The graph shows that street robberies targeting males are mostly committed at night rather than in the afternoon.
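The article does not show how the zoomed graph was produced. One common way to obtain such a view with matplotlib is to cap the vertical axis so that the peak is cut off. The sketch below illustrates the idea on made-up data; the ages, counts and the limit of 300 are assumptions chosen only for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up curve, for illustration only: a flat level with one large peak at age 15
ages = np.arange(10, 60)
counts = np.full(ages.shape, 100.0)
counts[ages == 15] = 2000.0

fig, ax = plt.subplots()
ax.plot(ages, counts, color='green')
# cap the vertical axis below the peak so the rest of the curve becomes readable
ax.set_ylim(0, 300)
ax.set_xlabel('Male Victim Age')
ax.set_ylabel('Number of Street Robberies')
# calling plt.show() at this point would display the zoomed figure
```

When plotting with the pyplot functions used in the scripts above, calling plt.ylim(0, 300) before plt.show() has the same effect.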
GRAPH 3: The following script displays the graph showing the real (dots) and estimated (curve) number of NIGHT STREET VIOLENCE ACTS in 2016 targeting a female (red) or a male (green) as a function of the victim's age.
#real data: NIGHT STREET VIOLENCE on female victims
dataset3a = dataset[(dataset.MOMENT == "NIGHT") & (dataset.LOCATION == "STREET") & (dataset.CRIME == "VIOLENCE") & (dataset.VICTIM_SEX == "F")]
X3a = dataset3a.iloc[:,0:5].values
y3a = dataset3a.iloc[:,-1].values
X3a[:, 0] = labelencoder_X0.transform(X3a[:,0])
X3a[:, 1] = labelencoder_X1.transform(X3a[:,1])
X3a[:, 2] = labelencoder_X2.transform(X3a[:,2])
X3a[:, 3] = labelencoder_X3.transform(X3a[:,3])
#real data: NIGHT STREET VIOLENCE on male victims
dataset3b = dataset[(dataset.MOMENT == "NIGHT") & (dataset.LOCATION == "STREET") & (dataset.CRIME == "VIOLENCE") & (dataset.VICTIM_SEX == "M")]
X3b = dataset3b.iloc[:,0:5].values
y3b = dataset3b.iloc[:,-1].values
X3b[:, 0] = labelencoder_X0.transform(X3b[:,0])
X3b[:, 1] = labelencoder_X1.transform(X3b[:,1])
X3b[:, 2] = labelencoder_X2.transform(X3b[:,2])
X3b[:, 3] = labelencoder_X3.transform(X3b[:,3])
#visualising the predictions (for higher resolution and smoother curve)
minAge = min(X[:,4])
maxAge = max(X[:,4])
res=1
#ages at which the model is evaluated (maxAge itself is excluded by np.arange)
ages = np.arange(minAge, maxAge, 1/res)
nb_rows = len(ages)
#transform() expects an array-like, so each label is wrapped in a list
a3a = np.zeros((nb_rows, 5))
a3a[:, 0] = np.repeat(labelencoder_X0.transform(["NIGHT"]), nb_rows)
a3a[:, 1] = np.repeat(labelencoder_X1.transform(["STREET"]), nb_rows)
a3a[:, 2] = np.repeat(labelencoder_X2.transform(["VIOLENCE"]), nb_rows)
a3a[:, 3] = np.repeat(labelencoder_X3.transform(["F"]), nb_rows)
a3a[:, 4] = ages
y_a3a = regressor.predict(a3a)
a3b = np.zeros((nb_rows, 5))
a3b[:, 0] = np.repeat(labelencoder_X0.transform(["NIGHT"]), nb_rows)
a3b[:, 1] = np.repeat(labelencoder_X1.transform(["STREET"]), nb_rows)
a3b[:, 2] = np.repeat(labelencoder_X2.transform(["VIOLENCE"]), nb_rows)
a3b[:, 3] = np.repeat(labelencoder_X3.transform(["M"]), nb_rows)
a3b[:, 4] = ages
y_a3b = regressor.predict(a3b)
plt.scatter(X3a[:, 4], y3a, color='red')
plt.plot(a3a[:, 4], y_a3a, color='red')
plt.scatter(X3b[:, 4], y3b, color='green')
plt.plot(a3b[:, 4], y_a3b, color='green')
plt.title('Real (dots) and Estimated (curve) Number of NIGHT STREET VIOLENCE ACTS in 2016 targeting a female (red) or a male (green) according to the victim age.')
plt.xlabel('Victim Age')
plt.ylabel('Number of Night Street Violence Acts')
plt.show()
The graph looks as below:
Figure: Real (dots) and Estimated (curve) Number of NIGHT STREET VIOLENCE ACTS in 2016 targeting a female (red) or a male (green) according to the victim age.
The graph shows that night street violence targets males more often than females.