The following explains how to build a decision tree regression model in R using the FARS-2016-PROFILES dataset.
The goal is to obtain predictions for the following four crash profiles, none of which appears in the « FARS-2016-PROFILES » dataset.
Based on 2016 data, we want an estimate of:
1) the number of deaths in a road crash on a completely dark (2) rural (1) road in Texas (48), occurring on a rainy (2) Friday (6) and involving 2 vehicles, 4 people and 1 drunk driver;
2) the number of deaths in a road crash fitting the same profile as in 1) but without any drunk driver;
3) the number of deaths in a road crash on a completely dark (2) rural (1) road in California (6), occurring on a rainy (2) Friday (6) and involving 2 vehicles, 5 people and 1 drunk driver;
4) the number of deaths in a road crash fitting the same profile as in 3) but without any drunk driver.
We start by setting the working directory and loading the dataset:
#setting working directory
setwd("[WORKING DIRECTORY]")
#loading the dataset
dataset = read.csv('accident-FINAL-5-fatalities_sum-removing_duplicates-RESULT.csv')
The dataset is loaded as a data frame; its first 39 rows (out of 20083) are as follows:

FARS 2016 « Profiles » Dataset in R : Number of deaths in road crashes according to particular conditions
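The same view can be obtained directly in the R console. The snippet below is only a minimal sketch: since the CSV file is not reproduced here, it builds a small stand-in data frame. The column names are taken from the calls used later in this post (the real file may contain other columns), and every value is invented for illustration.

```r
# Hypothetical stand-in for the real CSV: column names come from the
# predict() calls in this post, the values are made up.
dataset <- data.frame(
  STATE         = c(48, 48, 6, 6, 12),
  DAY_WEEK      = c(6, 1, 6, 7, 3),
  LIGHT         = c(2, 1, 2, 3, 1),
  WEATHER       = c(2, 1, 2, 1, 1),
  ROAD_TYPE     = c(1, 2, 1, 2, 2),
  VEHICLES      = c(2, 1, 2, 3, 1),
  PERSONS       = c(4, 1, 5, 6, 2),
  DRUNK_DRIVERS = c(1, 0, 0, 1, 0),
  FATALSUM      = c(2, 1, 1, 3, 1)
)

dim(dataset)       # number of rows and columns
str(dataset)       # type of each column
head(dataset, 39)  # first 39 rows (fewer if the data frame is shorter)
```

With the real file, only the last three calls are needed after read.csv.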
The following code builds the decision tree regression model from the dataset:
#Fitting the Decision Tree Regression model to the dataset
#install.packages('rpart')
library(rpart)
regressor = rpart(formula = FATALSUM ~., data = dataset, control = rpart.control(minsplit = 2, cp = 0.000005))
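A note on the two control parameters: in rpart, minsplit is the minimum number of observations a node must contain for a split to be attempted, and cp is the complexity parameter (a split that does not improve the overall fit by at least a factor of cp is not attempted). With minsplit = 2 and a very small cp, the tree is therefore allowed to grow very deep. The sketch below illustrates the effect by comparing these settings with the rpart defaults (minsplit = 20, cp = 0.01). It uses synthetic data with an invented relationship, not the FARS file, and n_leaves is a helper defined here for illustration.

```r
library(rpart)

# Synthetic data, invented for illustration only
set.seed(1)
n <- 500
toy <- data.frame(
  VEHICLES      = sample(1:4, n, replace = TRUE),
  PERSONS       = sample(1:8, n, replace = TRUE),
  DRUNK_DRIVERS = sample(0:2, n, replace = TRUE)
)
toy$FATALSUM <- 1 + (toy$PERSONS > 4) + (toy$DRUNK_DRIVERS > 0) + rpois(n, 0.2)

# Same settings as in the post: almost no restriction on tree growth
deep <- rpart(FATALSUM ~ ., data = toy,
              control = rpart.control(minsplit = 2, cp = 0.000005))

# rpart defaults (minsplit = 20, cp = 0.01)
default <- rpart(FATALSUM ~ ., data = toy)

# Helper (hypothetical name): count the terminal nodes of a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

c(deep = n_leaves(deep), default = n_leaves(default))
```

A deeper tree fits the training data more closely, at the cost of leaves that may contain very few crashes.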
The following code answers the first two questions, for the state of Texas:
y_pred_TX_Drunk = predict(regressor, data.frame(STATE = 48, DAY_WEEK = 6, LIGHT=2, WEATHER=2, ROAD_TYPE=1, VEHICLES=2, PERSONS=4, DRUNK_DRIVERS=1))
y_pred_TX_Sober = predict(regressor, data.frame(STATE = 48, DAY_WEEK = 6, LIGHT=2, WEATHER=2, ROAD_TYPE=1, VEHICLES=2, PERSONS=4, DRUNK_DRIVERS=0))
Running this script gives y_pred_TX_Drunk = 3 and y_pred_TX_Sober = 1.13. The predictions are therefore:
1) IF a road crash on a completely dark (2) rural (1) road in Texas (48) occurs on a rainy (2) Friday (6) and involves 2 vehicles, 4 people and 1 drunk driver, THEN, according to 2016 data, the estimated number of deaths is 3.
2) IF a road crash occurs in the same conditions without any drunk driver, THEN, according to 2016 data, the estimated number of deaths is 1 (rounding 1.13 to 1).
The following code answers the next two questions, for the state of California:
y_pred_CA_Drunk = predict(regressor, data.frame(STATE = 6, DAY_WEEK = 6, LIGHT=2, WEATHER=2, ROAD_TYPE=1, VEHICLES=2, PERSONS=5, DRUNK_DRIVERS=1))
y_pred_CA_Sober = predict(regressor, data.frame(STATE = 6, DAY_WEEK = 6, LIGHT=2, WEATHER=2, ROAD_TYPE=1, VEHICLES=2, PERSONS=5, DRUNK_DRIVERS=0))
Running this script gives y_pred_CA_Drunk = 2 and y_pred_CA_Sober = 1. The predictions are therefore:
3) IF a road crash on a completely dark (2) rural (1) road in California (6) occurs on a rainy (2) Friday (6) and involves 2 vehicles, 5 people and 1 drunk driver, THEN, according to 2016 data, the estimated number of deaths is 2.
4) IF a road crash occurs in the same conditions, this time without any drunk driver, THEN, according to 2016 data, the estimated number of deaths is 1.
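The four predictions can also be obtained with a single call to predict, by passing a data frame with one row per crash profile. The snippet below is a sketch under one stated assumption: so that it runs without the real CSV, regressor is fitted here on invented stand-in data, which means its predictions will not match the values reported above. With the real model, only the profiles data frame and the predict call are needed.

```r
library(rpart)

# Invented stand-in data so the example runs without the real CSV
set.seed(42)
n <- 400
toy <- data.frame(
  STATE         = sample(c(6, 48), n, replace = TRUE),
  DAY_WEEK      = sample(1:7, n, replace = TRUE),
  LIGHT         = sample(1:3, n, replace = TRUE),
  WEATHER       = sample(1:2, n, replace = TRUE),
  ROAD_TYPE     = sample(1:2, n, replace = TRUE),
  VEHICLES      = sample(1:3, n, replace = TRUE),
  PERSONS       = sample(1:6, n, replace = TRUE),
  DRUNK_DRIVERS = sample(0:1, n, replace = TRUE)
)
toy$FATALSUM <- 1 + toy$DRUNK_DRIVERS + rpois(n, 0.3)
regressor <- rpart(FATALSUM ~ ., data = toy,
                   control = rpart.control(minsplit = 2, cp = 0.000005))

# One row per crash profile from the post; single values are recycled
profiles <- data.frame(
  STATE         = c(48, 48, 6, 6),
  DAY_WEEK      = 6,
  LIGHT         = 2,
  WEATHER       = 2,
  ROAD_TYPE     = 1,
  VEHICLES      = 2,
  PERSONS       = c(4, 4, 5, 5),
  DRUNK_DRIVERS = c(1, 0, 1, 0)
)

# One prediction per row, stored next to the profile it belongs to
profiles$PREDICTED_DEATHS <- predict(regressor, newdata = profiles)
profiles
```

Keeping the profiles in one data frame makes it easy to add further scenarios and to compare the drunk and sober cases side by side.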
We can visualize the shape of the decision tree with the plot function:
#visualizing the shape of the decision tree
plot(regressor)
which draws the following image (exported here as a PNG):

Shape of the regression model decision tree
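On its own, plot draws only the branches of an rpart tree; the split rules and leaf values are added with text. The sketch below also shows one way to write the figure to a PNG file with the png graphics device. It is fitted on invented toy data with default rpart settings so that the labels stay readable; the file name and the toy variables are assumptions made for this example.

```r
library(rpart)

# Invented toy data, small enough to give a readable tree
set.seed(7)
n <- 300
toy <- data.frame(
  PERSONS       = sample(1:6, n, replace = TRUE),
  DRUNK_DRIVERS = sample(0:1, n, replace = TRUE)
)
toy$FATALSUM <- 1 + toy$DRUNK_DRIVERS + (toy$PERSONS > 4) + rpois(n, 0.2)
fit <- rpart(FATALSUM ~ ., data = toy)

# Write the figure to a PNG file (hypothetical file name, temporary directory)
out_file <- file.path(tempdir(), "decision_tree.png")
png(out_file, width = 1200, height = 800)
plot(fit, uniform = TRUE, margin = 0.1)  # branches only
text(fit, use.n = TRUE, cex = 0.9)       # split rules, leaf means and counts
dev.off()

out_file
```

For a tree as deep as the one built in this post, the labels would overlap heavily, which is likely why only its shape is shown.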
We can also get technical information about the decision tree with the summary function:
#getting information on the decision tree regression model
summary(regressor)
which displays the following output:

Summary information provided by the decision tree regressor
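For a deep tree, the output of summary can be very long. As a more compact alternative, printcp prints the complexity parameter table, and the variable.importance component of the fitted object ranks the predictors. The sketch below uses invented toy data rather than the FARS file, so its numbers say nothing about real crashes.

```r
library(rpart)

# Invented toy data for illustration only
set.seed(7)
n <- 300
toy <- data.frame(
  PERSONS       = sample(1:6, n, replace = TRUE),
  DRUNK_DRIVERS = sample(0:1, n, replace = TRUE)
)
toy$FATALSUM <- 1 + toy$DRUNK_DRIVERS + (toy$PERSONS > 4) + rpois(n, 0.2)

fit <- rpart(FATALSUM ~ ., data = toy,
             control = rpart.control(minsplit = 2, cp = 0.000005))

# Complexity table: one row per candidate tree size
printcp(fit)

# Importance score of each predictor used in the tree
fit$variable.importance
```

The same two calls apply unchanged to the regressor object built earlier in this post.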