Logistic Regression

Logistic regression is a binomial regression model. As with all binomial regression models, the goal is to find a simple mathematical model that best fits a set of real observations. In other words, it associates a vector of random variables X = (x_1, …, x_K) with a binomial random variable, generically denoted y; the model itself is written out just after the list of applications below. Logistic regression is a special case of a generalized linear model. It is used in machine learning and in a wide range of other domains, of which the following is a non-exhaustive list:

  • In medicine, it can be used, for example, to identify the factors that characterize a group of sick subjects as compared to healthy subjects.
  • In insurance, it makes it possible to target the fraction of customers likely to be receptive to a policy covering a particular risk.
  • In banking, to detect risk groups when granting a loan.
  • In econometrics, to explain a discrete variable, such as voting intentions in an election.
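
For reference, the standard form of the model links the probability of the outcome to a linear combination of the predictors through the logistic (sigmoid) function:

P(y = 1 | X) = 1 / (1 + exp(−(β_0 + β_1 x_1 + … + β_K x_K)))

where the coefficients β_0, …, β_K are estimated from the observations, typically by maximum likelihood.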

THE DATASET: LOS ANGELES 2016 SEX CRIMES

The dataset below was built from the "Los Angeles Crime Data From 2010 to Present", accessible here. It is a simplified version that gives the number of SEX CRIMES that occurred in 2016 according to VICTIM GENDER and AGE. The figure below shows the first 40 rows of the dataset:

Los Angeles 2016 Sex Crimes Dataset

The columns and their values are described below:

  • VICTIM_GENDER: the gender of the victim = { M, F }
  • VICTIM_AGE: the age of the victim
  • SEX_CRIMES: the number of sexual assaults targeting people of this age and gender over the whole of 2016.

This dataset is used to build a classifier using the logistic regression method in Python and in R.
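
As a first sketch of the Python side, the snippet below shows one way such a classifier could be fitted with scikit-learn; it is a minimal illustration, not the article's full walkthrough. The file name la_2016_sex_crimes.csv is an assumption and simply refers to the CSV data reproduced below, saved locally; each (age, gender) row is weighted by its SEX_CRIMES count, so the model estimates the probability that a victim of a given age is male.

# Minimal sketch: weighted logistic regression with scikit-learn.
# "la_2016_sex_crimes.csv" is a hypothetical name for the CSV reproduced below.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("la_2016_sex_crimes.csv")        # VICTIM_AGE, SEX_CRIMES, VICTIM_GENDER

X = df[["VICTIM_AGE"]]                            # single numeric predictor
y = (df["VICTIM_GENDER"] == "M").astype(int)      # encode gender as 0 (F) / 1 (M)

model = LogisticRegression()
model.fit(X, y, sample_weight=df["SEX_CRIMES"])   # weight each row by its crime count

# Estimated probability that a 25-year-old victim is male
age_25 = pd.DataFrame({"VICTIM_AGE": [25]})
print(model.predict_proba(age_25)[0, 1])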

The full dataset in CSV format follows:

VICTIM_AGE,SEX_CRIMES,VICTIM_GENDER
27,145,F
39,83,F
45,60,F
65,3,M
16,320,F
17,226,F
22,147,F
50,41,F
13,213,F
18,161,F
21,144,F
28,127,F
29,119,F
32,113,F
33.73427719,351,F
34,90,F
36,83,F
38,90,F
41,70,F
42,57,F
43,61,F
46,56,F
52,30,F
55,31,F
58,24,F
59,28,F
60,24,F
63,15,F
71,2,F
88,2,F
46,14,M
10,59,F
11,136,F
12,166,F
14,210,F
15,326,F
19,153,F
20,155,F
23,141,F
24,155,F
25,163,F
26,130,F
30,105,F
31,107,F
33,98,F
35,86,F
37,63,F
40,60,F
44,56,F
47,57,F
48,47,F
49,42,F
51,48,F
53,41,F
54,31,F
56,35,F
57,34,F
61,11,F
62,10,F
64,12,F
65,11,F
66,7,F
68,7,F
69,6,F
70,9,F
74,2,F
75,5,F
77,3,F
78,6,F
80,6,F
83,3,F
95,1,F
10,17,M
11,20,M
12,28,M
13,28,M
14,25,M
15,40,M
16,42,M
17,51,M
18,12,M
20,13,M
21,24,M
22,22,M
23,19,M
24,16,M
25,18,M
26,18,M
27,23,M
28,33,M
29,13,M
30,22,M
31,19,M
32,19,M
33,25,M
33.73427719,156,M
34,18,M
35,21,M
36,20,M
37,19,M
38,19,M
39,18,M
40,15,M
41,18,M
42,14,M
43,18,M
45,19,M
47,14,M
48,16,M
49,17,M
50,13,M
52,10,M
54,9,M
55,11,M
56,8,M
57,4,M
58,7,M
59,15,M
61,7,M
63,8,M
64,6,M
66,8,M
68,2,M
70,4,M
74,2,M
77,4,M
83,2,M
76,6,F
60,5,M
44,8,M
53,9,M
84,2,F
67,4,M
81,2,M
19,13,M
51,12,M
67,10,F
82,1,M
95,1,M
72,4,F
73,3,F
85,2,F
89,2,F
90,3,F
92,1,F
99,1,F
62,3,M
69,4,M
71,3,M
78,2,M
84,1,M
88,1,M
91,1,F
87,4,F
81,1,F
82,1,F
96,1,F
72,1,M
73,2,M
85,1,M