Toxic Spotter

The Toxic Spotter is a tool that evaluates the toxicity level of a comment. Given a sequence of characters, the Toxic Spotter returns, for each of the following categories, the probability that the text belongs to it: "toxic", "noxious", "obscene", "insulting", "threatening" and "identity hate".
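A minimal sketch of what scoring a comment could look like, assuming a hypothetical score_comment helper and a dict of fitted per-category classifiers exposing a scikit-learn style predict_proba (these names are illustrative, not the actual Toxic Spotter code):

CATEGORIES = ["toxic", "noxious", "obscene", "insulting", "threatening", "identity hate"]

def score_comment(text, classifiers):
    # classifiers: hypothetical dict mapping each category name to a fitted
    # text-classification pipeline (e.g. vectorizer + binary classifier).
    # Returns one probability per category for the given text.
    return {cat: float(classifiers[cat].predict_proba([text])[0][1])
            for cat in CATEGORIES}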

The Toxic Spotter is available as a Twitter bot here.


You can send it a message and it will reply with the toxicity level of your tweet, as shown in the screenshots below.
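A minimal sketch of how such a reply loop could be wired up with the tweepy library, reusing the hypothetical score_comment helper above (the credentials, the last_seen_id bookkeeping and the reply format are assumptions, not the bot's actual implementation):

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

# Reply to every new mention with the per-category probabilities.
for mention in api.mentions_timeline(since_id=last_seen_id):
    scores = score_comment(mention.text, classifiers)
    summary = ", ".join("{}: {:.0%}".format(cat, p) for cat, p in scores.items())
    api.update_status(
        status="@{} {}".format(mention.user.screen_name, summary),
        in_reply_to_status_id=mention.id,
    )
    last_seen_id = max(last_seen_id, mention.id)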

Toxic Spotter -- A tweet considered non-toxic

Toxic Spotter -- A tweet considered toxic (threat)

For any inquiries about the bot, see the contact page.

The Toxic Spotter classifier was trained on a dataset of more than 150,000 sentences, each tagged for every category. Each sentence was spell-corrected before training. The evaluation of the classifier shows an average accuracy of 98.16% and an average AUR (Area Under the ROC curve, averaged over the six categories) of 98.03%:

Predictions for category [ toxic ]...
- computing BINARY predictions for category [ toxic ]...
- classifier [ toxic ] mean score = 0.95854561862
- computing PROBA predictions for target category [ toxic ]...
- classifier [ toxic ] roc_auc score = 0.972763588933
Predictions for category [ noxious ]...
- computing BINARY predictions for category [ noxious ]...
- classifier [ noxious ] mean score = 0.990581113346
- computing PROBA predictions for target category [ noxious ]...
- classifier [ noxious ] roc_auc score = 0.986382508326
Predictions for category [ obscene ]...
- computing BINARY predictions for category [ obscene ]...
- classifier [ obscene ] mean score = 0.978793404899
- computing PROBA predictions for target category [ obscene ]...
- classifier [ obscene ] roc_auc score = 0.984205514125
Predictions for category [ insulting ]...
- computing BINARY predictions for category [ insulting ]...
- classifier [ insulting ] mean score = 0.972664548514
- computing PROBA predictions for target category [ insulting ]...
- classifier [ insulting ] roc_auc score = 0.977199660478
Predictions for category [ threatening ]...
- computing BINARY predictions for category [ threatening ]...
- classifier [ threatening ] mean score = 0.997593577861
- computing PROBA predictions for target category [ threatening ]...
- classifier [ threatening ] roc_auc score = 0.986660156861
Predictions for category [ identity hate ]...
- computing BINARY predictions for category [ identity hate ]...
- classifier [ identity hate ] mean score = 0.991709123724
- computing PROBA predictions for target category [ identity hate ]...
- classifier [ identity hate ] roc_auc score = 0.97485623447
Average Accuracy = 0.981647897827
Average AUR = 0.980344610532
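For reference, here is a minimal sketch of how such a per-category evaluation could be produced with scikit-learn; the TF-IDF + logistic regression pipeline and the train/test variables are assumptions, since the post does not state which model or library Toxic Spotter actually uses:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.pipeline import make_pipeline

CATEGORIES = ["toxic", "noxious", "obscene", "insulting", "threatening", "identity hate"]

# Assumed inputs: train_texts / test_texts are lists of spell-corrected
# comments; train_labels / test_labels map each category to a 0/1 array.
accuracies, aucs = [], []
for cat in CATEGORIES:
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(train_texts, train_labels[cat])

    binary_pred = clf.predict(test_texts)              # BINARY predictions
    proba_pred = clf.predict_proba(test_texts)[:, 1]   # PROBA predictions

    accuracies.append(accuracy_score(test_labels[cat], binary_pred))
    aucs.append(roc_auc_score(test_labels[cat], proba_pred))

print("Average Accuracy =", np.mean(accuracies))
print("Average AUR =", np.mean(aucs))

The two averages at the end of the log are simply the arithmetic means of the six per-category scores.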

Toxic Spotter -- ROC curves for the 6 categories

Toxic Spotter -- Top-left corner zoom of the ROC curves for the 6 categories