The fight against Internet hate speech – technical context of the CMP system

Recently on our blog, we described the Content Moderation Platform (CMP) project, on which we worked as a technology partner in cooperation with Wirtualna Polska. We created an NLP-based automated system to help moderate the comments appearing on our partner’s websites. Since the project has generated a lot of interest, here are more technical details about it. If you are interested in the tools used for the model architecture, how the training dataset was prepared, and more, read on.

Konrad Krawczyk – Machine Learning Specialist in WEBSENSA

This article was written in cooperation with Konrad Krawczyk – our Machine Learning Specialist in the CMP project.

The main purpose of the CMP system

The model we have prepared is based on artificial intelligence and is used for in-depth, real-time hate-speech detection. It not only detects offensive words but also reads the context of the statement and classifies the comment accordingly.

It is part of a larger integrated moderation system, which also includes other models, for example one that detects bots in a site’s traffic.

Our model classifies comments into one of three groups:

Accepted — the comment does not violate the company’s rules;
Rejected — the comment violates the rules;
Uncertain — the system is not sure how to classify a comment, so it is sent for additional verification by a human working as a moderator.

CMP system is used for in-depth real-time hate-speech detection

You are probably wondering when a comment gets classified as “uncertain” and is sent for additional human moderation. We set a certainty threshold that defines how confident the model must be before it classifies a comment on its own. The decision is based on the predicted probability, so if the model’s certainty is lower than the threshold, the comment is sent to a human moderator.
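As a minimal sketch of this routing rule, assume the classifier outputs a single probability that a comment should be rejected. The function name `route_comment` and the threshold value 0.9 are illustrative assumptions, not the values used in the project.

```python
from typing import Literal

Decision = Literal["accepted", "rejected", "uncertain"]


def route_comment(p_rejected: float, threshold: float = 0.9) -> Decision:
    """Map the model's probability of 'rejected' to a moderation decision.

    The threshold of 0.9 is a hypothetical value chosen for illustration.
    """
    if not 0.0 <= p_rejected <= 1.0:
        raise ValueError("p_rejected must be a probability in [0, 1]")
    # Certainty is the probability of whichever class the model prefers.
    certainty = max(p_rejected, 1.0 - p_rejected)
    if certainty < threshold:
        # Not confident enough: hand the comment over to a human moderator.
        return "uncertain"
    return "rejected" if p_rejected >= 0.5 else "accepted"
```

Raising the threshold sends more comments to human moderators, and lowering it automates more decisions at the cost of more mistakes.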

Dataset used for training

As you can guess, to make our model work properly, we needed a huge dataset for learning and analysis. For this purpose, we used:

about 17.5 million comments from WP to train the language model;
about 235 thousand labelled comments, prepared by human moderators, to train the classifier.

The set of comments we received was not suitable for training the model as it was, because it contained many unnecessary attributes and special characters that only added noise. To teach our model to classify comments properly, we had to prepare, clean and label the data carefully. First of all, we replaced unnecessary elements such as links or numbers with special tokens, which reduce the noise while still telling the model that such an element was present.
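A cleaning step of this kind could look roughly like the sketch below. The token names `xxurl` and `xxnum` and the regular expressions are assumptions made for illustration. The article does not state which tokens or patterns were actually used.

```python
import re

# Hypothetical patterns: a link starts with http(s):// or www.,
# a number is a run of digits with optional . or , separators.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalise(text: str) -> str:
    """Replace links and numbers with placeholder tokens."""
    text = URL_RE.sub(" xxurl ", text)      # links first, since they may contain digits
    text = NUMBER_RE.sub(" xxnum ", text)
    return " ".join(text.split())           # collapse repeated whitespace


print(normalise("See https://example.com/a?b=1 it costs 1,5 mln"))
```

The model no longer sees thousands of distinct URLs and numbers, only the fact that one occurred at that position.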

We also paid attention to the emoticons used in the comments. By replacing them with appropriate tokens, we kept the key information about the emotions they express. Thanks to that, our model can also take emoticons into account when analysing the content of a comment.
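The emoticon replacement could be sketched as a lookup table that maps each emoticon to a token naming the emotion. The table below and the token names are invented for illustration. The mapping used in the project is not described in this article.

```python
import re

# Hypothetical mapping from emoticons to emotion tokens.
EMOTICON_TOKENS = {
    ":)": "xxemo_positive",
    ":-)": "xxemo_positive",
    ":D": "xxemo_positive",
    ":(": "xxemo_negative",
    ":-(": "xxemo_negative",
    "😀": "xxemo_positive",
    "😡": "xxemo_negative",
}

# Longest emoticons first, so a longer one is never shadowed by a shorter one.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_TOKENS, key=len, reverse=True))
)


def replace_emoticons(text: str) -> str:
    """Replace every known emoticon with its emotion token."""
    replaced = _EMOTICON_RE.sub(lambda m: f" {EMOTICON_TOKENS[m.group(0)]} ", text)
    return " ".join(replaced.split())
```

Different emoticons with the same meaning collapse into one token, so the model gets a clearer signal from fewer distinct symbols.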

The architecture of the model

As the model architecture, we chose ULMFiT (Universal Language Model Fine-tuning for Text Classification), an approach built on LSTM (long short-term memory) neural networks. Its key feature is the ability to adapt easily to a specific task starting from a pre-trained language model. That made it a very good fit for fine-tuning a language model (which predicts words based on the previous ones) to a downstream task, which in this case was comment classification.

For tokenisation, we used a text tokeniser called SentencePiece. Importantly for our project, this tool does not map each whole word to a single token but splits words into smaller subword units, which in some tasks significantly improves model performance. SentencePiece works very well for this purpose, but building a good-quality vocabulary with well-represented tokens requires a large amount of data. Ours was built from the whole set of 17.5 million comments.
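To illustrate what splitting words into smaller pieces means, here is a toy greedy longest-match segmenter. It is not the algorithm SentencePiece uses (SentencePiece learns its vocabulary from data with a unigram language model or BPE). The vocabulary below is hand-written for the example.

```python
from __future__ import annotations


def segment(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shorter ones.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No piece matches: emit the single character on its own.
            pieces.append(word[start])
            start += 1
    return pieces


toy_vocab = {"komentarz", "e", "moder", "ator"}   # hypothetical subword vocabulary
print(segment("komentarze", toy_vocab))
print(segment("moderator", toy_vocab))
```

Because inflected forms share pieces such as the stem, a misspelled or rarely seen word can still be represented by subwords the model already knows.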

Preparing the language model

Our model was quite large because of both its architecture and the amount of data, so training it required powerful hardware. That is why we trained it on the Google Cloud Platform, using an NVIDIA Tesla P100 GPU on an n1-highmem-16 machine (16 vCPUs, 104 GB memory).

Training the language model took about ten days. For this purpose, we used only the comments we received from Wirtualna Polska (i.e. 17.5 million).

CMP system takes into account borrowings, abbreviations, internet slang and possible commenting errors

So, why did we decide to focus only on this source? After a thorough analysis of the available literature on the Polish language, we found this solution to be the best. If we had fine-tuned a language model trained on, for example, Wikipedia, the effectiveness of our model could have been lower. The content on Wikipedia is very different from the language people use when writing comments on the Internet. In our project, we had to take into account borrowings, abbreviations, internet slang and even the mistakes people make when commenting.

Fine-tuning the language model into a classifier

After training the language model, we extracted its most important parts: the encoder and the vocabulary previously built with SentencePiece. We then added a classification layer on top of the encoder, following the ULMFiT approach.

Automatic database update

Training the classifier was much shorter and needed fewer resources, although it still ran on a GPU. This allowed us to automate the process and regularly retrain the classifier on new data, so the tool keeps improving over time.

Separating the model into two groups

Wirtualna Polska is a company that runs several portals. Some of them are news portals, and others cover gossip. As it turned out, the language used in the comments differed between these two groups. Therefore, we split the task into two parts and prepared a separate model for each group.

Balancing out the chance of two classification options

It is worth mentioning that the training set we received was quite imbalanced: about 74% of the comments were labelled as “accepted” and only 26% as “rejected”. We had to compensate for this difference so that the model would not simply favour the more frequent class. We solved the problem with a compensating loss function, and the neural network was able to train properly on this training set.
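One common way to compensate for imbalance in the loss is to weight each class inversely to its frequency. The sketch below assumes that scheme. The article does not say which compensation was actually used, so treat the weighting formula as an illustration only.

```python
from __future__ import annotations

import math


def class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def weighted_cross_entropy(
    probs: list[dict[str, float]], labels: list[str], weights: dict[str, float]
) -> float:
    """Cross-entropy where each example counts in proportion to its class weight."""
    weighted = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    # Normalise by the total weight so the loss stays on a comparable scale.
    return sum(weighted) / sum(weights[y] for y in labels)


# The 74% / 26% split from the text, expressed as counts per 100 comments.
weights = class_weights({"accepted": 74, "rejected": 26})
print(weights)
```

With these weights, a mistake on a “rejected” comment costs almost three times as much as a mistake on an “accepted” one, so both classes contribute equally to the total loss.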

The results

The results of our model have met the expectations of both our team and Wirtualna Polska. On the test set, it reached an accuracy of 93%. Moreover, once we applied the certainty threshold, accuracy on the automatically handled comments rose to as much as 99%, while the publishing decision was automated for about half of the comments posted on the supported websites. The rest of them, i.e. those about which the model is uncertain, go to human moderators, who decide how to classify the comment.

Interestingly, we also verified the model’s effectiveness on the PolEval 2019 contest dataset, because that task also concerned hate-speech detection. We achieved an F1-score of 65.96, while the winning team reached 58.58.
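The trade-off between accuracy and the share of automatically handled comments can be measured as in the sketch below. The probabilities and labels are invented toy values, not the project’s data, and the function name is an assumption.

```python
from __future__ import annotations


def coverage_and_accuracy(
    p_rejected: list[float], is_rejected: list[bool], threshold: float
) -> tuple[float, float | None]:
    """Return (share of comments decided automatically, accuracy on that share)."""
    decided = [
        (p >= 0.5) == truth                  # True when the automatic decision is correct
        for p, truth in zip(p_rejected, is_rejected)
        if max(p, 1.0 - p) >= threshold      # keep only sufficiently certain predictions
    ]
    coverage = len(decided) / len(p_rejected)
    accuracy = sum(decided) / len(decided) if decided else None
    return coverage, accuracy


# Toy data: six comments with model probabilities and true labels.
probs = [0.95, 0.05, 0.6, 0.4, 0.99, 0.55]
truth = [True, False, False, True, True, True]
print(coverage_and_accuracy(probs, truth, threshold=0.5))
print(coverage_and_accuracy(probs, truth, threshold=0.9))
```

In this toy example, raising the threshold halves the share of comments decided automatically but removes both mistakes, which mirrors the pattern described above.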
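For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives and false negatives for one class. The counts are invented, and the result is on a 0 to 1 scale, whereas the scores quoted above appear to be given on a 0 to 100 scale.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for a single class from true positives, false positives, false negatives."""
    if tp == 0:
        return 0.0  # no correct positive predictions: precision and recall are both 0
    precision = tp / (tp + fp)   # how many flagged comments were truly harmful
    recall = tp / (tp + fn)      # how many harmful comments were flagged
    return 2 * precision * recall / (precision + recall)


print(round(100 * f1_score(tp=8, fp=2, fn=2), 2))
```

Unlike plain accuracy, F1 is not inflated by the large number of harmless comments, which makes it a common choice for imbalanced tasks such as hate-speech detection.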

Other possibilities to use our model

The content moderation system we created works well on Wirtualna Polska’s websites, but with proper fine-tuning it can be used more widely, in areas such as:

social media platforms,
portals where users rate particular products or services,
websites for children,
blogging platforms,
and many more.

Our model was built so that it can be adapted to any tool or platform, as well as to the policy of the company that wants to implement it.

Summary

Hate speech can have very tragic consequences, so it is worth using tools such as CMP to avoid or at least minimise them. Would you like to use a similar tool in your company? Contact our experts. We are happy to talk about your needs.