Departments

Permanent link to this section: https://repository.kpi.kharkov.ua/handle/KhPI-Press/35393

Search results

Now showing 1 - 1 of 1
  • Document
    Analysis of the text preprocessing methods influence on the destructive messages classifier
    (National Technical University "Kharkiv Polytechnic Institute", 2020) Orlovskyi, Oleksandr; Ostapov, Sergey
    Social networks are increasingly becoming an environment for threats, insults, profanity and other destructive manifestations of human communication. Today, a huge number of people are involved in online platforms, and the amount of content created and the reactions to it are constantly breaking records. Therefore, there is a need to automate the detection and counteraction of antisocial influences. One important area of such activity is the detection of toxic comments that contain threats, insults, profanity, contempt for others and more. To perform this task, researchers usually build a classifier based on neural networks and train it on a collected or publicly available data set. The article investigates how different methods of preprocessing the input data affect the final accuracy of the classifier. Previous studies in this direction confirmed that preprocessing affects the result, but did not allow definitive conclusions to be drawn about its effectiveness. Goal. To study the influence of text preprocessing methods on the destructive messages classifier. Results. It is shown that the effect of a particular method can depend strongly on the content of the data set. In addition, it is noted that the impact is sometimes insignificant and in some cases may even worsen the result. The need to pre-check the data set for the percentage of elements affected by a particular method is also justified. Originality. The data processing methods are evaluated on English and Russian data sets. Practical significance. The obtained results allow better decisions to be made about the use of particular preprocessing methods to improve the accuracy of the destructive messages classifier.
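
The pre-check mentioned in the abstract (the percentage of data set elements affected by a particular method) could be sketched roughly as follows. This is only an illustration under assumptions: the three preprocessing functions, the `affected_share` helper, and the sample comments are hypothetical and are not taken from the article, which does not list its methods in this record.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical preprocessing methods, chosen only for illustration.
def lowercase(text: str) -> str:
    return text.lower()

def strip_urls(text: str) -> str:
    return re.sub(r"https?://\S+", "", text)

def collapse_repeats(text: str) -> str:
    # Cap any run of three or more identical characters at two ("nooooo" -> "noo").
    return re.sub(r"(.)\1{2,}", r"\1\1", text)

def affected_share(
    texts: Iterable[str],
    methods: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Return, per method, the fraction of texts that the method actually changes."""
    items = list(texts)
    if not items:
        return {name: 0.0 for name in methods}
    return {
        name: sum(fn(t) != t for t in items) / len(items)
        for name, fn in methods.items()
    }

METHODS = {
    "lowercase": lowercase,
    "strip_urls": strip_urls,
    "collapse_repeats": collapse_repeats,
}

# Made-up sample comments, not real data.
sample = [
    "You are SO wrong",
    "see http://example.com NOW",
    "nooooo way",
    "fine by me",
]

shares = affected_share(sample, METHODS)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of texts affected")
```

A method that changes only a small share of the texts has little room to influence the classifier, so a check of this kind could help decide whether a given method is worth evaluating on a particular data set.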