Applying VSM to Identify the Criminal Meaning of Texts

Khairova, N. F.; Kolesnyk, Anastasiia; Mamyrbayev, Orken; Petrasova, S. V.

Files

Khairova_Applying_VSM_2020.pdf (749.77 KB)

Date

2020

Authors

ORCID

https://orcid.org/0000-0002-9826-0286
https://orcid.org/0000-0001-5817-0844
https://orcid.org/0000-0001-8318-3794
https://orcid.org/0000-0001-6011-135X

Abstract

In general, text classification approaches can be used to determine whether a text belongs to a specific theme or domain. However, the task becomes more complicated when there is no training corpus in which the set of classes and the set of documents belonging to these classes are predetermined. We suggest using the semantic similarity of texts to determine whether they belong to a specific domain. Our training corpus includes news articles containing criminal information. To determine whether the theme of an input document is close to the theme of the training corpus, we propose calculating the cosine similarity between the documents of the corpus and the input document. We have empirically established the average value of the cosine similarity coefficient at which a document can be attributed to the highly specialized documents containing criminal information. We evaluate our approach on a test corpus of articles from the news sites of Kharkiv. The F-measure of classifying documents with criminal information reaches 96%.
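The similarity test described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: documents are represented by raw term counts (the keywords mention PPMI, which is not reproduced here), the "average value" is read as the mean cosine similarity between the input document and every corpus document, and the names `term_vector`, `mean_similarity`, `is_criminal`, `CORPUS` and `HYPOTHETICAL_THRESHOLD` are illustrative only. The threshold value is invented for the toy data and is not the value established in the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    # Raw term counts; an assumption, the paper's actual weighting may differ.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    # Dot product over shared terms divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(document, corpus):
    # Mean cosine similarity between the input document and each corpus document.
    doc_vec = term_vector(document)
    sims = [cosine_similarity(doc_vec, term_vector(c)) for c in corpus]
    return sum(sims) / len(sims) if sims else 0.0


def is_criminal(document, corpus, threshold):
    # Attribute the document to the domain if its mean similarity reaches the threshold.
    return mean_similarity(document, corpus) >= threshold


# Toy stand-in for the corpus of crime-related news articles (hypothetical data).
CORPUS = [
    "police arrested a suspect after the robbery",
    "the suspect was charged with robbery and assault",
]
# Hypothetical threshold chosen for this toy data only.
HYPOTHETICAL_THRESHOLD = 0.4

print(is_criminal("police charged the robbery suspect", CORPUS, HYPOTHETICAL_THRESHOLD))
print(is_criminal("the city opened a new flower garden", CORPUS, HYPOTHETICAL_THRESHOLD))
```

With raw counts, common function words such as "the" inflate the similarity of unrelated texts, which is one reason a weighting scheme or stop-word filtering would normally be applied before thresholding.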

Keywords

semantic similarity of texts, VSM, criminal information, news sites, cosine similarity, PPMI

Bibliographic description

Applying VSM to Identify the Criminal Meaning of Texts / [Electronic resource] / N. Khairova [et al.] // Computational linguistics and intelligent systems (COLINS 2020) : proc. of the 4th Intern. Conf., April 23-24, 2020. Vol. 1: Main Conference / ed.: V. Lytvyn [et al.]. – Electron. text data. – Lviv, 2020. – P. 20-31. – URL: http://ceur-ws.org/Vol-2604/paper2.pdf, free (accessed 14.12.2020).

URI

https://repository.kpi.kharkov.ua/handle/KhPI-Press/49817

Collections

Department of "Intelligent Computer Systems"
