Applying VSM to Identify the Criminal Meaning of Texts

dc.contributor.author: Khairova, N. F. [en]
dc.contributor.author: Kolesnyk, Anastasiia [en]
dc.contributor.author: Mamyrbayev, Orken [en]
dc.contributor.author: Petrasova, S. V. [en]
dc.date.accessioned: 2020-12-14T10:03:16Z
dc.date.available: 2020-12-14T10:03:16Z
dc.date.issued: 2020
dc.description.abstract: Generally, text classification approaches can be used to determine whether a text belongs to a specific theme or domain. However, the task becomes more complicated when there is no training corpus in which the set of classes and the set of documents belonging to these classes are predetermined. We suggest using the semantic similarity of texts to determine whether they belong to a specific domain. Our training corpus includes news articles containing criminal information. To determine whether the theme of an input document is close to the theme of the training corpus, we propose calculating the cosine similarity between the documents of the corpus and the input document. We have empirically established the average value of the cosine similarity coefficient at which a document can be attributed to the highly specialized documents containing criminal information. We evaluate our approach on a test corpus of articles from the news sites of Kharkiv. The F-measure of the classification of documents with criminal information reaches 96%. [en]
dc.identifier.citation: Applying VSM to Identify the Criminal Meaning of Texts / [Electronic resource] / N. Khairova [et al.] // Computational linguistics and intelligent systems (COLINS 2020) : proc. of the 4th Intern. Conf., April 23-24, 2020. Vol. 1: Main Conference / ed.: V. Lytvyn [et al.]. – Electron. text data. – Lviv, 2020. – P. 20-31. – URL: http://ceur-ws.org/Vol-2604/paper2.pdf, free (accessed 14.12.2020). [en]
dc.identifier.orcid: https://orcid.org/0000-0002-9826-0286
dc.identifier.orcid: https://orcid.org/0000-0001-5817-0844
dc.identifier.orcid: https://orcid.org/0000-0001-8318-3794
dc.identifier.orcid: https://orcid.org/0000-0001-6011-135X
dc.identifier.uri: https://repository.kpi.kharkov.ua/handle/KhPI-Press/49817
dc.language.iso: en
dc.subject: semantic similarity of texts [en]
dc.subject: VSM [en]
dc.subject: criminal information [en]
dc.subject: news sites [en]
dc.subject: cosine similarity [en]
dc.subject: PPMI [en]
dc.title: Applying VSM to Identify the Criminal Meaning of Texts [en]
dc.type: Thesis [en]
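
The procedure summarised in the abstract field (compare an input document with every document of a domain corpus by cosine similarity, then check the average against an empirically chosen cutoff) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the tokenizer, and the threshold value 0.2 are hypothetical, and raw term counts stand in for whatever term weighting the paper uses (the record lists PPMI as a subject but does not say how it is applied).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the paper's preprocessing is not given here.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counter objects).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(corpus, document):
    # Average cosine similarity between the input document and each corpus document.
    doc_vec = Counter(tokenize(document))
    sims = [cosine(Counter(tokenize(text)), doc_vec) for text in corpus]
    return sum(sims) / len(sims) if sims else 0.0


def is_in_domain(corpus, document, threshold=0.2):
    # Hypothetical cutoff: the record does not state the empirically established value.
    return mean_similarity(corpus, document) >= threshold


corpus = [
    "police arrested the robbery suspect",
    "the suspect was charged with robbery",
]
print(is_in_domain(corpus, "police charged the robbery suspect"))
print(is_in_domain(corpus, "sunny weather expected tomorrow"))
```

In practice the cutoff would have to be tuned on held-out documents, as the abstract indicates the authors did empirically.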

Files

File bundle
Name: Khairova_Applying_VSM_2020.pdf
Size: 749.77 KB
Format: Adobe Portable Document Format
License agreement
Name: license.txt
Size: 11.25 KB
Format: Item-specific license agreed upon to submission