Кафедри
Постійне посилання на розділhttps://repository.kpi.kharkov.ua/handle/KhPI-Press/35393
Переглянути
4 результатів
Результати пошуку
Документ Applying VSM to Identify the Criminal Meaning of Texts(2020) Khairova, N. F.; Kolesnyk, Anastasiia; Mamyrbayev, Orken; Petrasova, S. V.Generally, to define the belonging of a text to a specific theme or domain, we can use approaches to text classification. However, the task becomes more complicated when there is no train corpus, in which the set of classes and the set of documents belonged to these classes are predetermined. We suggest using the semantic similarity of texts to determine their belonging to a specific domain. Our train corpus includes news articles containing criminal information. In order to define whether the theme of input documents is close to the theme of the train corpus, we propose to calculate the cosine similarity between documents of the corpus and the input document. We have empirically established the average value of the cosine similarity coefficient, in which the document can be attributed to the highly specialized documents containing criminal information.We evaluate our approach on the test corpus of articles from the news sites of Kharkiv. F-measure of the document classification with criminal information achieves 96 %.Документ Extraction of Semantic Relations from Wikipedia Text Corpus(2019) Shanidze, O.; Petrasova, S. V.This paper proposes the algorithm for automatic extraction of semantic relations using the rule-based approach. The authors suggest identifying certain verbs (predicates) between a subject and an object of expressions to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. The synsets from WordNet are applied to extract semantic relations between concepts and their synonyms from the text corpus.Документ Method for Paraphrase Extraction from the News Text Corpus(2019) Manuilov, Illia; Petrasova, S. V.The paper discusses the process of automatic extraction of paraphrases used in rewriting. The researchers propose the method for extracting paraphrases from English news text corpora. The method is based on both the developed syntactic rules to define phrases and synsets to identify synonymous words in the designed text corpus of BBC news. In order to implement the method, Natural Language Toolkit, Universal Dependencies parser and WordNet are used.Документ Similar Text Fragments Extraction for Identifying Common Wikipedia Communities(MDPI AG, Switzerland, 2018) Petrasova, S. V.; Khairova, N. F.; Lewoniewski, Włodzimierz; Mamyrbayev, Orken; Mukhsina, KuralaySimilar text fragments extraction from weakly formalized data is the task of natural language processing and intelligent data analysis and is used for solving the problem of automatic identification of connected knowledge fields. In order to search such common communities in Wikipedia, we propose to use as an additional stage a logical-algebraic model for similar collocations extraction. With Stanford Part-Of-Speech tagger and Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. WithWordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments inWikipedia articles that form common information spaces. The number of highly frequented synonymous collocations can obtain an indication of key common up-to-date Wikipedia communities.