Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

dc.contributor.author: Petrasova, S. V. [en]
dc.contributor.author: Khairova, N. F. [en]
dc.contributor.author: Lewoniewski, Włodzimierz [en]
dc.contributor.author: Mamyrbayev, Orken [en]
dc.contributor.author: Mukhsina, Kuralay [en]
dc.date.accessioned: 2020-05-22T11:06:07Z
dc.date.available: 2020-05-22T11:06:07Z
dc.date.issued: 2018
dc.description.abstract: Similar text fragments extraction from weakly formalized data is a task of natural language processing and intelligent data analysis and is used for solving the problem of automatic identification of connected knowledge fields. To search for such common communities in Wikipedia, we propose using a logical-algebraic model for similar collocations extraction as an additional stage. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indication of key common up-to-date Wikipedia communities. [en]
dc.identifier.citation: Similar Text Fragments Extraction for Identifying Common Wikipedia Communities / S. Petrasova [et al.] // Data. – 2018. – Vol. 3, iss. 4. – 9 p. [en]
dc.identifier.orcid: https://orcid.org/0000-0001-6011-135X
dc.identifier.uri: https://repository.kpi.kharkov.ua/handle/KhPI-Press/46382
dc.language.iso: en
dc.publisher: MDPI AG, Switzerland [en]
dc.subject: information extraction [en]
dc.subject: short text fragment similarity [en]
dc.subject: Wikipedia communities [en]
dc.subject: NLP [en]
dc.title: Similar Text Fragments Extraction for Identifying Common Wikipedia Communities [en]
dc.type: Article [en]
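
The abstract describes counting synonymous collocations across Wikipedia articles to reveal shared communities. The following is a minimal sketch of that counting idea only, not the authors' logical-algebraic model. It assumes the collocations have already been extracted (the paper uses the Stanford tagger and parser for that step), and it replaces WordNet synsets with a small hand-made synonym table. The names `SYNONYM_GROUPS`, `normalize`, and `shared_collocations`, as well as the toy data, are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for WordNet synsets: each set is one synonym group.
SYNONYM_GROUPS = [
    {"big", "large"},
    {"city", "town"},
    {"car", "automobile"},
]
# Map every word to a canonical representative of its group
# (the alphabetically smallest member).
CANONICAL = {word: min(group) for group in SYNONYM_GROUPS for word in group}


def normalize(collocation):
    """Replace each word of a collocation with its synonym-group representative."""
    return tuple(CANONICAL.get(word, word) for word in collocation)


def shared_collocations(articles):
    """Count synonymous collocations; keep those found in more than one article."""
    counts = Counter()
    seen_in = {}
    for title, collocations in articles.items():
        for collocation in collocations:
            key = normalize(collocation)
            counts[key] += 1
            seen_in.setdefault(key, set()).add(title)
    return {key: counts[key] for key in counts if len(seen_in[key]) > 1}


# Toy input: adjective-noun pairs assumed to be already extracted per article.
articles = {
    "Article A": [("big", "city"), ("fast", "car")],
    "Article B": [("large", "town"), ("large", "city")],
}
result = shared_collocations(articles)
print(result)
```

In this toy setting, collocations that differ only by synonyms collapse to one key, so a high count for a key that appears in several articles hints at a shared information space between them.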

Files

File bundle

Name: Data_2018_3_4_Petrasova_Similar_text.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

License agreement

Name: license.txt
Size: 11.25 KB
Format: Item-specific license agreed upon to submission