Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

Date

2018

Publisher

MDPI AG, Switzerland

Abstract

Extracting similar text fragments from weakly formalized data is a task in natural language processing and intelligent data analysis, used to automatically identify connected knowledge fields. To find such common communities in Wikipedia, we propose adding a stage that applies a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of high-frequency synonymous collocations can indicate the key common up-to-date Wikipedia communities.
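
The idea of matching synonymous collocations across articles can be illustrated with a minimal Python sketch. It is not the logical-algebraic model from the paper: it assumes tokens are already tagged with Penn Treebank part-of-speech tags, restricts collocations to adjacent adjective-noun pairs, and replaces WordNet synsets with a small hand-made dictionary. The names `TOY_SYNSETS`, `adj_noun_collocations`, `canonical`, and `shared_collocations` are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Assumption: a toy stand-in for WordNet synsets. Each word maps to a
# made-up synset label; words missing from the table map to themselves.
TOY_SYNSETS = {
    "big": "large.a", "large": "large.a",
    "car": "car.n", "automobile": "car.n",
    "house": "house.n", "home": "house.n",
}


def adj_noun_collocations(tagged_tokens):
    """Yield (adjective, noun) pairs from adjacent tokens tagged JJ* then NN*."""
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):
            yield (w1.lower(), w2.lower())


def canonical(collocation, synsets=TOY_SYNSETS):
    """Replace each word by its synset label so synonymous pairs compare equal."""
    return tuple(synsets.get(word, word) for word in collocation)


def shared_collocations(articles):
    """Map each pair of article names to the canonical collocations they share.

    `articles` maps an article name to a list of (word, tag) tuples.
    """
    keys = {
        name: {canonical(c) for c in adj_noun_collocations(tokens)}
        for name, tokens in articles.items()
    }
    return {(a, b): keys[a] & keys[b] for a, b in combinations(sorted(keys), 2)}


# Hypothetical pre-tagged fragments from three articles.
articles = {
    "A": [("big", "JJ"), ("car", "NN"), ("drives", "VBZ")],
    "B": [("large", "JJ"), ("automobile", "NN")],
    "C": [("old", "JJ"), ("home", "NN")],
}
shared = shared_collocations(articles)
```

In this toy input, articles A and B share one synonymous collocation ("big car" and "large automobile"), while A and C share none. Counting such overlaps over many articles gives the kind of frequency signal the abstract describes.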

Keywords

information extraction, short text fragment similarity, Wikipedia communities, NLP

Bibliographic description

Similar Text Fragments Extraction for Identifying Common Wikipedia Communities / S. Petrasova [et al.] // Data. – 2018. – Vol. 3, iss. 4. – 9 p.
