Logical-linguistic model for multilingual Open Information Extraction

dc.contributor.author: Khairova, N. F. (en)
dc.contributor.author: Mamyrbayev, Orken (en)
dc.contributor.author: Mukhsina, Kuralay (en)
dc.contributor.author: Kolesnyk, Anastasiia (en)
dc.date.accessioned: 2020-12-11T12:25:30Z
dc.date.available: 2020-12-11T12:25:30Z
dc.date.issued: 2020
dc.description.abstract: Open Information Extraction (OIE) is a modern strategy for extracting fact triplets from Web-document collections. However, most current OIE approaches rely on NLP techniques such as POS tagging and dependency parsing, and tools for these techniques are not available for all languages. In this paper, we propose a logical-linguistic model whose basic mathematical means are the logical-algebraic equations of finite predicates algebra. These equations express the semantic role of each participant in a fact triplet (Subject-Predicate-Object) through the relations between the grammatical characteristics of words in the sentence. The model extracts an unlimited, domain-independent number of facts from sentences in different languages. It extracts facts from unstructured texts without requiring a pre-specified vocabulary, by identifying relations in phrases and their associated arguments in arbitrary English, Kazakh, and Russian sentences. We evaluate our approach on corpora of three languages based on English and Kazakh bilingual news websites. We achieve a fact-extraction precision of over 87% for the English corpus, over 82% for the Russian corpus, and 71% for the Kazakh corpus. (en)
dc.identifier.citation: Logical-linguistic model for multilingual Open Information Extraction [Electronic resource] / N. Khairova [et al.] // Cogent Engineering. – Electronic text data. – 2020. – Vol. 7, Iss. 1. – 16 p. – URL: https://www.tandfonline.com/doi/pdf/10.1080/23311916.2020.1714829?needAccess=true, free (accessed 11.12.2020). (en)
dc.identifier.doi: doi.org/10.1080/23311916.2020.1714829
dc.identifier.orcid: https://orcid.org/0000-0002-9826-0286
dc.identifier.orcid: https://orcid.org/0000-0001-8318-3794
dc.identifier.orcid: https://orcid.org/0000-0002-8627-1949
dc.identifier.uri: https://repository.kpi.kharkov.ua/handle/KhPI-Press/49770
dc.language.iso: en
dc.subject: Open Information Extraction (en)
dc.subject: fact extraction from unstructured texts (en)
dc.subject: Kazakh bilingual news websites (en)
dc.subject: criminal subject (en)
dc.subject: logical-linguistic model (en)
dc.subject: finite predicates algebra (en)
dc.title: Logical-linguistic model for multilingual Open Information Extraction (en)
dc.type: Article (en)

Files

Original bundle

Name: CE_2020_7_Khairova_Logical-linguistic_model.pdf
Size: 973.17 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 11.25 KB
Format: Item-specific license agreed upon to submission