Department of Intelligent Computer Systems

Permanent link to this collection: https://repository.kpi.kharkov.ua/handle/KhPI-Press/2423

Official department website: http://web.kpi.kharkov.ua/iks

The Department of Intelligent Computer Systems was founded on 12 February 2007 on the basis of the "Applied Linguistics" specialty.

In 2009, the department, jointly with the Ukrainian Lingua-Information Fund of the NAS of Ukraine, established the Research Center for Intelligent Systems and Computational Linguistics.

The department is part of the Educational and Scientific Institute of Social and Humanitarian Technologies of the National Technical University "Kharkiv Polytechnic Institute".

The department's academic staff includes 2 Doctors of Technical Sciences, 5 Candidates of Philological Sciences, 4 Candidates of Technical Sciences, and 1 Candidate of Philosophical Sciences; 2 staff members hold the title of Professor and 3 hold the title of Associate Professor.

  • The Influence of Various Text Characteristics on the Readability and Content Informativeness
    (2019) Khairova, N. F.; Kolesnyk, Anastasiia; Mamyrbayev, Orken; Mukhsina, Kuralay
    Businesses increasingly draw on external big data sources, extracting information and integrating it into their own enterprise information systems in order to make sound economic decisions, understand customer needs, and predict risks. Obtaining useful knowledge from big data requires high-quality data, including high-quality textual data. In this study, we focus on how readability and certain features of texts written for a global audience affect the assessment of text quality. To estimate the influence of different linguistic and statistical factors on text readability, we reviewed five text corpora: two contain texts from Wikipedia, the third contains texts from Simple Wikipedia, and the last two include scientific and educational texts. We show which linguistic and statistical features of a text have the greatest influence on text quality for business corporations. Finally, we propose some directions toward automatically predicting the readability of texts on the Web.
  • The aligned Kazakh-Russian parallel corpus focused on the criminal theme
    (2019) Khairova, N. F.; Kolesnyk, Anastasiia; Mamyrbayev, Orken; Mukhsina, Kuralay
    Nowadays, the development of high-quality parallel aligned text corpora is one of the most relevant and advanced directions of modern linguistics. Special emphasis is placed on creating parallel multilingual corpora for low-resource languages, such as Kazakh. In this study, we explored texts from four Kazakh bilingual news websites and, on their basis, created a parallel Kazakh-Russian corpus of texts focused on criminal subject matter. To align the corpus, we used a set of lexical correspondences and the POS-tag values of both languages. 60% of the sentences in our corpus are automatically aligned correctly. Finally, we analyzed the factors affecting the percentage of errors.
  • Logical-linguistic model for multilingual Open Information Extraction
    (2020) Khairova, N. F.; Mamyrbayev, Orken; Mukhsina, Kuralay; Kolesnyk, Anastasiia
    Open Information Extraction (OIE) is a modern strategy for extracting fact triplets from collections of Web documents. However, most current OIE approaches are based on NLP techniques such as POS tagging and dependency parsing, tools for which are not available for all languages. In this paper, we propose a logical-linguistic model whose basic mathematical means are the logical-algebraic equations of the algebra of finite predicates. These equations make it possible to express the semantic role of a participant in a fact triplet (Subject-Predicate-Object) through the relations among the grammatical characteristics of words in the sentence. The proposed model extracts an unlimited, domain-independent number of facts from sentences in different languages. It allows facts to be extracted from unstructured texts without requiring a pre-specified vocabulary, by identifying relations in phrases and their associated arguments in arbitrary sentences in English, Kazakh, and Russian. We evaluate our approach on corpora in three languages based on English and Kazakh bilingual news websites. We achieve a fact-extraction precision of over 87% for the English corpus, over 82% for the Russian corpus, and 71% for the Kazakh corpus.