Collection and processing of a Medical Corpus in Ukrainian

dc.contributor.author: Cherednichenko, Olga
dc.contributor.author: Kanishcheva, Olga
dc.contributor.author: Yakovleva, Olena
dc.contributor.author: Arkatov, Denis
dc.date.accessioned: 2024-02-02T16:58:36Z
dc.date.available: 2024-02-02T16:58:36Z
dc.date.issued: 2020
dc.description.abstract: Text corpora are the basis of natural language study. We describe the structure of a Ukrainian-language corpus (UKRMED), which covers several medical text genres (clinical protocols, blogs, and Wikipedia). The paper shows the process of collecting, creating, and processing a corpus of medical data in Ukrainian. We present our own framework for creating a text corpus. The medical domain and text simplification are chosen as the corpus directions. We give statistical characteristics of the corpus, provide a morphological analysis by part of speech, and analyze the most frequent lemmas of this medical corpus. The UKRMED corpus can be used for the task of natural language simplification.
dc.identifier.citation: Collection and processing of a Medical Corpus in Ukrainian [Electronic resource] / O. Cherednichenko [et al.] // Computational Linguistics and Intelligent Systems (COLINS 2020) : proc. of the 4th Intern. Conf., April 23-24, 2020. Vol. 2604. – Electronic text data. – Lviv, 2020. – 11 p. – Access mode: https://ceur-ws.org/Vol-2604/paper21.pdf, free (accessed 02.02.2024).
dc.identifier.orcid: https://orcid.org/0000-0002-9391-5220
dc.identifier.orcid: https://orcid.org/0000-0002-9035-1765
dc.identifier.orcid: https://orcid.org/0000-0002-6129-6146
dc.identifier.uri: https://repository.kpi.kharkov.ua/handle/KhPI-Press/73613
dc.language.iso: en
dc.subject: Medicine Corpus
dc.subject: Corpus Linguistics
dc.subject: Ukrainian
dc.subject: Text Collection
dc.subject: Ukrainian-language corpus
dc.subject: Natural Language Processing
dc.title: Collection and processing of a Medical Corpus in Ukrainian
dc.type: Article

Files

Original bundle

Name: Cherednichenko_Collection_and_processing_2020.pdf
Size: 621.92 KB
Format: Adobe Portable Document Format
