The aligned Kazakh-Russian parallel corpus focused on the criminal theme
dc.contributor.author | Khairova, N. F. | en |
dc.contributor.author | Kolesnyk, Anastasiia | en |
dc.contributor.author | Mamyrbayev, Orken | en |
dc.contributor.author | Mukhsina, Kuralay | en |
dc.date.accessioned | 2020-12-14T12:09:53Z | |
dc.date.available | 2020-12-14T12:09:53Z | |
dc.date.issued | 2019 | |
dc.description.abstract | The development of high-quality aligned parallel text corpora is one of the most relevant and advanced directions in modern linguistics. Special emphasis is placed on creating parallel multilingual corpora for low-resource languages such as Kazakh. In this study, we explored texts from four Kazakh bilingual news websites and used them as the basis for a parallel Kazakh-Russian corpus of texts on the criminal subject. To align the corpus, we used a set of lexical compliances and the POS-tagging values of both languages. 60% of the corpus sentences are automatically aligned correctly. Finally, we analyzed the factors affecting the percentage of errors. | en |
dc.identifier.citation | The aligned Kazakh-Russian parallel corpus focused on the criminal theme [Electronic resource] / N. Khairova [et al.] // Computational linguistics and intelligent systems (COLINS 2019) : proc. of the 3d Intern. Conf., April 18-19, 2019. Vol. 1: Main Conference / ed.: V. Lytvyn [et al.]. – Electron. text data. – Lviv, 2019. – P. 116-125. – URL: http://ceur-ws.org/Vol-2362/paper11.pdf, free (accessed 14.12.2020). | en |
dc.identifier.orcid | https://orcid.org/0000-0002-9826-0286 | |
dc.identifier.orcid | https://orcid.org/0000-0001-5817-0844 | |
dc.identifier.orcid | https://orcid.org/0000-0001-8318-3794 | |
dc.identifier.orcid | https://orcid.org/0000-0002-8627-1949 | |
dc.identifier.uri | https://repository.kpi.kharkov.ua/handle/KhPI-Press/49827 | |
dc.language.iso | en | |
dc.subject | criminal subject | en |
dc.subject | news websites | en |
dc.subject | POS-tagging | en |
dc.subject | Kazakh-Russian parallel corpus | en |
dc.subject | alignment | en |
dc.subject | lexical compliances | en |
dc.title | The aligned Kazakh-Russian parallel corpus focused on the criminal theme | en |
dc.type | Thesis | en |
Files
- Name: Khairova_The_aligned_2019.pdf
- Size: 490.42 KB
- Format: Adobe Portable Document Format