Method for determining the semantic similarity of arbitrary length texts using the Transformers models

Olizarenko, Serhii; Radchenko, Viacheslav

doi:https://doi.org/10.20998/2522-9052.2021.2.18

dc.contributor.author	Olizarenko, Serhii	en
dc.contributor.author	Radchenko, Viacheslav	en
dc.date.accessioned	2021-08-02T06:41:48Z
dc.date.available	2021-08-02T06:41:48Z
dc.date.issued	2021
dc.description.abstract	The paper presents a method for determining the semantic similarity of arbitrary-length texts based on their vector representations. The vector representations are obtained with a multilingual Transformers model, and the task of determining semantic similarity is treated as a classification problem over pairs of text sequences using that model. A comparative analysis was performed to select the most suitable Transformers model for this class of problems. The main stages of the method are: fine-tuning the Transformers model on the second pretraining task (sentence prediction), and selecting and implementing a summarization method for text sequences longer than 512 (1024) tokens, so that semantic similarity can be determined for texts of arbitrary length.	en
dc.description.abstract	В роботі розглянуті результати розробки методу визначення семантичної подібності текстів довільної довжини на основі їх векторних уявлень. При цьому векторні уявлення отримані з використанням мультимовної моделі Transformers, а безпосередньо завдання визначення семантичного подібності текстів довільної довжини розглядається як задача класифікації пар текстових послідовностей з використанням моделі Transformers. Виконано порівняльний аналіз найбільш оптимальної моделі Transformers для вирішення даного класу задач. Основними етапами методу при цьому розглядаються етап тонкої настройка моделі Transformers в рамках другого завдання преднавченої моделі (завдання прогнозування пропозицій), а також етап вибору і реалізації методу суммарізаціі текстової послідовності довжиною понад 512 (1024) токенів для вирішення завдання визначення семантичного подібності текстів довільної довжини.	uk
dc.identifier.citation	Olizarenko S. Method for determining the semantic similarity of arbitrary length texts using the Transformers models / Serhii Olizarenko, Viacheslav Radchenko // Сучасні інформаційні системи = Advanced Information Systems. – 2021. – Т. 5, № 2. – С. 126-130.	en
dc.identifier.doi	https://doi.org/10.20998/2522-9052.2021.2.18
dc.identifier.orcid	https://orcid.org/0000-0002-7762-6541
dc.identifier.orcid	https://orcid.org/0000-0002-2505-1969
dc.identifier.uri	https://repository.kpi.kharkov.ua/handle/KhPI-Press/53778
dc.language.iso	en
dc.publisher	Національний технічний університет "Харківський політехнічний інститут"	uk
dc.subject	vector representation	en
dc.subject	fine-tuning	en
dc.subject	векторне подання	uk
dc.subject	тонке налагодження	uk
dc.title	Method for determining the semantic similarity of arbitrary length texts using the Transformers models	en
dc.title.alternative	Метод визначення семантичної подібності текстів довільної довжини з використанням моделей Transformers	uk
dc.type	Article	en
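
The two stages named in the abstract (shorten over-long texts, then score the pair) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `shorten` is a hypothetical stand-in for the summarization stage, `overlap_score` is a placeholder word-overlap scorer used where the paper uses a fine-tuned Transformers pair classifier, whitespace tokens stand in for subword tokens, and splitting the 512-token limit evenly between the two texts is an assumption made here.

```python
from typing import Callable, List

MAX_TOKENS = 512  # input limit named in the abstract (1024 for some models)


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace tokens stand in for the model's subword tokens.
    return text.split()


def shorten(text: str, budget: int) -> str:
    # Hypothetical stand-in for the summarization stage: keep leading
    # sentences for as long as they fit into the token budget.
    kept: List[str] = []
    used = 0
    for sentence in text.split(". "):
        n = len(tokenize(sentence))
        if used + n > budget:
            break
        kept.append(sentence)
        used += n
    if kept:
        return ". ".join(kept)
    # The first sentence alone exceeds the budget: fall back to hard truncation.
    return " ".join(tokenize(text)[:budget])


def overlap_score(a: str, b: str) -> float:
    # Placeholder pair scorer (Jaccard word overlap), NOT a Transformers model.
    sa, sb = set(tokenize(a.lower())), set(tokenize(b.lower()))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similarity(
    a: str,
    b: str,
    score_pair: Callable[[str, str], float] = overlap_score,
    max_tokens: int = MAX_TOKENS,
) -> float:
    # Assumption: each text gets half of the limit so the pair fits one input.
    budget = max_tokens // 2
    if len(tokenize(a)) > budget:
        a = shorten(a, budget)
    if len(tokenize(b)) > budget:
        b = shorten(b, budget)
    return score_pair(a, b)
```

In practice `score_pair` would be replaced by a call to the fine-tuned pair classifier; the surrounding control flow stays the same.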

Files

Name: AIS_2021_5_2_Olizarenko_Method.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format

Collections

Department of Computer Engineering and Programming