Efficiency estimation of methods for sentiment analysis of social network messages

Borysova, N. V.; Melnyk, K. V.

dc.contributor.author	Borysova, N. V.	en
dc.contributor.author	Melnyk, K. V.	en
dc.date.accessioned	2019-11-15T10:50:58Z
dc.date.available	2019-11-15T10:50:58Z
dc.date.issued	2019
dc.description.abstract	This paper presents the results of evaluating the effectiveness of machine learning methods for sentiment analysis of social network messages. The relevance of sentiment analysis as one of the important tasks of natural language processing in general, and of textual information processing in particular, is substantiated. Existing methods and software for sentiment analysis are reviewed. The choice of classifiers for sentiment analysis of texts in this study is justified. The operating principles of a naïve Bayes classifier and of a classifier based on a recurrent neural network are described. The classifiers were trained sequentially on two corpora: first on RuTweetCorp, a corpus of short messages from the social network Twitter, and then on the Slang corpus, a corpus of messages from the social networks Facebook and Instagram and of posts from the Pikabu website, in which the tonality of slang words is annotated. Information about the tonality of slang words was taken from a youth slang dictionary compiled from a survey of users. Texts were classified by tonality into three classes: positive, negative and neutral. The efficiency of the classifiers was evaluated using the standard metrics Recall, Precision, F-measure and Accuracy. For the naïve Bayes classifier, training on the first corpus gave Recall = 0.853, Precision = 0.869, F-measure = 0.861, Accuracy = 0.855; after training on the second corpus the values were Recall = 0.948, Precision = 0.975, F-measure = 0.961, Accuracy = 0.960.
For the classifier based on a recurrent neural network, training on the first corpus gave Recall = 0.870, Precision = 0.878, F-measure = 0.874, Accuracy = 0.861; after training on the second corpus the values were Recall = 0.965, Precision = 0.982, F-measure = 0.973, Accuracy = 0.973. These results show that additional training on the second corpus increased the efficiency of the classifiers by 10–11%.	en
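The four metrics named in the abstract can be computed from gold and predicted labels roughly as follows. This is a minimal sketch, not the authors' evaluation code: the names `f_measure`, `macro_metrics` and `REPORTED` are hypothetical, and macro-averaging over the three classes is an assumption, since the record does not say how per-class scores were combined. The F-measure is taken as the harmonic mean of precision and recall; applied to each reported (Precision, Recall) pair it reproduces the reported F-measure to three decimals.

```python
# Sketch of the four evaluation metrics for a three-class sentiment task.
# Assumption: Precision and Recall are macro-averaged over the classes and the
# F-measure is the harmonic mean of the averaged values; the record does not
# state which averaging the authors used.
from typing import Dict, Sequence

CLASSES = ("positive", "negative", "neutral")

# (Precision, Recall, F-measure) exactly as reported in the abstract.
REPORTED = {
    "naive Bayes, corpus 1": (0.869, 0.853, 0.861),
    "naive Bayes, corpus 2": (0.975, 0.948, 0.961),
    "RNN, corpus 1": (0.878, 0.870, 0.874),
    "RNN, corpus 2": (0.982, 0.965, 0.973),
}


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                  classes: Sequence[str] = CLASSES) -> Dict[str, float]:
    """Macro-averaged Precision and Recall, their F-measure, and Accuracy."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # A class that is never predicted (or never present) scores 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure(precision, recall), "accuracy": accuracy}


if __name__ == "__main__":
    # Check: the harmonic mean of each reported (P, R) pair vs. the reported F.
    for name, (p, r, f) in REPORTED.items():
        print(f"{name}: computed F = {f_measure(p, r):.3f}, reported F = {f}")
```

Other averaging schemes (micro-averaging, or averaging per-class F-measures) give different numbers on imbalanced data, so the choice matters when comparing against the reported values.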
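The naïve Bayes principle mentioned in the abstract (choose the class c that maximises P(c) · Π P(w_i | c) over the words w_i of a message) can be illustrated with a small multinomial bag-of-words classifier. Everything concrete here is an assumption for illustration: the class `NaiveBayesSentiment`, whitespace tokenisation, add-one (Laplace) smoothing, and the tiny made-up corpora `TOY_CORPUS_1` and `TOY_SLANG_CORPUS` standing in for RuTweetCorp and the Slang corpus. Calling `fit` a second time simply adds the new counts, which is one simple way to mimic the sequential two-corpus training described; the record does not say how the authors implemented it.

```python
# Minimal multinomial naive Bayes sentiment classifier (illustrative sketch).
# Assumptions: whitespace tokenisation, bag-of-words counts, add-one smoothing
# and the hypothetical toy corpora below; not the authors' implementation.
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical stand-ins for the two training corpora.
TOY_CORPUS_1 = [
    ("i love this", "positive"), ("great day", "positive"),
    ("i hate this", "negative"), ("awful day", "negative"),
    ("it is monday", "neutral"), ("the bus is here", "neutral"),
]
TOY_SLANG_CORPUS = [("this is lit", "positive"), ("so cringe", "negative")]


class NaiveBayesSentiment:
    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                      # smoothing constant (1.0 = Laplace)
        self.class_docs: Counter = Counter()    # number of training texts per class
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, data: List[Tuple[str, str]]) -> "NaiveBayesSentiment":
        # Accumulates counts, so a second call continues training on new data.
        for text, label in data:
            self.class_docs[label] += 1
            for tok in self.tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores: Dict[str, float] = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)   # log prior, log P(c)
            for tok in self.tokenize(text):
                if tok in self.vocab:               # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    model = NaiveBayesSentiment().fit(TOY_CORPUS_1)
    print(model.predict("love it"), model.predict("the bus"))
    model.fit(TOY_SLANG_CORPUS)                 # second training stage
    print(model.predict("so cringe"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied. With this toy data, "so cringe" can only be classified meaningfully after the second stage, because neither word occurs in the first corpus.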
dc.identifier.citation	Borysova N. V. Efficiency estimation of methods for sentiment analysis of social network messages / N. V. Borysova, K. V. Melnyk // Bulletin of the National Technical University "KhPI". Ser.: System analysis, control and information technology: collection of scientific papers. – Kharkiv: NTU "KhPI", 2019. – No. 2. – pp. 76–81.	en
dc.identifier.doi	doi.org/10.20998/2079-0023.2019.02.13
dc.identifier.orcid	https://orcid.org/0000-0002-8834-2536
dc.identifier.orcid	https://orcid.org/0000-0001-9642-5414
dc.identifier.uri	https://repository.kpi.kharkov.ua/handle/KhPI-Press/42803
dc.language.iso	en
dc.publisher	National Technical University "Kharkiv Polytechnic Institute"	en
dc.subject	machine learning	en
dc.subject	text classification	en
dc.subject	naïve Bayesian classification	en
dc.subject	recurrent neural network	en
dc.title	Efficiency estimation of methods for sentiment analysis of social network messages	en
dc.title.alternative	Evaluation of the efficiency of methods for sentiment analysis of social network messages (Ukrainian-language title)	en
dc.type	Article	en

Files

Name: vestnik_KhPI_2019_2_SAUI_Borysova_Еfficiency.pdf
Size: 804.29 KB
Format: Adobe Portable Document Format

Collections

Bulletin No. 02. System analysis, control and information technology
Department of "Software Engineering and Intelligent Management Technologies named after A. V. Dabagyan"
Department of "Intelligent Computer Systems"