Use of the statistical model of coherence of connected text as an additional tool of quantitative content analysis

Шевченко, І. В.; Андреєв, П. І.; Дернова, М. Г.; Хайрова, Ніна Феліксівна

doi:https://doi.org/10.30929/1995-0519.2021.5.62-67

dc.contributor.author	Шевченко, І. В.
dc.contributor.author	Андреєв, П. І.
dc.contributor.author	Дернова, М. Г.
dc.contributor.author	Хайрова, Ніна Феліксівна
dc.date.accessioned	2026-03-30T08:58:13Z
dc.date.issued	2021
dc.description.abstract	The aim of the work is to study the possibility of using the coherence of the frequency characteristics of paragraphs to identify keywords and the satellite words that surround them, that is, context sets. To achieve this aim, the following tasks were solved. A text representation model was developed that differs from existing ones in that it includes a set of the most frequent words, a set of keywords, a set of satellite words, and the intersection of the sets of paragraphs, keywords, and satellite words; this provides a formal basis for building a method for analyzing the dynamics of the relative frequencies of the most frequent words in a text and for identifying keywords and context sets. A text analysis method was developed that differs from existing ones in that it is based on detecting positive correlations between the relative frequencies of occurrence of a subset of the most frequent words in paragraphs; this makes it possible to identify keywords and context subsets in texts that have the property of coherence and in individual paragraphs of a text that has weak coherence. The developed method can be used as an auxiliary tool for content analysis of connected texts.
dc.description.abstract	We consider the language system as a set of subsystems structured as a semiotic hierarchy, in which the content of higher-level units is not completely reducible to the substantive components of lower-level units. Therefore, the meaning of higher-level units cannot always be «calculated» from information about the meaning of lower-level units and information about the relationships between these units. At the same time, the structural model of the language system uses thematic or semantic features of connectivity between units of one level of the hierarchy. This opens up certain possibilities for quantitative content analysis. Methodology. Considering the results of known works, we noticed that none of them analyzes paragraphs as independent structural units of the text. A paragraph usually reveals one micro-theme of the text, which contributes to the development of the theme of the whole text. It is hypothesized that, if the studied text is coherent and a certain topic plays the role of a leitmotif, there should be certain patterns in the gradual dynamics of the frequencies of certain words from one paragraph to the next. The aim of this work is to study the possibility of using the coherence of the frequency characteristics of paragraphs to identify keywords and the satellite words surrounding them, that is, context sets. Results. To achieve this goal the following tasks are solved: development of a text model that takes into account the task of paragraph-by-paragraph analysis of the dynamics of relative frequencies; development of a method of paragraph-by-paragraph text analysis; testing of the developed method on a collection of documents. Originality. A text representation model has been developed that differs from the existing ones in that it includes a set of the most frequent words, a set of keywords, a set of satellite words, and the intersection of the sets of paragraphs, keywords, and satellite words. This provides a formal basis for building a method of analyzing the dynamics of the relative frequencies of the most frequent words in the text and identifying keywords and context sets. A method of text analysis has been developed that differs from the existing ones in that it is based on the detection of positive correlations between the relative frequencies of occurrence of a subset of the most frequent words in paragraphs. This makes it possible to identify keywords and context subsets in texts that have some coherence and in individual paragraphs of a text that has weak coherence. Practical value. A set of Ukrainian-language, Russian-language and English-language scientific and technical texts was formed to test the efficiency of the text analysis method. The set includes scientific and technical articles on various topics and fragments of textbooks. The keyword sets produced by machine analysis were compared with the authors' keyword sets in the scientific and technical articles. Experts were involved to determine the keyword sets of the textbook fragments. Comparison of the authors' and experts' keyword sets with the sets formed by the proposed method showed its efficiency. The match ranged from 50 % to 90 %, taking into account that the authors' sets contained phrases, while the machine sets listed the elements of these phrases separately. The developed method can be used as an auxiliary tool for content analysis of connected texts.
dc.identifier.citation	Шевченко І. В., Андреєв П. І., Дернова М. Г., Хайрова Н. Ф. Use of the statistical model of coherence of connected text as an additional tool of quantitative content analysis. Вісник Кременчуцького національного університету імені Михайла Остроградського. 2021. No. 5 (130). pp. 62–67. https://doi.org/10.30929/1995-0519.2021.5.62-67.
dc.identifier.doi	https://doi.org/10.30929/1995-0519.2021.5.62-67
dc.identifier.orcid	https://orcid.org/0000-0003-3009-8611
dc.identifier.orcid	https://orcid.org/0000-0003-4368-9584
dc.identifier.orcid	https://orcid.org/0000-0003-4545-5247
dc.identifier.orcid	https://orcid.org/0000-0002-9826-0286
dc.identifier.uri	https://repository.kpi.kharkov.ua/handle/KhPI-Press/100433
dc.language.iso	uk
dc.publisher	Кременчуцький національний університет імені Михайла Остроградського
dc.subject	контент-аналіз
dc.subject	модель тексту
dc.subject	когерентність
dc.subject	абзац
dc.subject	відносні частоти
dc.subject	ключові слова
dc.subject	контекстна множина
dc.subject	content analysis
dc.subject	text model
dc.subject	coherence
dc.subject	paragraphs
dc.subject	relative frequencies
dc.subject	keywords
dc.subject	context set
dc.title	Використання статистичної моделі когерентності зв'язного тексту в якості додаткового інструменту кількісного контент аналізу
dc.title.alternative	Use of the statistical model of coherence of connected text as an additional tool of quantitative content analysis
dc.type	Article
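
The paragraph-by-paragraph analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the blank-line paragraph splitting, the `top_n` cutoff, and the 0.5 correlation threshold are all assumptions made for illustration. The sketch computes the relative frequency of the most frequent words in each paragraph and reports word pairs whose frequency series are positively correlated.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt


def paragraphs(text):
    """Split on blank lines and tokenize each paragraph into lowercase words."""
    blocks = re.split(r"\n\s*\n", text)
    tokens = [re.findall(r"\w+", b.lower()) for b in blocks]
    return [t for t in tokens if t]  # drop paragraphs without any words


def relative_frequencies(paras, top_n=10):
    """Map each of the top_n most frequent words to its relative
    frequency in every paragraph (count / paragraph length)."""
    totals = Counter(w for p in paras for w in p)
    top = [w for w, _ in totals.most_common(top_n)]
    return {w: [p.count(w) / len(p) for p in paras] for w in top}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def positive_pairs(freqs, threshold=0.5):
    """Word pairs whose per-paragraph frequencies correlate above the
    (assumed) threshold, strongest correlation first."""
    pairs = []
    for a, b in combinations(freqs, 2):
        r = pearson(freqs[a], freqs[b])
        if r > threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda t: -t[2])
```

Calling `positive_pairs(relative_frequencies(paragraphs(text)))` on a document yields candidate word groupings; how such pairs are then divided into keywords and satellite words is specified in the article itself and is not reproduced here.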

Files

Name: visnyk_KrNU_2021_5_Shevchenko_Vykorystannia.pdf
Size: 384.5 KB
Format: Adobe Portable Document Format

Collections

Department of "Intelligent Computer Systems"