Features of Knowledge Extraction and Identification in Web Content

Authors: Khairova, Nina Feliksovna; Gautam, Ajit Pratap Singh
Title as published in English: Features extraction and identification of knowledge Web-content
Publisher: Poltava National Technical Yuri Kondratyuk University
Year of publication: 2014
Keywords: Web Content Mining; knowledge identification; regular expressions; taxonomy; representativeness relation
Record date: 2020-12-10
Language: ru
Type: Article
Citation: Khairova N. F., Gautam A. P. S. Features of knowledge extraction and identification in web content. Systemy upravlinnia, navihatsii ta zviazku (Control, Navigation and Communication Systems): collection of scientific papers, editor-in-chief S. V. Kozelkov. Poltava: PNTU, 2014. Issue 4 (32), pp. 190-193.
URI: https://repository.kpi.kharkov.ua/handle/KhPI-Press/49752

Abstract: The article considers the features of knowledge extraction and identification for web pages. A Web Content Mining technology is proposed. It is based on a method for extracting semantic concepts from textual information and comprises three steps: isolating the main content of the page, extracting the semantic concepts, and content analysis. The content analysis step uses regular expressions, which make it possible to identify taxonomy and representativeness relations between the concepts of a web page. The elements of the regular expressions are nouns, noun groups, and special lexical constructs.
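
The content analysis step can be illustrated with a minimal sketch. The abstract does not reproduce the authors' regular expressions, so everything here is an assumption made for illustration: the function name `taxonomy_relations`, the two English lexical patterns ("X is a kind of Y", "Y such as X"), and the premise that the concepts have already been extracted in the previous step and are passed in as a list. The representativeness relation is not sketched because the abstract does not define it.

```python
import re


def taxonomy_relations(text, concepts):
    """Return (hyponym, hypernym) pairs found by simple lexical patterns.

    Hypothetical helper: the patterns below are illustrative and are
    not the regular expressions developed in the article.
    """
    # Try longer concepts first so "search engine" wins over a shorter overlap.
    ordered = sorted(concepts, key=len, reverse=True)
    alt = "|".join(re.escape(c) for c in ordered)
    first = rf"(?P<a>{alt})"
    second = rf"(?P<b>{alt})"

    patterns = [
        # "<hyponym> is a kind/type of <hypernym>"
        (re.compile(rf"\b{first} is an? (?:kind|type) of (?:an? )?{second}\b", re.I),
         "a", "b"),
        # "<hypernym>(s) such as <hyponym>"
        (re.compile(rf"\b{first}s? such as (?:an? )?{second}\b", re.I),
         "b", "a"),
    ]

    pairs = []
    for regex, hypo, hyper in patterns:
        for match in regex.finditer(text):
            pairs.append((match.group(hypo).lower(), match.group(hyper).lower()))
    return pairs


if __name__ == "__main__":
    sample = "A crawler is a kind of program. A program such as a parser reads markup."
    print(taxonomy_relations(sample, ["crawler", "program", "parser", "search engine"]))
    # prints [('crawler', 'program'), ('parser', 'program')]
```

Building the patterns from a known concept list keeps the sketch free of a part-of-speech tagger; a fuller treatment would need one to recognise the nouns and noun groups that the abstract names as elements of the regular expressions.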