Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data

dc.contributor.author: Semenov, Serhii
dc.contributor.author: Krupska-Klimczak, Magdalena
dc.contributor.author: Czapla, Roman
dc.contributor.author: Krzaczek, Beata
dc.contributor.author: Gavrylenko, Svitlana
dc.contributor.author: Poltorazkiy, Vadim
dc.contributor.author: Zozulia, Vladislav
dc.date.accessioned: 2025-11-10T09:38:37Z
dc.date.issued: 2025
dc.description.abstract: This paper examines traditional machine learning algorithms, neural networks, and the benefits of ensemble models. Data preprocessing methods for improving the quality of classification models are considered. To balance the classes, undersampling, oversampling, and their combination (over- + undersampling) are explored. A procedure for reducing feature correlation is proposed. Classification models based on the SVM, KNN, Naive Bayes, and Perceptron classifiers, as well as on the meta-algorithms Bagging, Random Forest, AdaBoost, and Gradient Boosting, have been thoroughly investigated. The settings of the base classifiers and the meta-algorithm parameters have been optimized. The best result was obtained with an ensemble classifier based on the Random Forest algorithm. On this basis, an intrusion detection method based on the preprocessing of highly correlated and imbalanced data is proposed. The scientific novelty of the results lies in the integrated use of the developed procedure for reducing feature correlation, the SMOTEENN data balancing method, the selection of an appropriate classifier, and the fine-tuning of its parameters. Integrating these procedures and methods yielded a higher F1 score, shorter training time, and faster recognition for the model. This allows the method to be recommended for practical use in improving the quality of network intrusion detection.
dc.identifier.citation: Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data [Electronic resource] / Serhii Semenov [et al.] // Applied Sciences. – Electronic text data. – 2025. – Vol. 15. – P. 1-15. – Access mode: https://www.mdpi.com/2076-3417/15/8/4243, free (date of access: 10.11.2025).
dc.identifier.doi: https://doi.org/10.3390/app15084243
dc.identifier.orcid: https://orcid.org/0000-0003-4472-9234
dc.identifier.orcid: https://orcid.org/0000-0003-3558-0300
dc.identifier.orcid: https://orcid.org/0000-0002-5093-0420
dc.identifier.orcid: https://orcid.org/0009-0003-5312-4939
dc.identifier.uri: https://repository.kpi.kharkov.ua/handle/KhPI-Press/95005
dc.language.iso: en
dc.subject: computer systems
dc.subject: network
dc.subject: machine learning
dc.subject: data preprocessing
dc.subject: SMOTEENN
dc.subject: ensemble classifier
dc.subject: random forest
dc.subject: gradient boosting
dc.title: Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data
dc.type: Article
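
The abstract does not describe how the proposed feature-correlation reduction procedure works. As a rough, generic illustration only (not the authors' procedure), the sketch below greedily keeps a feature unless its absolute Pearson correlation with an already-kept feature exceeds a threshold. The threshold of 0.9, the scan order, and the feature names are hypothetical assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: treated as uncorrelated (an assumption)
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    columns: dict of feature name -> list of values; dict order is the scan order.
    threshold=0.9 is a hypothetical default, not a value from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical flow features; packets_sent is almost proportional to bytes_sent.
features = {
    "bytes_sent": [100, 200, 300, 400, 500],
    "packets_sent": [10, 21, 29, 41, 50],
    "duration": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))  # ['bytes_sent', 'duration']
```

Because the filter is greedy, the result depends on column order; a variant could instead drop, from each highly correlated pair, the feature that is less correlated with the class label.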

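SMOTEENN, the balancing method named in the abstract, combines SMOTE oversampling with Edited Nearest Neighbours (ENN) cleaning. The pure-Python sketch below is a simplified, assumed version for a binary problem: SMOTE interpolates new minority samples between a minority point and one of its k nearest minority neighbours until both classes are the same size, and ENN then drops any sample whose label disagrees with the majority of its k nearest neighbours. Library implementations such as imbalanced-learn's `SMOTEENN` differ in details (for example, the neighbour-agreement rule), and the toy data, `k` values, and seed are illustrative assumptions, not settings from the paper.

```python
import math
import random
from collections import Counter


def nearest(idx, X, k, pool):
    """Indices of the k points in `pool` (excluding idx itself) closest to X[idx]."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=5, rng=None):
    """Add n_new synthetic minority samples (needs at least 2 minority samples)."""
    rng = rng or random.Random(0)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    new_X = []
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, k, minority_idx))
        gap = rng.random()  # interpolation factor in [0, 1)
        new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + new_X, y + [minority] * n_new


def enn(X, y, k=3):
    """Drop samples whose label differs from the majority label of their k neighbours."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in nearest(i, X, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, k_smote=5, k_enn=3, seed=0):
    """Oversample the rarest class up to the largest class size, then clean with ENN."""
    ranked = Counter(y).most_common()
    n_maj = ranked[0][1]
    minority, n_min = ranked[-1]
    X_os, y_os = smote(X, y, minority, n_maj - n_min, k_smote, random.Random(seed))
    return enn(X_os, y_os, k_enn)


# Toy data: 8 "normal" points near the origin, 3 "attack" points near (5, 5),
# and one mislabelled normal point placed inside the attack cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 2], [2, 0], [1, 2],
     [5, 5], [5, 6], [6, 5], [5.5, 5.5]]
y = [0] * 8 + [1] * 3 + [0]
X_res, y_res = smoteenn(X, y)
print(Counter(y_res))
```

In this toy run, SMOTE adds 6 synthetic attack samples inside the triangle spanned by the original three, and ENN removes the mislabelled class-0 point at (5.5, 5.5), leaving 8 majority and 9 minority samples.
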
Files

Name: AS_2025_15_Semenov_Intrusion.pdf
Size: 3.06 MB
Format: Adobe Portable Document Format

License agreement

Name: license.txt
Size: 11.25 KB
Format: Item-specific license agreed upon to submission