Towards Classifying HTML-embedded Product Data Based On Machine Learning Approach
dc.contributor.author | Matveiev, Oleksandr | en |
dc.contributor.author | Zubenko, Anastasiia | en |
dc.contributor.author | Yevtushenko, Dmitry | en |
dc.contributor.author | Cherednichenko, Olga | en |
dc.date.accessioned | 2023-03-04T11:46:48Z | |
dc.date.available | 2023-03-04T11:46:48Z | |
dc.date.issued | 2021 | |
dc.description.abstract | In this paper we explore machine learning approaches that use product descriptions and titles to classify footwear by brand. The data were collected from many different online stores. We built a pipeline that automatically classifies product brands from these data. The dataset is provided in JSON format and contains more than 40,000 rows. The categorization component was implemented using the K-Nearest Neighbour (K-NN) and Support Vector Machine (SVM) algorithms. The pipeline was evaluated using the classification report, focusing on the weighted-average precision, which reached 79.0% for SVM and 72.0% for K-NN. | en |
dc.identifier.citation | Towards Classifying HTML-embedded Product Data Based On Machine Learning Approach [Electronic resource] / O. Matveiev [et al.] // Modern Machine Learning Technologies and Data Science (MoMLeT+DS 2021) : proc. of the 3rd Intern. Workshop, June 5, 2021. Vol. 2917. – Electronic text data. – Lviv-Shatsk, 2021. – 11 p. – Access mode: https://ceur-ws.org/Vol-2917/paper8.pdf, free (accessed 04.03.2023). | en |
dc.identifier.orcid | https://orcid.org/0000-0001-5907-3771 | |
dc.identifier.orcid | https://orcid.org/0000-0001-9178-0847 | |
dc.identifier.orcid | https://orcid.org/0000-0001-6250-4616 | |
dc.identifier.orcid | https://orcid.org/0000-0002-9391-5220 | |
dc.identifier.uri | https://repository.kpi.kharkov.ua/handle/KhPI-Press/63001 | |
dc.language.iso | en | |
dc.subject | product classification | en |
dc.subject | SVM | en |
dc.subject | K-Nearest Neighbour | en |
dc.subject | TF-IDF | en |
dc.subject | machine learning | en |
dc.subject | vectorization | en |
dc.subject | item matching | en |
dc.title | Towards Classifying HTML-embedded Product Data Based On Machine Learning Approach | en |
dc.type | Thesis | en |
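
The abstract describes a pipeline that vectorizes product titles and descriptions with TF-IDF, classifies them by brand, and reports weighted-average precision. The following is a minimal from-scratch sketch of that idea using only the Python standard library. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, cosine similarity, `k=3`, the brand names `acme` and `borealis`, and all sample titles are assumptions made for illustration. Only the K-NN branch is shown; the SVM classifier from the paper is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this toy example.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # L2-normalized sparse TF-IDF vector; terms unseen during fitting are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    # Majority vote among the k most similar training documents.
    ranked = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, train_labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def weighted_precision(y_true, y_pred):
    # Per-class precision averaged with weights proportional to class support.
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        score += precision * n / total
    return score


# Hypothetical toy data: (product title, brand label).
train = [
    ("acme air runner mesh running shoe", "acme"),
    ("acme court classic leather sneaker", "acme"),
    ("acme trail runner waterproof shoe", "acme"),
    ("borealis suede chelsea boot", "borealis"),
    ("borealis leather hiking boot", "borealis"),
    ("borealis winter boot wool lining", "borealis"),
]
test = [
    ("acme mesh runner", "acme"),
    ("borealis waterproof hiking boot", "borealis"),
]

idf = fit_idf([text for text, _ in train])
train_vecs = [tfidf(text, idf) for text, _ in train]
train_labels = [label for _, label in train]

predictions = [knn_predict(tfidf(text, idf), train_vecs, train_labels) for text, _ in test]
score = weighted_precision([label for _, label in test], predictions)
print(predictions, score)
```

Weighting each class's precision by its support keeps frequent brands from being hidden by rare ones, which is presumably why the weighted average is the figure quoted in the abstract. On a dataset of the size mentioned there, a library implementation of TF-IDF and nearest-neighbour search would be the practical choice.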