Estimation of time complexity for the task of retrieval for identical products for an electronic trading platform based on the decomposition of machine learning models

Fedor Krasnov

Abstract


This paper scrutinize the solution of the problem of matching identical products for an electronic trading platform. This task is one of the components of the recommendation system of the electronic trading platform and belongs to the class of item2item  recommendation systems— product-product recommendations. As part of this task, it is important to find all identical products with different prices and delivery conditions from different suppliers from a product catalog that contains hundreds of millions of products. The mathematical formulation for this problem belongs to the class of NP-complete problems. The author applied a decomposition of machine learning models to solve this problem – faster and more versatile models select candidate pairs of identical products, and then more CPU-intensive and accurate models score the identity of candidate pairs of products. The final model determines the order in a group of identical products based on the ranking model. The result is a list of groups of identical products sorted by the rank of identity within the group. The time complexity metric for the model composition was O(N*N*LOG(P)/M), where N is the total number of products in the catalog, P is the number of product modalities, and M is the number of product categories.

Full Text:

PDF (Russian)

References


Peeters, Ralph, et al. "Using schema. org annotations for training and maintaining product matchers." Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics. 2020.

V. Gopalakrishnan, S. P. Iyengar, A. Madaan, R. Rastogi, and S. Sengamedu, “Matching Product Titles using Web-based Enrichment,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012, pp. 605–614.

N. Londhe, V. Gopalakrishnan, A. Zhang, H. Q. Ngo, and R. Srihari, “Matching Titles with Cross Title Web-search Enrichment and Community Detection,” Proceedings of the VLDB Endowment, vol. 7, no. 12, pp. 1167–1178, 2014.

Akritidis, Leonidas, and Panayiotis Bozanis. "Effective unsupervised matching of product titles with k-combinations and permutations." 2018 Innovations in Intelligent Systems and Applications (INISTA). IEEE, 2018.

Bhutani P, Baranwal SK, Jain S (2021) Semantic framework for facilitating product discovery. In: 2021 Advances in Computational Intelligence, its Concepts & Applications (ACI 2021), vol. CEUR-WS, pp 30–36

Arya, Sunil, et al. "An optimal algorithm for approximate nearest neighbor searching fixed dimensions." Journal of the ACM (JACM) 45.6 (1998): 891-923.

Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with gpus." IEEE Transactions on Big Data 7.3 (2019): 535-547.

Nan Wu, Yuan Xie, A Survey of Machine Learning for Computer Architecture and Systems, ACM Computing Surveys, 10.1145/3494523, 55, 3, (1-39), (2023).

Nikolay O. Nikitin, Pavel Vychuzhanin, Mikhail Sarafanov, Iana S. Polonskaia, Ilia Revin, Irina V. Barabanova, Gleb Maximov, Anna V. Kalyuzhnaya, Alexander Boukhanovsky, Automated evolutionary approach for the design of composite machine learning pipelines, Future Generation Computer Systems, 10.1016/j.future.2021.08.022, 127, (109-125), (2022).

S. B. Goyal, Pradeep Bedi, Anand Singh Rajawat, Rabindra Nath Shaw, Ankush Ghosh, Multi-objective Fuzzy-Swarm Optimizer for Data Partitioning, Advanced Computing and Intelligent Technologies, 10.1007/978-981-16-2164-2_25, (307-318), (2022).

Ya-Lin Zhang, Qitao Shi, Meng Li, Xinxing Yang, Longfei Li, Jun Zhou, A Classification Based Ensemble Pruning Framework with Multi-metric Consideration, Intelligent Systems and Applications, 10.1007/978-3-030-82193-7_44, (650-667), (2022).

Jianrong Yao, Zhongyi Wang, Lu Wang, Zhebin Zhang, Hui Jiang, Surong Yan, A hybrid model with novel feature selection method and enhanced voting method for credit scoring, Journal of Intelligent & Fuzzy Systems, 10.3233/JIFS-211828, 42, 3, (2565-2579), (2022).

Reimers, Nils, and Iryna Gurevych. "Sentence-bert: Sentence embeddings using siamese bert-networks." arXiv preprint arXiv:1908.10084 (2019).

Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." International conference on machine learning. PMLR, 2019.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность IT Congress 2024

ISSN: 2307-8162