Product Search Recall: Comparison of Embedding Encoders
Abstract
Transformer-based neural network architectures are developing so rapidly that there is little time to evaluate their efficacy in product search applications in depth. New research on transformers is published frequently, but the time and resources needed to test these models against existing solutions, such as those based on FastText, DSSM, or ELMo, are not always available. Comparisons in academic publications are typically made on public datasets that can differ significantly from those used in industrial settings. Moving an industrial search system to a new machine learning model requires a significant investment of time and resources because product catalogs must be indexed, so the cost of error is high. In this study, we analyze the strengths and limitations of various models for product search and experimentally evaluate different encoder models. A training dataset was generated from user search logs, and models with optimal hyperparameters were trained on it. The offline metrics used to measure effectiveness are defined in detail. A validation dataset of textual product representations was collected. The offline metrics were compared for the task of product search in e-commerce and were evaluated by experts. Both pre-trained encoders and encoders trained from scratch were considered. To compare models, recall R@k was calculated for each model over a set of search queries. The effectiveness of the encoders was found to vary with the number of words in the search query. This study does not seek to develop the best possible model for the general problem of obtaining textual representations. Instead, the authors acknowledge the limitations of their work and, given the constraints of available resources, aim to establish a reliable methodology for selecting the most appropriate machine learning model for this specific task. The primary contribution of this research is therefore a methodology for comparing machine learning approaches to the encoder component when the goal is to improve product search recall.
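The abstract describes computing recall R@k per model over a set of search queries and observing that effectiveness depends on query length in words. The following is a minimal sketch of that kind of computation, not the authors' implementation: the function names, the tuple layout of the input, and the toy product identifiers are all assumptions made for illustration, and R@k is taken here to be the share of a query's relevant products that appear among the top k retrieved results.

```python
from collections import defaultdict


def recall_at_k(retrieved, relevant, k):
    """Share of relevant items found among the top-k retrieved items."""
    if not relevant:
        # Assumption: a query with no relevant items scores 0.0.
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def recall_by_query_length(queries, k):
    """Mean R@k grouped by the number of words in the query.

    `queries` is an iterable of (query_text, retrieved_ids, relevant_ids);
    this layout is hypothetical and chosen only for the example.
    """
    buckets = defaultdict(list)
    for text, retrieved, relevant in queries:
        n_words = len(text.split())
        buckets[n_words].append(recall_at_k(retrieved, relevant, k))
    return {n: sum(vals) / len(vals) for n, vals in sorted(buckets.items())}


# Toy data (hypothetical): ranked results from one encoder plus
# the products judged relevant for each query.
sample = [
    ("red dress", ["p1", "p2", "p3"], ["p1", "p4"]),
    ("shoes", ["p5", "p6", "p7"], ["p5"]),
    ("blue running shoes", ["p8", "p9", "p10"], ["p9", "p10"]),
]

result = recall_by_query_length(sample, k=2)
print(result)
```

Running the same function on the ranked output of each candidate encoder would give one R@k value per query length and per model, which is the form of comparison the abstract refers to.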
ISSN: 2307-8162