A survey on natural language semantic search algorithms
References
Vaswani Ashish, Shazeer Noam, Parmar Niki et al. Attention is all you need. — 2017. — URL: https://arxiv.org/abs/1706.03762.
Devlin Jacob, Chang Ming-Wei, Lee Kenton, Toutanova Kristina. Bert: Pre-training of deep bidirectional transformers for language understanding. — 2018. — URL: https://arxiv.org/abs/1810.04805.
Liu Yinhan, Ott Myle, Goyal Naman et al. Roberta: A robustly optimized bert pretraining approach. — 2019. — URL: https://arxiv.org/abs/1907.11692.
Language models are unsupervised multitask learners / Alec Radford, Jeff Wu, Rewon Child et al. — 2019.
Brown Tom B., Mann Benjamin, Ryder Nick et al. Language models are few-shot learners. — 2020. — URL: https://arxiv.org/abs/2005.14165.
Wang Alex, Singh Amanpreet, Michael Julian et al. Glue: A multitask benchmark and analysis platform for natural language understanding. — 2018. — URL: https://arxiv.org/abs/1804.07461.
Rajpurkar Pranav, Zhang Jian, Lopyrev Konstantin, Liang Percy. Squad: 100,000+ questions for machine comprehension of text. — 2016. — URL: https://arxiv.org/abs/1606.05250.
Lai Guokun, Xie Qizhe, Liu Hanxiao et al. Race: Large-scale reading comprehension dataset from examinations. — 2017. — URL: https://arxiv.org/abs/1704.04683.
Zellers Rowan, Holtzman Ari, Bisk Yonatan et al. Hellaswag: Can a machine really finish your sentence? — 2019. — URL: https://arxiv.org/abs/1905.07830.
Position-aware attention and supervised data improve slot filling / Yuhao Zhang, Victor Zhong, Danqi Chen et al. // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. — Copenhagen, Denmark : Association for Computational Linguistics, 2017. — P. 35–45. — URL: https://aclanthology.org/D17-1004.
Yang Yinfei, Cer Daniel, Ahmad Amin et al. Multilingual universal sentence encoder for semantic retrieval. — 2019. — 1907.04307.
Embedding-based retrieval in facebook search / Jui-Ting Huang, Ashish Sharma, Shuying Sun et al. // CoRR. — 2020. — Vol. abs/2006.11632. — arXiv : 2006.11632.
Zhang Yanzhao, Long Dingkun, Xu Guangwei, Xie Pengjun. Hlatr: Enhance multi-stage text retrieval with hybrid list aware transformer reranking. — 2022. — 2205.10569.
Karpukhin Vladimir, Oğuz Barlas, Min Sewon et al. Dense passage retrieval for open-domain question answering. — 2020. — 2004.04906.
Borgeaud Sebastian, Mensch Arthur, Hoffmann Jordan et al. Improving language models by retrieving from trillions of tokens. — 2022. — 2112.04426.
Thakur Nandan, Reimers Nils, Daxenberger Johannes, Gurevych Iryna. Augmented sbert: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. — 2021. — 2010.08240.
Okapi at trec-6 automatic ad hoc, vlc, routing, filtering and qsdr / Steve Walker, Stephen E Robertson, Mohand Boughanem et al. // NIST SPECIAL PUBLICATION SP. — 1998. — P. 125–136.
Penha Gustavo, Palumbo Enrico, Aziz Maryam et al. Improving content retrievability in search with controllable query generation. — 2023. — 2303.11648.
Jagerman Rolf, Zhuang Honglei, Qin Zhen et al. Query expansion by prompting large language models. — 2023. — 2305.03653.
Zhang Yang, Bartley Travis M., Graterol-Fuenmayor Mariana et al. A chat about boring problems: Studying gpt-based text normalization. — 2024. — 2309.13426.
Bengio Yoshua, Courville Aaron, Vincent Pascal. Representation learning: A review and new perspectives. — 2014. — 1206.5538.
Gao Luyu, Callan Jamie. Unsupervised corpus aware language model pre-training for dense passage retrieval. — 2021. — 2108.05540.
Xiao Shitao, Liu Zheng, Shao Yingxia, Cao Zhao. Retromae: Pre-training retrieval-oriented language models via masked autoencoder. — 2022. — 2205.12035.
Wu Xing, Ma Guangyuan, Lin Meng et al. Contextual masked autoencoder for dense passage retrieval. — 2022. — 2208.07670.
Malkov Yu. A., Yashunin D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. — 2018. — 1603.09320.
Guo Ruiqi, Sun Philip, Lindgren Erik et al. Accelerating large-scale inference with anisotropic vector quantization. — 2020. — 1908.10396.
Retrieve re-rank. — https://www.sbert.net/examples/applications/retrieve_rerank/README.html#retrieve-re-rank. — Accessed: 2022-12-21.
Nogueira Rodrigo, Cho Kyunghyun. Passage re-ranking with bert. — 2020. — 1901.04085.
Dai Zhuyun, Callan Jamie. Deeper text understanding for IR with contextual neural language modeling // Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. — ACM, 2019. — July.
CEDR / Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian // Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. — ACM, 2019. — July.
Cross-domain modeling of sentence-level evidence for document retrieval / Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). — Hong Kong, China : Association for Computational Linguistics, 2019. — P. 3490–3496. — URL: https://aclanthology.org/D19-1352.
Li Canjia, Yates Andrew, MacAvaney Sean et al. Parade: Passage representation aggregation for document reranking. — 2021. — 2008.09093.
Bajaj Payal, Campos Daniel, Craswell Nick et al. Ms marco: A human generated machine reading comprehension dataset. — 2018. — 1611.09268.
Tay Yi, Tran Vinh Q., Dehghani Mostafa et al. Transformer memory as a differentiable search index. — 2022. — URL: https://arxiv.org/abs/2202.06991.
Raffel Colin, Shazeer Noam, Roberts Adam et al. Exploring the limits of transfer learning with a unified text-to-text transformer. — 2019. — URL: https://arxiv.org/abs/1910.10683.
Sutskever Ilya, Vinyals Oriol, Le Quoc V. Sequence to sequence learning with neural networks. — 2014. — URL: https://arxiv.org/abs/1409.3215.
Wang Yujing, Hou Yingyan, Wang Haonan et al. A neural corpus indexer for document retrieval. — 2023. — 2206.02743.
Tang Yubao, Zhang Ruqing, Guo Jiafeng et al. Listwise generative retrieval models via a sequential learning process. — 2024. — 2403.12499.
Mehta Sanket Vaibhav, Gupta Jai, Tay Yi et al. Dsi++: Updating transformer memory with new documents. — 2022. — 2212.09744.
Searching for answers in a pandemic: An overview of trec-covid / Ellen M. Voorhees, Ian Soboroff, Kirk Roberts et al. // Journal of Biomedical Informatics. — 2021. — Vol. 121. — URL: https://doi.org/10.1016/j.jbi.2021.103865.
Natural questions: a benchmark for question answering research / Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield et al. // Transactions of the Association for Computational Linguistics. — 2019. — Vol. 7. — P. 452–466.
TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension / Mandar Joshi, Eunsol Choi, Daniel Weld, Luke Zettlemoyer // Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). — Vancouver, Canada : Association for Computational Linguistics, 2017. — P. 1601–1611. — URL: https://aclanthology.org/P17-1147.
Nentidis Anastasios, Krithara Anastasia, Paliouras Georgios, Bougiatiotis Konstantinos. Bioasq: A challenge on large-scale biomedical semantic indexing and question answering. — URL: http://participants-area.bioasq.org/. — 2021. — Accessed: 2024-07-17.
Quora. Quora question pairs. — 2017. — Accessed: 2024-07-17. URL: https://www.kaggle.com/c/quora-question-pairs.
FEVER: a large-scale dataset for fact extraction and VERification / James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal // NAACL-HLT. — New Orleans, Louisiana : Association for Computational Linguistics, 2018. — P. 809–819. — URL: https://aclanthology.org/N18-1074.
HotpotQA: A dataset for diverse, explainable multi-hop question answering / Zhilin Yang, Peng Qi, Saizheng Zhang et al. // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing / Association for Computational Linguistics. — 2018. — P. 2369–2380. — URL: https://arxiv.org/abs/1809.09600
Www’18 open challenge: Financial opinion mining and question answering / Saulo Macedo Maia, Siegfried Handschuh, André Freitas et al. // Companion Proceedings of the The Web Conference 2018. — 2018. — URL: https://github.com/dayanfcosta/fiqa-2018-task1/blob/master/datasets/Readme_task1.pdf.
Fact or fiction: Verifying scientific claims / David Wadden, Shanchuan Lin, Kyle Lo et al. // Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). — Online : Association for Computational Linguistics, 2020. — P. 7534–7550. — URL: https://aclanthology.org/2020.emnlp-main.609.
Cohan Arman, Feldman Sergey, Beltagy Iz et al. SciDocs: A Benchmark Suite for Document-Level Representation Learning. — https://allenai.org/data/scidocs. — 2020. — Version 1.0.
International Journal of Open Information Technologies. ISSN: 2307-8162. Vol. 12, no. 9, 2024.