Alignment of Job and Resume Vector Representations with LLM

Lyubov Komarova, Alexey Kolosov, Vladimir Soloviev

Abstract


This study presents modern natural language processing (NLP) approaches for analyzing the alignment between job descriptions and resumes. The research focuses on using Large Language Models (LLMs) to create vector representations of job descriptions, resumes, and the skills extracted from them. It demonstrates that vector representations of job descriptions, resumes, and their extracted skills occupy distinct vector spaces, while standardized ESCO skills and ordinary words exist in a unified vector space. To test the hypotheses regarding the unity of these vector spaces, a statistical method, Maximum Mean Discrepancy (MMD), was applied, and dimensionality reduction algorithms (t-SNE and Ivis) were used to visualize and analyze the vector distributions. The study also provides an in-depth analysis of the experimental results, with special attention to the properties of ESCO skills, which form a cohesive vector space due to their standardization. The findings of this research can improve recruitment processes by offering innovative methods for matching candidate skills with employer requirements. Additionally, the study highlights the importance of data standardization in facilitating accurate interpretation and alignment.
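The distribution test mentioned in the abstract can be illustrated with a minimal sketch: a biased (V-statistic) RBF-kernel estimate of squared Maximum Mean Discrepancy between two sets of vectors. The function name `mmd_rbf`, the bandwidth heuristic, and the synthetic Gaussian data below are illustrative assumptions, not the authors' implementation; in the study the inputs would be LLM embeddings of resumes, job descriptions, or ESCO skills.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=None):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel.

    X: (n, d) array of embeddings from one source (e.g. resumes).
    Y: (m, d) array of embeddings from another source (e.g. job ads).
    gamma: RBF bandwidth parameter; defaults to a simple 1/(2d) heuristic.
    """
    if gamma is None:
        gamma = 1.0 / (2 * X.shape[1])

    def rbf(A, B):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    # MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    return rbf(X, X).mean() + rbf(Y, Y).mean() - 2 * rbf(X, Y).mean()

rng = np.random.default_rng(0)
d = 8
# Two samples from the same distribution: MMD^2 should be near zero.
same = mmd_rbf(rng.normal(size=(300, d)), rng.normal(size=(300, d)))
# A shifted sample: MMD^2 should be clearly positive.
shifted = mmd_rbf(rng.normal(size=(300, d)),
                  rng.normal(loc=3.0, size=(300, d)))
```

A value near zero is consistent with the two samples sharing one distribution (a unified vector space), while a large value indicates distinct spaces, which is the kind of evidence the paper reports for resumes versus job descriptions.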


Full Text:

PDF (Russian)

References


J. Malinowski, T. Keim, and O. Wendt, "Matching People and Jobs: A Bilateral Recommendation Approach," 2006.

R. Kessler, F. Béchet, M. Roche, and J.-M. Torres-Moreno, "Automatic extraction of skills from resumes: The ESCO experience," in Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, 2013.

P. Nikolaev, V. Rangarajan Sridhar, and P. B. Bogen, "BERT for Matching Resumes and Job Descriptions."

T. Brown et al., "Language Models are Few-Shot Learners," 2020.

H. Aguinis, J. R. Beltran, and A. Cope, "How to use generative AI as a human resource management assistant," Organizational Dynamics, vol. 53, no. 1, 2024. doi: 10.1016/j.orgdyn.2024.101029.

S. Panda, C. Shen, R. Perry, J. Zorn, A. Lutz, C. E. Priebe, and J. T. Vogelstein, "Nonpar MANOVA via Independence Testing," arXiv preprint arXiv:1910.08883 [cs, stat], April 2021.

C. Shen and J. T. Vogelstein, "The exact equivalence of distance and kernel methods in hypothesis testing," AStA Advances in Statistical Analysis, September 2020. doi:10.1007/s10182-020-00378-1.

A. C. Belkina et al., "Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets," Nature Communications, vol. 10, no. 1, 2019.

L. J. P. van der Maaten and G. E. Hinton, "Visualizing High-Dimensional Data Using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A Kernel Two-Sample Test," Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012.

L. Van-Duyet, V. Minh-Quan, and D. Quang-An, "Skill2vec: Machine Learning Approaches for Determining the Relevant Skill from Job Description," 2019.

S. Pudasaini, "Scoring of Resume and Job Description Using Word2vec and Matching Them Using Gale–Shapley Algorithm," 2021.

C. M. Jaramillo, "Word embedding for job market spatial representation: tracking changes and predicting skills demand," 2020.

W. Jitkrittum, Z. Szabo, K. Chwialkowski, and A. Gretton, "Interpretable Distribution Features with Maximum Testing Power," 2016.

L. Bulej, "IVIS: Highly customizable framework for visualization and processing of IoT data," 2020. doi: 10.1109/SEAA51224.2020.00095.

H. S. Lee and C. Wallraven, "Visualizing the embedding space to explain the effect of knowledge distillation," arXiv preprint arXiv:2110.04483, 2021. https://doi.org/10.48550/arXiv.2110.04483.

J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. doi: 10.3115/v1/D14-1162.

D. S. Asudani, N. K. Nagwani, and P. Singh, "Impact of word embedding models on text analytics in deep learning environment: a review," Artificial Intelligence Review, vol. 56, pp. 10345–10425, 2023. https://doi.org/10.1007/s10462-023-10419-1.

S. J. Johnson, M. R. Murty, and I. Navakanth, "A detailed review on word embedding techniques with emphasis on word2vec," Multimedia Tools and Applications, vol. 83, pp. 37979–38007, 2024. https://doi.org/10.1007/s11042-023-17007-z.

L. Shamir, "Automatic identification of rank correlation between image sequences," International Journal of Data Science and Analytics, vol. 17, pp. 1–11, 2024. https://doi.org/10.1007/s41060-023-00450-4.

R. Chattamvelli, "Rank Correlation," in Correlation in Engineering and the Applied Sciences, Synthesis Lectures on Mathematics & Statistics. Springer, Cham, 2024. https://doi.org/10.1007/978-3-031-51015-1_3.

G. Briganti, "How ChatGPT works: a mini review," European Archives of Oto-Rhino-Laryngology, vol. 281, pp. 1565–1569, 2024. https://doi.org/10.1007/s00405-023-08337-7.

F. D. Souza and J. B. de O. e S. Filho, "Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers," Neural Computing and Applications, vol. 35, pp. 9393–9406, 2023. https://doi.org/10.1007/s00521-022-08068-6.

S. Ott, K. Hebenstreit, V. Liévin et al., "ThoughtSource: A central hub for large language model reasoning data," Scientific Data, vol. 10, p. 528, 2023. https://doi.org/10.1038/s41597-023-02433-3.

L. Wang, C. Ma, X. Feng et al., "A survey on large language model based autonomous agents," Frontiers of Computer Science, vol. 18, p. 186345, 2024. https://doi.org/10.1007/s11704-024-40231-1.





ISSN: 2307-8162