Visualizing Embeddings to Study Gender-Related Differences in Word Meaning

Tatiana Litvinova; Polina Panicheva; Elena Kotlyarova; Victoria Zavarzina

Abstract

The development of distributional semantic models is one of the most important research directions in modern NLP, and the field is advancing rapidly. New transformer-based models achieve good results in many practical tasks, although the problem of their interpretability remains largely unsolved despite the research efforts made in this direction. Despite this obvious progress, very little attention has been given to estimating and assessing differences in word meaning (in the sense of distributional semantics) related to the characteristics of text authors (gender, age, psychological traits, etc.). This problem has practical as well as theoretical value. Currently, no attention is paid to the characteristics of the authors whose texts are used to construct the pretrained models widely used in NLP, yet knowing individual differences in word meaning is crucial to understanding the biases present in these models. We use existing methods of word embedding visualization to show differences in the structure of word meaning related to the gender of authors, and we propose clustering methods to study this structure. We conclude that developing methods for visualizing and interpreting individual differences in word meaning is crucial both for the efficient solution of various NLP tasks and for the theory of word meaning.

DOI: 10.25559/INJOIT.2307-8162.10.202211.47-53

ISSN: 2307-8162

International Journal of Open Information Technologies