Lexical Dynamics in the Text Corpus on Computational and Corpus Linguistics: Probabilistic and Statistical Approach

Olga A. Mitrofanova

Abstract


The study analyzes changes in the topics of scientific articles in a text corpus on computational and corpus linguistics using modern topic modeling algorithms. The experiments apply the combined neural network algorithm BERTopic and the matrix-based algorithm NMF (non-negative matrix factorization), supplemented by a frequency analysis of topical words in the corpus. The significance of the work lies in developing a linguistically grounded methodology for semantic search of scientific information and for tracking trends in computational and corpus linguistics, which can be used to improve scientometric tools. The results show how research interests in the field have shifted under the influence of the digitalization of society and technological progress.


Full Text:

PDF (Russian)




ISSN: 2307-8162