Comparative analysis of applied natural language processing technologies for improving the quality of digital document classification

A.K. Markov, D.O. Semenochkin, A.G. Kravets, T.A. Yanovskiy


Tagging digital documents is the process of assigning metadata or labels (tags) to documents in order to simplify their organization, retrieval, and management. This process is essential for effective information management and document accessibility in a digital environment. The choice of tagging method and technology depends on the specific needs of the organization or user, and a combination of methods is often used to achieve the best results. This paper presents a comparative analysis of natural language processing (NLP) techniques for improving the quality of digital document classification, using technical educational documents as an example. It discusses the methods used in document preprocessing and the application of NLP, considers ways to improve preprocessing, and reports a computational experiment conducted to measure the resulting improvement in the completeness and accuracy of classification.

