Defining thematic relevance of messages in the task of online social networks monitoring in providing information-psychological security

Yulia Davydova

Abstract


Online social networks (OSN) are actively used for illegal and destructive activities, including the promotion of drugs, suicide, and terrorism. Detecting such material is necessary to protect the information-psychological security of individuals, and this requires automated monitoring of OSN. Monitoring involves searching unstructured user text messages for keywords that characterize the object of monitoring. As in information retrieval systems, OSN monitoring faces the problem of thematic relevance, which arises from the linguistic phenomenon of lexical ambiguity. Detection of illegal content is further complicated by the use of slang and jargon in communication, which prevents the use of existing effective approaches to word sense disambiguation. To address the problem of thematic relevance, the author proposes using topic models built on the contexts of keywords. Multidimensional scaling of the contexts in semantic space, followed by clustering, makes it possible to induce the senses of posts from OSN. A condition for classifying a message as illegal content is proposed. The technique was tested on the National Corpus of Russian and the General Internet-Corpus of Russian.
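The idea of grouping keyword contexts by sense can be illustrated with a small sketch. It is not the author's method: the topic model and multidimensional scaling are replaced here by plain bag-of-words context vectors and a greedy cosine-similarity clustering, and the function names, the window size, the threshold, and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def keyword_contexts(messages, keyword, window=3):
    """Build one bag-of-words vector per keyword occurrence from its neighbours."""
    vectors = []
    for msg in messages:
        tokens = re.findall(r"\w+", msg.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                # Words within `window` positions on each side, keyword excluded.
                neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors.append(Counter(neighbours))
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(vectors, threshold=0.3):
    """Greedy single-pass clustering (a stand-in for topic model + MDS + clustering)."""
    clusters = []  # each entry: (summed context vector, member indices)
    for idx, vec in enumerate(vectors):
        for centroid, members in clusters:
            if cosine(centroid, vec) >= threshold:
                centroid.update(vec)  # fold the new context into the cluster
                members.append(idx)
                break
        else:
            clusters.append((Counter(vec), [idx]))
    return [members for _, members in clusters]


# Hypothetical messages; each contains the keyword once, so vector index = message index.
messages = [
    "mow the grass in the garden every week",
    "water the grass in the garden",
    "buy grass from a dealer cheap",
    "cheap grass from a trusted dealer",
]
groups = cluster_contexts(keyword_contexts(messages, "grass"))
print(groups)  # [[0, 1], [2, 3]]
```

In this toy run the two gardening contexts and the two trade contexts fall into separate groups. Deciding which group corresponds to the monitored sense, and the condition for flagging a message, are left to the technique described in the paper.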




ISSN: 2307-8162