Constructor of natural language processing blocks and its application in the problem of structuring logs in information security

Valentin Charugin; Valerii Charugin; Alexander Chesalin; Nadezhda Ushkova

Abstract

This article addresses natural language processing in the field of information security. It proposes a constructor of natural language processing blocks and describes its concept, architecture, and operating principle. The constructor is applied to the problem of structuring logs in information security, producing a unified, standardized format for recording events. Natural language models (BERT, ALBERT, DistilBERT, XLNet, GPT-2) are analyzed for the log structuring task. Algorithm quality is evaluated using the following metrics: Accuracy and F1-Score.

The results of the log structuring task can be used by information security analysts and developers, and to extend the functionality of a SIEM system.
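As a rough illustration of the two metrics named in the abstract, the sketch below computes Accuracy and F1-Score for field labels assigned to the tokens of a single log line. The label names (`TIMESTAMP`, `HOST`, `PROCESS`, `MESSAGE`), the sample data, and the choice of macro averaging are assumptions made for this example; the abstract does not specify the label set or the averaging scheme used in the article.

```python
def accuracy(y_true, y_pred):
    """Share of positions where the predicted label equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one label: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    return sum(f1_for_label(y_true, y_pred, lb) for lb in labels) / len(labels)


# Hypothetical gold and predicted field labels for five tokens of one log line.
gold = ["TIMESTAMP", "HOST", "PROCESS", "MESSAGE", "MESSAGE"]
pred = ["TIMESTAMP", "HOST", "MESSAGE", "MESSAGE", "MESSAGE"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

In this toy case one of five tokens is mislabeled, so Accuracy stays fairly high while macro F1 is pulled down by the `PROCESS` label, which is never predicted. This is one reason to report both metrics when fields are unevenly represented in the logs.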

ISSN: 2307-8162