Artificial Intelligence in Cybersecurity: Finding Malware
Abstract
This article examines in detail the technical aspects of detecting (classifying) malicious software with machine learning, including deep learning. We consider both the classic formulation of the classification task, using traditional models such as the multilayer perceptron, SVM, and Random Forest, and approaches based on large language models. The main difficulty in this type of task lies in collecting and preparing the data. Malware code analysis can be static, dynamic (runtime), or hybrid. Static analysis alone provides little information, so dynamic and hybrid analysis are mainly used. Because these approaches require running the software under study, the collection of features must be automated. Machine learning models have also brought with them a new class of cyberattacks: adversarial attacks. Models used to detect malware, including large language models, are no exception and are susceptible to such attacks.
ISSN: 2307-8162