Machine Learning Models Explanations and Adversarial Attacks
Abstract
The paper considers the practical construction of adversarial (evasion) attacks on machine learning models using explanations of their behavior. Although machine learning models are, in general, black boxes, there are schemes for constructing explanations (approximations of the decision logic) that show how a particular decision is made: even if the model is not a decision tree, a comparable decision-making explanation can be obtained for it. One example of such schemes is SHAP values. This approach makes it possible to mount attacks in a black-box setting. If the training dataset of the attacked model, or even part of it, is known to the attacker, it can be used to train a surrogate model of arbitrary architecture; explanations are then constructed for this surrogate and used to craft the attack. Since adversarial attacks are transferable, the attack can then be reproduced against the attacked model. The source code for these experiments is provided in the paper. Network traffic classification models for Internet of Things systems are considered as the attacked example.
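The scheme described above can be illustrated with a minimal sketch. It assumes scikit-learn, shap and numpy are available; the synthetic data from make_classification, the random forest surrogate, and the evade helper with its top_k and step parameters are placeholders chosen for illustration, not the models, traffic features or source code used in the paper.

```python
# Illustrative sketch only: synthetic data and a random forest stand in for the
# IoT traffic dataset and the attacker's surrogate model discussed in the paper.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for the (partially) known training data of the attacked classifier.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: the attacker trains a surrogate model of arbitrary architecture.
surrogate = RandomForestClassifier(n_estimators=100, random_state=0)
surrogate.fit(X_train, y_train)

# Step 2: SHAP explanations of the surrogate. A scalar wrapper (probability of
# the "malicious" class) keeps the explainer model-agnostic.
def predict_malicious(data):
    return surrogate.predict_proba(data)[:, 1]

explainer = shap.KernelExplainer(predict_malicious, X_train[:100])

# Step 3: perturb the features that push a sample most strongly towards the
# "malicious" class, so that the surrogate's decision flips.
def evade(x, top_k=3, step=2.0):
    phi = explainer.shap_values(x.reshape(1, -1), nsamples=200)[0]  # per-feature contributions
    idx = np.argsort(phi)[-top_k:]                                  # most incriminating features
    x_adv = x.copy()
    x_adv[idx] -= step * np.sign(phi[idx]) * X_train[:, idx].std(axis=0)
    return x_adv

sample = X_test[surrogate.predict(X_test) == 1][0]  # a sample flagged as malicious
adversarial = evade(sample)
print("surrogate verdict before/after:",
      surrogate.predict(sample.reshape(1, -1))[0],
      surrogate.predict(adversarial.reshape(1, -1))[0])
```

Perturbing only the few features with the largest positive SHAP contribution keeps the adversarial sample close to the original while removing exactly the evidence the surrogate relies on; in the network traffic setting such perturbations would additionally have to respect the validity constraints of the traffic features. The final step, not shown, is to submit the same adversarial samples to the actual black-box classifier, relying on the transferability of evasion attacks.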
Full Text:
PDF (Russian)