Overview of public datasets for web application attack detection

Evgeny Eremin

Abstract


Web applications are currently one of the most popular means of delivering services to end users, and protecting them against attacks becomes more relevant every year. Modern Web Application Firewalls (WAFs) often use machine learning in one form or another. Recent research shows that systems that detect malicious HTTP requests using classical machine learning or deep learning in most cases outperform systems based on explicitly written rules in terms of effectiveness. Nevertheless, the experiments described in publications on this topic are often hard to reproduce. Most works report results of evaluating the proposed models on datasets that are not publicly available, and do not consider how the model performs on several publicly available datasets. This paper provides an overview of open (publicly available) datasets on which models for detecting malicious web requests can be trained and evaluated. The overview covers both datasets widely used for benchmarking and less well-known ones. It can also be useful when building combined datasets.



References


Y. Sadqi and Y. Maleh. (2022). A systematic review and taxonomy of web applications threats. Information Security Journal: A Global Perspective, Taylor & Francis, 31, 1-27. DOI: 10.1080/19393555.2020.1853855

R. L. Alaoui and E. H. Nfaoui. (2022). Deep Learning for Vulnerability and Attack Detection on Web Applications: A Systematic Literature Review. Future Internet, 14(4), 118. DOI: 10.3390/fi14040118

Hassan I. Halim, Mohamed Kholief, Fahima Maghraby et al. Deep Learning Methods in Web Intrusion Detection: A Systematic Review, 14 November 2022, PREPRINT (Version 1) available at Research Square. DOI:10.21203/rs.3.rs-2214647/v1

Toprak, S. & Yavuz, A.G. (2022). Web application firewall based on anomaly detection using deep learning. Acta Infologica. Advance online publication. DOI:10.26650/acin.1039042

B.A.Tama, and S.Lim. (2021). Ensemble learning for intrusion detection systems: A systematic mapping study and cross-benchmark evaluation. Computer Science Review, 39, 100357. DOI:10.1016/j.cosrev.2020.100357

Catillo, M., Pecchia, A., Rak, M., and Villano, U. (2021). Demystifying the role of public intrusion datasets: a replication study of DoS network traffic data. Computers & Security, Elsevier, 108, 102341. DOI:10.1016/j.cose.2021.102341

Alshaibi, A.; Al-Ani, M.; Al-Azzawi, A.; Konev, A.; Shelupanov, A. The Comparison of Cybersecurity Datasets. Data 2022, 7, 22. DOI: 10.3390/data7020022

Get'man A.I., Goryunov M.N., Mackevich A.G., Rybolovlev D.A. Sravnenie sistemy obnaruzheniya vtorzhenij na osnove mashinnogo obucheniya s signaturnymi sredstvami zashchity informacii. Trudy ISP RAN, tom 34, vyp. 5, 2022 g., str. 111-126. DOI: 10.15514/ISPRAS-2022-34(5)-7

Kaniski, Matija & Dobša, Jasminka & Kermek, Dragutin. (2023). Deep Learning within the Web Application Security Scope - Literature Review. DOI: 10.23919/MIPRO57284.2023.10159847

Erohin S.D., Zhuravlev A.P. Sravnitel'nyj analiz otkrytyh naborov dannyh dlya ispol'zovaniya tekhnologij iskusstvennogo intellekta pri reshenii zadach informacionnoj bezopasnosti // Sistemy sinhronizacii, formirovaniya i obrabotki signalov, 2020. T.3. №3. S. 12-19.

Giménez, C. T., Villegas, A. P., and Marañón, G. A. (2010). HTTP data set CSIC 2010. Information Security Institute of CSIC (Spanish Research National Council), 64, https://www.isi.csic.es/dataset/ Retrieved: 17.12.2023

Chedy Raïssi, Johan Brissaud, Gérard Dray, Pascal Poncelet, Mathieu Roche, et al. Web Analyzing Traffic Challenge: Description and Results. ECML PKDD 2007 Discovery Challenge, 2007, Warsaw, Poland

Riera, T. S., Higuera, J. B., Higuera, J. B., Herraiz, J. M., and Montalvo, J. S. (2022). A new multi-label dataset for Web attacks CAPEC classification using machine learning techniques. Computers & Security, Elsevier, 120, 102788. DOI: 10.1016/j.cose.2022.102788

FWAF Machine Learning driven Web Firewall https://github.com/faizann24/Fwaf-Machine-Learning-driven-Web-Application-Firewall Retrieved: 17.12.2023

Machine Learning Web Application Firewall and Dataset https://github.com/grananqvist/Machine-Learning-Web-Application-Firewall-and-Dataset Retrieved: 17.12.2023

HttpParams dataset https://github.com/Morzeux/HttpParamsDataset Retrieved: 17.12.2023

FuzzDB https://github.com/fuzzdb-project/fuzzdb Retrieved: 17.12.2023

Vulnbank dataset https://github.com/PositiveTechnologies/seq2seq-web-attack-detection Retrieved: 17.12.2023

Boss of the SOC v1 dataset https://github.com/splunk/botsv1 Retrieved: 17.12.2023

Samples of security related data https://www.secrepo.com/ Retrieved: 17.12.2023

PayloadsAllTheThings https://github.com/swisskyrepo/PayloadsAllTheThings Retrieved: 17.12.2023

OWASP ModSecurity Core Rule Set (CRS) https://github.com/coreruleset/coreruleset Retrieved: 17.12.2023

Sigma rules for SIEM https://github.com/SigmaHQ Retrieved: 17.12.2023





ISSN: 2307-8162