Extracting Training Data: Risks and solutions in the context of LLM security

Denis V. Gerasimenko, Dmitry Namiot

Abstract


The quality of results produced by modern language models is inextricably linked to the amount of data on which they are trained. Recent high-profile investigations involving artificial intelligence companies have centered precisely on the unlawful use of information harvested from the Internet. Another front in the struggle over user data is the quiet expansion of user agreements that grant companies the right to use collected information to train their models. This paper analyzes current problems related to the extraction of training data from large language models (LLMs), such as the GPT and Llama families. Because modern models are trained on large volumes of unstructured data, they become attractive targets for attacks that seek access to this data or its characteristics. The article presents a taxonomy of attacks aimed at extracting training data and describes the consequences that can arise from the illicit use of language models. The study shows that, without proper protection, training data can be exploited by attackers to recover confidential information, threatening not only users but also the reputation of organizations.
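As an illustration of the class of attacks the paper surveys, the sketch below shows a naive training-data extraction probe: sample freely from a publicly available causal language model and rank the generations by perplexity, on the heuristic that unusually low-perplexity outputs are more likely to be memorized training text. This example is not taken from the paper; the model choice (gpt2), the seed prompt, the sample count, and the ranking threshold are all placeholder assumptions, and a realistic audit would use far more samples and stronger membership signals.

# Illustrative sketch only (not the paper's method): generate many
# unconditioned samples and flag the most "memorized-looking" ones,
# i.e. those the model assigns the lowest perplexity.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder public model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower = more 'familiar')."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return math.exp(loss.item())

# Generate samples from a minimal prefix and score each one.
candidates = []
for _ in range(20):  # a real attack would use many thousands of samples
    out = model.generate(
        tokenizer("The", return_tensors="pt").input_ids,
        max_new_tokens=64,
        do_sample=True,
        top_k=40,
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    candidates.append((perplexity(text), text))

# Lowest-perplexity generations are the candidate memorized strings.
for ppl, text in sorted(candidates)[:5]:
    print(f"ppl={ppl:.1f}  {text[:80]!r}")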






