An algorithm for defending language models against adversarial attacks

Dmitry Tishencko, Tatyana Prikhodko

Abstract


The article addresses one of the significant challenges of modern information technology: protecting large language models against adversarial attacks. As business processes become increasingly automated and intelligent, virtual assistants are becoming essential components that can substitute for numerous human resources and significantly reduce the cost of providing them. A major concern, however, is the instability of the sequences such models generate when they are under attack. While large companies have access to powerful security architectures, small and medium-sized enterprises also require effective protection methods. The article examines various aspects of adversarial attacks and techniques for generating text sequences, and proposes a method for defending against these attacks. Particular emphasis is placed on training a censor model, the key component of the proposed protection mechanism.
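To make the censor-model idea concrete, the sketch below shows the control flow of such a defense: a separate scoring model inspects both the user prompt and the assistant's draft reply, and the reply is suppressed when the score crosses a threshold. This is an illustrative sketch only, not the authors' implementation: the paper trains a neural censor, whereas here a trivial keyword scorer (`censor_score`, with a made-up flag list and threshold) stands in for it so the pipeline is runnable.

```python
from typing import Callable

REFUSAL = "Request blocked by censor."

def censor_score(text: str) -> float:
    """Stand-in for a trained censor model: fraction of flagged tokens.

    A real censor would be a trained classifier returning an
    adversarial-likelihood score for the whole sequence.
    """
    flagged = {"ignore", "override", "jailbreak"}  # hypothetical flag list
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in flagged for t in tokens) / len(tokens)

def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     threshold: float = 0.1) -> str:
    """Run the censor on the prompt and on the model's draft output."""
    if censor_score(prompt) >= threshold:   # screen the incoming prompt
        return REFUSAL
    draft = generate(prompt)                # base LLM produces a draft
    if censor_score(draft) >= threshold:    # screen the draft before release
        return REFUSAL
    return draft
```

The design choice illustrated here is that the censor sits outside the generator, so it can be trained and updated independently of the base model, which is what makes the approach accessible to smaller organizations without retraining a large model.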


Full Text: PDF (Russian)





ISSN: 2307-8162