BERTGuard: Robust Text Classification against Adversarial Attacks

Laxmi Shaw,
Mohammed Wasim Ansari,
Tahir Ekin

Abstract

This paper introduces BERTGuard, a text classification framework that improves the resilience of BERT models to adversarial attacks by applying feature trimming and sub-sampling within adversarial training. BERTGuard outperforms baseline BERT models under various augmented white-box adversarial attacks, as demonstrated on the benchmark IMDB movie review dataset.