BERTGuard: Robust Text Classification against Adversarial Attacks
- Laxmi Shaw,
- Mohammed Wasim Ansari,
- Tahir Ekin
Corresponding Author: Tahir Ekin
Abstract
This paper introduces BERTGuard, a text classification framework that makes BERT models more resilient to adversarial attacks by applying feature trimming and sub-sampling within adversarial training. Our approach outperforms baseline BERT models under various augmented white-box adversarial attacks. We demonstrate its resilience on the benchmark IMDB movie review dataset.