Avisha Das

Large language models (LLMs) with billions of parameters, trained on massive amounts of crowdsourced public data, have made a dramatic impact on natural language processing (NLP) tasks. Domain-specific 'fine-tuning' of LLMs has further improved model behavior through task-specific alignment and refinement. However, with the widespread development and deployment of LLMs, existing vulnerabilities in these models have opened the way for perpetrators to manipulate them with malicious intent. The growing integration of LLMs in clinical NLP underscores the looming threat of privacy leakage and targeted attacks such as prompt injection or data poisoning. In this work, we design a systematic framework to expose vulnerabilities of a clinical generative language model, with a specific emphasis on its application to clinical notes. We design three attack pipelines to highlight the model's susceptibility to core types of targeted attacks: (i) instruction-based data poisoning, (ii) trigger-based model editing, and (iii) membership inference on de-identified breast cancer clinical notes. Our proposed framework is the first to investigate the extent of LLM-based attacks in the clinical domain. Our findings reveal successful manipulation of LLM behavior, raising concerns about the stealthiness and effectiveness of such attacks. Through this work, we hope to emphasize the urgency of understanding these vulnerabilities in LLMs and to encourage their mindful and responsible usage in the clinical domain.
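To make the third attack type concrete, the sketch below shows a generic perplexity-thresholding membership inference probe against a causal language model. It is a minimal illustration under assumed settings, not the framework evaluated in this work: the model name ("gpt2"), the threshold value, and the example note text are all placeholders, since the clinical model and data described here are not public.

```python
# Minimal sketch of a perplexity-based membership inference probe.
# Assumptions (not from the paper): "gpt2" stands in for the target clinical LLM,
# THRESHOLD is a hypothetical cutoff, and the candidate note is synthetic.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder for the (non-public) fine-tuned clinical model
THRESHOLD = 20.0      # hypothetical perplexity cutoff, tuned on held-out text

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Compute the model's perplexity on a candidate note snippet."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

def likely_member(text: str) -> bool:
    """Flag snippets the model fits unusually well as likely training members."""
    return perplexity(text) < THRESHOLD

candidate = "Patient presents with a palpable mass in the left breast."
print(perplexity(candidate), likely_member(candidate))
```

The intuition is that a model tends to assign lower perplexity to text it has memorized from training data; more refined attacks calibrate this score against a reference model rather than a fixed threshold.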