This study presents a comprehensive evaluation of the cybersecurity robustness of five leading Large Language Models (LLMs), namely ChatGPT-4, Google Gemini, Anthropic Claude, Meta Llama, and Mistral 8x7B, against adversarial prompts using the PromptBench benchmark. Through a dual approach of quantitative and qualitative analysis, the research examines each model's performance, resilience, and vulnerabilities. Quantitative metrics such as accuracy, precision, recall, and F1 score provide a statistical comparison across models, while qualitative insights reveal distinct patterns of response and susceptibility to various adversarial strategies. The findings highlight significant variations in model robustness, underscoring the importance of a multifaceted approach to enhancing LLM security. This study not only sheds light on current limitations but also emphasizes the need to advance evaluation methodologies and model development practices to mitigate potential threats and ensure the safe deployment of LLMs in sensitive and critical applications.