Vulnerability Aspects Extraction and Discrepancies Detection across Heterogeneous Threat Intelligence

Lihua Wang; Jiamou Sun; Jiaojiao Jiang; Salil Kanhere; Zhenchang Xing; Sanjay Jha

doi:10.22541/au.173226080.05441422/v1

Abstract

Security vulnerabilities are reported constantly and must be documented accurately in vulnerability repositories. A vulnerability description usually covers several key aspects: the vulnerable product, version, component, vulnerability type, root cause, impact, and attack vector. Understanding and managing these aspects is crucial, but manually analyzing and integrating the growing number of vulnerability reports from heterogeneous databases is impractical, which creates a need for automated solutions. This study investigates substantial aspect-level discrepancies in vulnerability information across major vulnerability databases such as NVD, IBM X-Force, ExploitDB, and Openwall. It addresses two major challenges: improving the accuracy of extracting key vulnerability aspects, and detecting discrepancies in these aspects across databases. The task is difficult because the data sources are heterogeneous and often conflict, and because effective techniques for accurate aspect extraction and discrepancy resolution are lacking. Recent research has shown that advanced natural language processing techniques, particularly large language models (LLMs) such as GPT-3.5 and GPT-4, excel at handling detailed, context-rich text. Our approach leverages these LLMs to detect aspect-level discrepancies in vulnerability information across databases. In rigorous testing on a variety of datasets, our approach provides significant improvements over traditional models in both aspect extraction and discrepancy detection, and it improves our ability to manage and integrate threat intelligence data.
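
To make the notion of an aspect-level discrepancy concrete, the following minimal sketch stores the seven aspects named in the abstract as one record per source and flags the aspects on which sources disagree. It is an illustration under stated assumptions, not the authors' method: the `VulnAspects` class, the `find_discrepancies` function, and the sample records are hypothetical, and a simple normalized string comparison stands in for the LLM-based comparison the abstract describes.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class VulnAspects:
    """Hypothetical container for the seven aspects listed in the abstract."""
    product: Optional[str] = None
    version: Optional[str] = None
    component: Optional[str] = None
    vulnerability_type: Optional[str] = None
    root_cause: Optional[str] = None
    impact: Optional[str] = None
    attack_vector: Optional[str] = None


def _normalize(value: Optional[str]) -> Optional[str]:
    # Lowercase and collapse whitespace so trivial formatting differences
    # are not reported as discrepancies. Empty or missing values become None.
    return " ".join(value.lower().split()) if value else None


def find_discrepancies(records: Dict[str, VulnAspects]) -> Dict[str, Dict[str, str]]:
    """Return, for each aspect on which sources disagree, a source -> value map.

    Assumption: plain string equality after normalization. This is a stand-in
    for semantic comparison and will miss paraphrases.
    """
    discrepancies: Dict[str, Dict[str, str]] = {}
    for field in fields(VulnAspects):
        # Only compare sources that actually report this aspect.
        present = {
            source: normalized
            for source, record in records.items()
            if (normalized := _normalize(getattr(record, field.name))) is not None
        }
        if len(set(present.values())) > 1:
            discrepancies[field.name] = present
    return discrepancies


# Made-up records for a single vulnerability as two sources might describe it.
example_records = {
    "NVD": VulnAspects(
        product="ExampleApp",
        version="1.0 through 1.4",
        vulnerability_type="SQL injection",
    ),
    "ExploitDB": VulnAspects(
        product="exampleapp",
        version="1.4",
        vulnerability_type="SQL Injection",
    ),
}

print(find_discrepancies(example_records))
```

In this made-up example only the `version` aspect is flagged, because the product and vulnerability type differ only in letter case. Aspects that a source omits are skipped rather than treated as conflicts, which is one possible design choice among several.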