
Enhancing Reliability in Large Language Models: Self-Detection of Hallucinations With Spontaneous Self-Checks
  • Steven Behore
  • Liam Dumont
  • Julian Venkataraman
Corresponding author: Steven Behore (Esterace), [email protected]

Abstract

Modern automated text generation systems frequently produce outputs containing factual inaccuracies, inconsistencies, and logical errors that compromise the reliability of the generated content. A novel framework was proposed to enable autonomous detection and correction of such errors, achieving significant improvements in real-time hallucination mitigation without human intervention. The methodology introduced a self-monitoring mechanism built on a dual-model architecture, token-level confidence scoring, and embedding consistency checks, designed to improve the accuracy of content generated across diverse domains. Experiments demonstrated that the model significantly reduced hallucination occurrences and improved precision and recall, with adaptability shown across multiple datasets, including general knowledge, domain-specific texts, and synthetic hallucinations. Results indicated that integrating self-detection mechanisms led to more reliable outputs while minimizing false positives and refining the overall content generation process. The findings suggest that automated systems can achieve higher levels of accuracy and efficiency, making them suitable for applications where reliability is critical.
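The abstract names two of the self-check signals, token-level confidence scoring and embedding consistency checks, without giving implementation details. The sketch below is a minimal illustration of how such signals could be combined, assuming the generator exposes per-token log-probabilities and that sentence embeddings of two independent generations of the same prompt are available; the function names and thresholds are illustrative assumptions, not the authors' method.

```python
import numpy as np

def token_confidence_flags(token_logprobs, threshold=-2.5):
    """Flag tokens whose log-probability falls below a confidence threshold.

    token_logprobs: per-token log-probabilities reported by the generator.
    Returns a boolean array where True marks a low-confidence (suspect) token.
    The threshold is an illustrative value, not taken from the paper.
    """
    logprobs = np.asarray(token_logprobs, dtype=float)
    return logprobs < threshold

def embedding_consistency(emb_a, emb_b):
    """Cosine similarity between sentence embeddings of two independent
    generations of the same prompt; low similarity suggests the answers
    are inconsistent and may contain hallucinated content."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_suspect(token_logprobs, emb_a, emb_b,
               max_low_conf_fraction=0.2, min_consistency=0.85):
    """Combine both signals: too many low-confidence tokens, or embeddings
    that disagree across regenerations, mark the output for verification
    or regeneration rather than being returned to the user."""
    low = token_confidence_flags(token_logprobs)
    too_uncertain = low.mean() > max_low_conf_fraction
    inconsistent = embedding_consistency(emb_a, emb_b) < min_consistency
    return too_uncertain or inconsistent

if __name__ == "__main__":
    # Toy demo with made-up numbers: two tokens fall below the threshold.
    logprobs = [-0.3, -0.8, -3.1, -0.2, -4.0]
    emb1, emb2 = np.random.rand(384), np.random.rand(384)
    print(is_suspect(logprobs, emb1, emb2))
```

In a dual-model setup as described in the abstract, a check of this kind would typically run in a secondary verifier alongside the primary generator, gating which outputs are released and which are regenerated.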