Artificial intelligence systems increasingly require robust evaluation techniques to ensure that their logical reasoning capabilities align with practical applications. This research introduces a novel approach to empirically evaluating symmetrical reasoning in leading language models, specifically ChatGPT, Gemini, and Claude, using automated interaction methods. The evaluation, based on metrics of accuracy, consistency, and logical coherence, revealed distinct strengths and weaknesses among the models: ChatGPT excelled in accuracy and logical coherence, Gemini demonstrated superior consistency, and Claude's results pointed to areas needing further improvement. By employing a diverse dataset of symmetrical reasoning tasks and automated scoring algorithms, the study provides an objective and comprehensive assessment, contributing to the understanding of logical reasoning in AI. The findings underscore the importance of continuous model refinement and of integrating ethical considerations into the development of advanced AI systems.
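
As a minimal, hypothetical sketch of the kind of automated scoring described above (the study's actual scoring algorithms, task format, and answer normalization are not specified here), the Python functions below illustrate how two of the reported metrics could be computed: accuracy against gold labels, and consistency as agreement with the modal answer across repeated runs of the same prompt.

```python
from collections import Counter


def accuracy(responses, gold_answers):
    """Fraction of tasks where the model's answer matches the gold label."""
    correct = sum(1 for r, g in zip(responses, gold_answers) if r == g)
    return correct / len(gold_answers)


def consistency(repeated_responses):
    """Mean agreement with the modal answer over repeated runs of each task.

    `repeated_responses` maps a task id to the list of answers the model
    gave across several independent runs of the same prompt.
    """
    scores = []
    for answers in repeated_responses.values():
        modal_count = Counter(answers).most_common(1)[0][1]
        scores.append(modal_count / len(answers))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: three symmetry tasks with answers reduced to "yes"/"no".
    gold = ["yes", "yes", "no"]
    run1 = ["yes", "no", "no"]
    print(f"accuracy: {accuracy(run1, gold):.2f}")       # 0.67

    repeats = {
        "task1": ["yes", "yes", "yes"],
        "task2": ["no", "yes", "no"],
        "task3": ["no", "no", "no"],
    }
    print(f"consistency: {consistency(repeats):.2f}")    # 0.89
```

A logical-coherence score would require additional checks beyond exact-match answers, such as verifying that a model's justification for "a is related to b" does not contradict its answer for "b is related to a".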