Abhishek Sebastian

and 1 more

As artificial intelligence (AI) continues to advance, its application in healthcare is rapidly evolving from automating routine tasks to handling complex medical decision-making. This paper critically evaluates OpenAI's GPT-o1 model, a large language model, as a potential AI-powered medical practitioner. The study analyzes its performance across various domains of medical practice, including clinical knowledge, differential diagnosis, pharmacology, treatment planning, ethical decision-making, and patient communication. A diverse range of clinical scenarios, spanning cardiology, endocrinology, pulmonology, oncology, and other specialties, was used to assess the model's accuracy, clinical reasoning, and ability to adhere to current evidence-based guidelines. The results show that GPT-o1 performs impressively in areas such as diagnosing acute conditions, with an accuracy rate of over 95% in certain cases such as STEMI, and offering evidence-based treatment plans, especially in chronic disease management. The model demonstrated significant ethical sensitivity, particularly in handling patient autonomy and end-of-life care, with more than 90% of its responses aligning with ethical best practices. However, challenges remain in the model's ability to manage complex, multifactorial cases that require nuanced reasoning and contextual awareness. In scenarios involving differential diagnoses, such as tuberculosis in endemic regions, GPT-o1's accuracy dropped by 20% because it did not consider broader epidemiological factors. Moreover, the study found that while GPT-o1 excels in providing medically sound responses that adhere to clinical guidelines, it struggles in emergent, time-sensitive situations where rapid decision-making is critical.
The model's response time in medical emergencies, such as diabetic ketoacidosis (DKA) or hyperosmolar hyperglycemic state (HHS), lagged behind human judgment by 15-20%, raising concerns about its viability in high-pressure environments. Despite its high overall performance, with accuracy rates ranging between 85% and 95% across most specialties, GPT-o1 still requires substantial development to fully replicate the nuanced decision-making of human practitioners. This study concludes that while AI holds enormous potential to support healthcare providers, particularly in diagnostics and treatment planning, its current limitations emphasize the need for continued human oversight. To integrate AI models like GPT-o1 into healthcare systems effectively, ongoing collaboration between AI developers and medical professionals is crucial to ensure safety, reliability, and the continuous incorporation of the latest medical research.