ChatGPT Generated Literature Review: Quod Erat Demonstrandum or Ends Justifying the Means?

Dear Editor,

We would like to draw your attention to the increasing popularity of the generative artificial intelligence (AI) chatbot ChatGPT (OpenAI, 2023) and its relationship with the scientific literature. We have attempted to replicate two literature reviews recently published in Clinical Otolaryngology using ChatGPT, comparing results, conclusions, and references.

We first assessed Lee et al. (2022), "Posterior nasal neurectomy for intractable rhinitis: A systematic review of the literature". The conclusions ChatGPT generated from the same research questions were comparable. However, ChatGPT's references were confabulated, raising questions of provenance and quality.

Cereceda-Monteoliva et al. (2021) reviewed sarcoidosis of the ear, nose, and throat. Again, identical research questions generated near-identical results, including numerical values for incidence, features, and management. One generated reference appeared 'similar' in terms of the author's name, but the title and journal were entirely incorrect. Of the remaining four references provided by ChatGPT, only one was a recognisable article. Further investigation shows that ChatGPT lacks access to research databases, raising doubts about the reliability of the conclusions it presents.

It is interesting that ChatGPT should generate correct conclusions with incorrect working. We are reminded of school mathematics and quod erat demonstrandum (Q.E.D.), where incorrect working earns no marks regardless of a correct answer.

ChatGPT is a Large Language Model (LLM) AI. Fundamentally, it mimics human intelligence but does not replicate it. ChatGPT does this by analysing vast quantities of data to predict the next most likely word in an answer, a process laid bare by the erroneously generated references. A scientific literature review follows a superficially similar process, analysing data and outputting a most likely conclusion. Crucially, the latter involves higher-order evaluation and critical thought based on myriad factors that currently seem out of reach for ChatGPT in this specific use case. Readers familiar with Bloom's Taxonomy of Cognition will recognise its relevance here (Bloom et al., 1956).
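To illustrate the mechanism, consider a minimal Python sketch of greedy next-word prediction. The probability table below is invented purely for illustration and stands in for the billions of learned parameters in a real model such as ChatGPT; the selection principle, always emit the most probable continuation, is the same in spirit.

# Toy illustration of greedy next-word prediction, the core loop of LLM text
# generation. The conditional probabilities P(next word | current word) are
# invented for this example, not taken from any real model.
TOY_MODEL = {
    "the": {"review": 0.40, "reference": 0.35, "conclusion": 0.25},
    "review": {"was": 0.60, "found": 0.40},
    "was": {"comparable": 0.70, "confabulated": 0.30},
}

def generate(start: str, max_words: int = 4) -> str:
    """Repeatedly emit the single most probable next word (greedy decoding)."""
    words = [start]
    for _ in range(max_words):
        candidates = TOY_MODEL.get(words[-1])
        if not candidates:
            break  # no learned continuation for this word
        words.append(max(candidates, key=candidates.get))  # argmax choice
    return " ".join(words)

print(generate("the"))  # prints: the review was comparable

Nothing in this loop consults a source or verifies a fact; the output is merely the statistically smoothest continuation, which is precisely why fluent yet confabulated references can emerge.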
Often a literature review produces an already anticipated conclusion, yet it provides some of the highest-quality evidence on which to base medical practice. Therefore, with ChatGPT, the ends do not justify the means for the practice of medicine, even if the most likely worded conclusion is accurate.

However, the exponential growth of LLM AIs is extraordinary. Near-future iterations of ChatGPT climbing to the top of Bloom's Taxonomy are easily imagined. Improved critical reasoning, with access to accurate databases of peer-reviewed material, would substantiate an output even if the conclusions were unchanged. An accurate 'show of working' could provide a meaningful AI-generated literature review to responsibly guide medical practice.

Q.E.D. - Quod Erat Demonstrandum

References

Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: David McKay Company.

Cereceda-Monteoliva, N., Rouhani, M. J., Maughan, E. F., Rotman, A., Orban, N. T., Yaghchi, C. A., & Sandhu, G. S. (2021). Sarcoidosis of the ear, nose and throat: A review of the literature. Clinical Otolaryngology, 46(5), 935–940. https://doi.org/10.1111/coa.13814

Lee, M. L., Chakravarty, P., & Ellul, D. (2022). Posterior nasal neurectomy for intractable rhinitis: A systematic review of the literature. Clinical Otolaryngology, 48(2), 95–107. https://doi.org/10.1111/coa.13991

OpenAI. (2023). OpenAI. Retrieved from https://openai.com/