Publication:
Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management

dc.contributor.author: Kaklamanos, Eleftherios G
dc.date.accessioned: 2025-10-01T09:16:04Z
dc.date.available: 2025-10-01T09:16:04Z
dc.date.issued: 2025-06-18
dc.description.abstract: Background/Objectives: Large Language Models (LLMs) are artificial intelligence (AI) systems with the capacity to process vast amounts of text and generate human-like language, offering the potential for improved information retrieval in healthcare. This study aimed to assess and compare the evidence-based potential of answers provided by four LLMs to common clinical questions concerning the management and treatment of periodontal furcation defects. Methods: Four LLMs (ChatGPT 4.0, Google Gemini, Google Gemini Advanced, and Microsoft Copilot) were used to answer ten clinical questions related to periodontal furcation defects. The LLM-generated responses were compared against a “gold standard” derived from the European Federation of Periodontology (EFP) S3 guidelines and recent systematic reviews. Two board-certified periodontists independently evaluated the answers for comprehensiveness, scientific accuracy, clarity, and relevance using a predefined rubric and a scoring system of 0–10. Results: LLM performance varied across the evaluation criteria. Google Gemini Advanced generally achieved the highest average scores, particularly in comprehensiveness and clarity, while Google Gemini and Microsoft Copilot tended to score lower, especially in relevance. However, the Kruskal–Wallis test revealed no statistically significant differences in the overall average scores among the LLMs. Evaluator agreement and intra-evaluator reliability were high. Conclusions: While LLMs demonstrate the potential to answer clinical questions related to furcation defect management, their performance varies. The LLMs differed in their degrees of comprehensiveness, scientific accuracy, clarity, and relevance. Dental professionals should be aware of LLMs’ capabilities and limitations when seeking clinical information.
dc.identifier.other: 40559174
dc.identifier.uri: https://repository.mbru.ac.ae/handle/1/1805
dc.language.iso: en
dc.subject: ChatGPT
dc.subject: Google Gemini
dc.subject: Microsoft Copilot
dc.subject: artificial intelligence
dc.subject: furcation
dc.subject: periodontics
dc.title: Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management
dspace.entity.type: Publication

Files

Original bundle

Name: Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management.pdf
Size: 912.3 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission