From: Effectiveness of various general large language models in clinical consensus and case analysis in dental implantology: a comparative study
| Model | Accuracy | 95% CL (Wilson score interval) |
| --- | --- | --- |
| ChatGPT-4 | 0.74 | 0.604 to 0.841 |
| Qwen 2.0 72B | 0.60 | 0.462 to 0.724 |
| Claude 3 Opus | 0.72 | 0.583 to 0.825 |
| Gemini Pro 1.5 (0801) | 0.80 | 0.670 to 0.888 |
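
The Wilson score limits listed above can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function name `wilson_interval`, the constant `ASSUMED_N`, and the correct-answer counts are hypothetical. The table does not state the number of questions; a value of 50 is assumed here only because it appears to reproduce the tabulated limits with z = 1.96.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score limits for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre of the interval is pulled from p_hat toward 0.5.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# ASSUMPTION: 50 questions per model. This is inferred, not stated in the table.
ASSUMED_N = 50

# Correct-answer counts implied by the reported accuracies under ASSUMED_N.
assumed_correct = {
    "ChatGPT-4": 37,
    "Qwen 2.0 72B": 30,
    "Claude 3 Opus": 36,
    "Gemini Pro 1.5 (0801)": 40,
}

for model, correct in assumed_correct.items():
    low, high = wilson_interval(correct, ASSUMED_N)
    print(f"{model}: {correct / ASSUMED_N:.2f} ({low:.3f} to {high:.3f})")
```

The Wilson interval is not symmetric around the observed accuracy: its centre is shifted toward 0.5, which is why the limits for an accuracy of 0.80 are not equidistant from 0.80.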