Exploring the Impact of Large Language Models on Diagnosing and Managing Obstetric Patients: A Pilot Study Utilizing Simulated Cases - Report - DentalSpire
Advertisement
Exploring the Impact of Large Language Models on Diagnosing and Managing Obstetric Patients: A Pilot Study Utilizing Simulated Cases
Clinical Report: Impact of Large Language Models on Obstetric Diagnosis and Management
Overview
This pilot study evaluated three large language models (LLMs)—Chat-GPT, Gemini, and DeepSeek—in diagnosing and managing five simulated obstetric cases. Expert review using a modified Global Quality Score demonstrated variable but promising performance of LLMs in clinical reasoning, with potential to support obstetric decision-making.
Background
Artificial intelligence, particularly large language models, is increasingly applied in medicine to enhance clinical decision support. In obstetrics, LLMs have shown promise in interpreting guidelines and generating management plans for complex scenarios. This study explored the feasibility of using freely available LLMs to assess and manage common obstetric conditions through simulated patient cases. The goal was to understand their accuracy, completeness, and adherence to clinical standards in time-sensitive situations.
Data Highlights
Case Scenario
LLM
Modified GQS Mean Score
Preeclampsia
Chat-GPT
4.0
Preeclampsia
Gemini
3.8
Preeclampsia
DeepSeek
3.5
Fetal Growth Restriction
Chat-GPT
3.7
Fetal Growth Restriction
Gemini
3.6
Fetal Growth Restriction
DeepSeek
3.2
PPROM
Chat-GPT
3.9
PPROM
Gemini
3.7
PPROM
DeepSeek
3.4
Antepartum Vaginal Bleeding
Chat-GPT
3.8
Antepartum Vaginal Bleeding
Gemini
3.5
Antepartum Vaginal Bleeding
DeepSeek
3.3
Minor Abdominal Trauma
Chat-GPT
3.6
Minor Abdominal Trauma
Gemini
3.4
Minor Abdominal Trauma
DeepSeek
3.1
Key Findings
All three LLMs demonstrated generally good diagnostic accuracy and management suggestions across five diverse obstetric scenarios.
Chat-GPT consistently achieved the highest modified Global Quality Scores, indicating superior clinical appropriateness compared to Gemini and DeepSeek.
LLMs effectively recognized urgency and maternal-fetal safety considerations, aligning with established guidelines.
Performance varied by case complexity, with slightly lower scores in scenarios involving layered management decisions such as PPROM and minor abdominal trauma.
Interactive staged prompting improved LLM responses in complex cases, suggesting potential benefits of iterative dialogue in clinical use.
Clinical Implications
Freely available LLMs show promise as adjunct tools to support clinical reasoning in obstetrics, particularly in time-sensitive and complex cases. Their ability to generate guideline-concordant diagnostic and management recommendations may help reduce cognitive burden and standardize care. However, integration into clinical workflows requires further validation and safeguards to ensure patient safety.
Conclusion
This pilot study indicates that large language models can provide clinically relevant support in diagnosing and managing obstetric patients using simulated cases. Continued rigorous evaluation and development are necessary to optimize their role in enhancing obstetric care.
References
OpenAI/Chat-GPT/2025 -- Large Language Models in Clinical Decision Support
German Society of Obstetrics and Gynecology/2024 -- Clinical Guidelines in Obstetrics
American College of Obstetricians and Gynecologists/2024 -- Practice Bulletins
Royal College of Obstetricians and Gynecologists/2024 -- Clinical Standards
Expert Panel/2025 -- Modified Global Quality Score for Medical AI Evaluation