Exploring the Impact of Large Language Models on Diagnosing and Managing Obstetric Patients: A Pilot Study Utilizing Simulated Cases - Report - DentalSpire

By Iason Psilopatis, Katharina Redling, Valeria Filippi, Sofia Kappos, Julius Emons, Beatrice Mosimann, and Tibor A. Zwimpfer

April 27, 2026


Clinical Report: Impact of Large Language Models on Obstetric Diagnosis and Management

Overview

This pilot study evaluated three freely available large language models (LLMs), Chat-GPT, Gemini, and DeepSeek, on the diagnosis and management of five simulated obstetric cases. Expert reviewers rated the responses using a modified Global Quality Score (GQS). Performance varied across models and cases but was generally promising, suggesting that LLMs could support clinical reasoning and obstetric decision-making.

Background

Artificial intelligence, particularly in the form of large language models, is increasingly applied in medicine to support clinical decision-making. In obstetrics, LLMs have shown promise in interpreting guidelines and generating management plans for complex scenarios. This study explored the feasibility of using freely available LLMs to assess and manage common obstetric conditions through simulated patient cases, with the aim of evaluating their accuracy, completeness, and adherence to clinical standards in time-sensitive situations.

Data Highlights

Modified GQS mean scores by case scenario and LLM

Case Scenario                  Chat-GPT  Gemini  DeepSeek
Preeclampsia                   4.0       3.8     3.5
Fetal Growth Restriction       3.7       3.6     3.2
PPROM                          3.9       3.7     3.4
Antepartum Vaginal Bleeding    3.8       3.5     3.3
Minor Abdominal Trauma         3.6       3.4     3.1

Key Findings

  • All three LLMs showed generally good diagnostic accuracy and appropriate management suggestions across five diverse obstetric scenarios, with mean modified GQS values ranging from 3.1 to 4.0.
  • Chat-GPT achieved the highest modified GQS in every scenario (unweighted mean across cases 3.80, versus 3.60 for Gemini and 3.30 for DeepSeek), suggesting more clinically appropriate responses than the other two models.
  • LLMs effectively recognized urgency and maternal-fetal safety considerations, aligning with established guidelines.
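  • Performance varied by case. Averaged across the three models, scores were highest for preeclampsia (3.77) and lowest for minor abdominal trauma (3.37), a scenario involving layered management decisions; PPROM (3.67) ranked second, while fetal growth restriction (3.50) and antepartum vaginal bleeding (3.53) fell in between.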
  • Interactive staged prompting improved LLM responses in complex cases, suggesting potential benefits of iterative dialogue in clinical use.
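
The comparisons in the data table and findings can be rechecked with simple arithmetic. The following Python sketch recomputes unweighted averages from the published per-case means. It assumes that a model's overall figure is the plain mean of its five case means and that a case's figure is the plain mean over the three models; the report does not describe how any overall comparison was aggregated, and the function names are illustrative only.

```python
from statistics import mean

# Modified GQS mean scores transcribed from the Data Highlights table.
SCORES = {
    "Preeclampsia": {"Chat-GPT": 4.0, "Gemini": 3.8, "DeepSeek": 3.5},
    "Fetal Growth Restriction": {"Chat-GPT": 3.7, "Gemini": 3.6, "DeepSeek": 3.2},
    "PPROM": {"Chat-GPT": 3.9, "Gemini": 3.7, "DeepSeek": 3.4},
    "Antepartum Vaginal Bleeding": {"Chat-GPT": 3.8, "Gemini": 3.5, "DeepSeek": 3.3},
    "Minor Abdominal Trauma": {"Chat-GPT": 3.6, "Gemini": 3.4, "DeepSeek": 3.1},
}

MODELS = ("Chat-GPT", "Gemini", "DeepSeek")


def model_means(scores):
    """Assumed aggregation: unweighted mean of each model's five case means."""
    return {m: round(mean(row[m] for row in scores.values()), 2) for m in MODELS}


def case_means(scores):
    """Assumed aggregation: unweighted mean over the three models per case."""
    return {case: round(mean(row.values()), 2) for case, row in scores.items()}


def top_model_per_case(scores):
    """Name of the model with the highest mean score in each case."""
    return {case: max(row, key=row.get) for case, row in scores.items()}


if __name__ == "__main__":
    print("Overall mean by model:", model_means(SCORES))
    by_case = case_means(SCORES)
    print("Mean by case:", by_case)
    print("Lowest-scoring case:", min(by_case, key=by_case.get))
    print("Top model per case:", top_model_per_case(SCORES))
```

Run as a script, the sketch gives unweighted overall means of 3.8 for Chat-GPT, 3.6 for Gemini, and 3.3 for DeepSeek, places Chat-GPT first in all five cases, and identifies minor abdominal trauma as the lowest-scoring scenario. Because the inputs are already rounded means, these figures only approximate what the raw expert ratings would yield.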

Clinical Implications

Freely available LLMs show promise as adjunct tools to support clinical reasoning in obstetrics, particularly in time-sensitive and complex cases. Their ability to generate guideline-concordant diagnostic and management recommendations may help reduce cognitive burden and standardize care. However, integration into clinical workflows requires further validation and safeguards to ensure patient safety.

Conclusion

This pilot study indicates that large language models can provide clinically relevant support in diagnosing and managing obstetric patients using simulated cases. Continued rigorous evaluation and development are necessary to optimize their role in enhancing obstetric care.
