[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated list of practical guide resources for medical LLMs (Medical LLMs Tree, Tables, and Papers)
Updated Sep 27, 2025
ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine
[ NeurIPS 2022 ] Official Codebase for "ETAB: A Benchmark Suite for Visual Representation Learning in Echocardiography"
OpenEvidence MCP: open-source browser-session MCP server for human and AI-agent medical workflows
AI-powered chest X-ray pneumonia detection with 86% accuracy and 96.4% sensitivity, validated on an independent (cross-operator) cohort of 485 pediatric samples. Built with TensorFlow & FastAPI.
Systematic evaluation of hallucination risks in Large Language Models (GPT-4, Claude 3, Gemini Pro) for clinical proteomics and mass spectrometry interpretation. Production-ready detection framework with comprehensive benchmarks.
Which LLMs can a dentist trust? A reproducible benchmark of language models on clinical dental knowledge, with clinician-verified rubrics across 6 domains. Part of Periospot.
Ethnic bias analysis in medical imaging AI: Demonstrating that explainable-by-design models achieve 80% bias reduction across 5 ethnic groups (50k images)
🩺 AstraMed: A Clinical Risk Intelligence Platform. Powered by SOTA Ensemble ML & BioMistral-7B for predictive medical analytics and explainable risk scoring.
Rare AI Archive: open-source agentic diagnostic AI for rare genetic diseases — decentralized post-training, clinician validation, federated deployment
Cross-modal AI framework for dermatological disease analysis
An application to monitor clinical AI models
AI Skills for Dentists - Research critique, clinical evidence review, and content creation tools for Claude Desktop and Codex
Offline reinforcement learning for sepsis treatment policy evaluation using Conservative Q-Learning on a MIMIC-IV v3.1 Sepsis-3 ICU cohort.
A clinical evaluation framework for LLMs that measures accuracy, abstention, calibration, deferral behavior, runtime, token use, and structured-output reliability across high-stakes medical reasoning tasks.
Turn clinical guideline PDFs into deterministic, auditable decision engines with source citations.
MobileNetV2 pneumonia classifier validated on an independent 485-sample cross-operator cohort. 96.4% sensitivity, 96.4% ROC-AUC, bootstrap p=0.978. FastAPI inference API, Streamlit dashboard, DICOM support, Docker-ready
Open-source evaluation framework for healthcare AI applications.
Full-stack clinical AI platform — multi-agent RAG over FHIR patient data with 20+ medical tools, hybrid vector search, and real-time streaming