This project focuses on building a Convolutional Neural Network (CNN) to classify Chest X-Ray images into two categories: Normal and Pneumonia.
Unlike a purely engineering-driven workflow, this project was directed and audited by a Medical Doctor (MD). The "black box" nature of typical AI models was critically examined through clinical experience, leading to the following discoveries:
A manual clinical audit of 100 random chest radiographs revealed:
- False Negative Labels: Approximately 15% of images labeled as "Normal" (e.g., Cases 4, 10, 27, 86) actually exhibited clear bilateral interstitial/peribronchial patterns.
- False Positive Labels: Approximately 20% of images labeled as "Pneumonia" (e.g., Cases 14, 15, 18, 19, 94, 100) showed no positive radiological markers for infiltration.
Conclusion: The audit suggests that the accuracy ceiling of existing models is driven less by architectural failures than by the "Noisy Labels" inherited from clinical-only diagnoses that lack radiological confirmation.
The expert audit established a new gold standard for this project: "To build a reliable medical AI, training data must be derived from images that are both supported by visual positive findings AND confirmed by clinical diagnosis."
Following the MD's audit, the model was redesigned with the following improvements:
- Preserving Tissue Detail (256x256): Increased the input resolution to 256x256 to prevent pixelation, allowing the AI to "see" the subtle interstitial markings identified during the audit.
- Bias Mitigation (Class Weights): Implemented a class-weighting strategy so that errors on the under-represented "Normal" class cost more during training, reducing false-positive Pneumonia predictions.
- Clinical Performance: Despite the ~17% label noise, Model v2.0 achieved 91% Accuracy and a high-fidelity 95% Recall (Sensitivity).
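The class-weighting idea above can be illustrated with a minimal sketch. It assumes the common "balanced" heuristic (`n_samples / (n_classes * class_count)`), which may differ from the exact scheme used in the notebook; the function name and the label counts below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count).

    Rarer classes receive larger weights, so their errors cost more
    in a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label list: 0 = Normal, 1 = Pneumonia.
# These counts are illustrative, not the project's real class distribution.
labels = [0] * 25 + [1] * 75
weights = balanced_class_weights(labels)
print(weights)
```

If the model is trained with Keras, a dictionary of this shape can be passed as the `class_weight` argument of `Model.fit`; whether the notebook does exactly this is an assumption.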
| Metric | v1.0 (Standard) | v2.0 (Expert-Audited) |
|---|---|---|
| Pneumonia Recall (Sensitivity) | ~90% | 95% |
| Pneumonia Precision | ~80% | 90% |
| Overall Accuracy | 85% | 91% |
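For readers less familiar with these metrics, the sketch below shows how recall, precision, and accuracy follow from a binary confusion matrix with Pneumonia as the positive class. The function name and the counts are hypothetical; they are not the project's actual confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute recall, precision, and accuracy from confusion-matrix counts.

    tp: Pneumonia correctly predicted as Pneumonia
    fp: Normal incorrectly predicted as Pneumonia
    fn: Pneumonia incorrectly predicted as Normal
    tn: Normal correctly predicted as Normal
    """
    total = tp + fp + fn + tn
    return {
        # Share of true Pneumonia cases that the model catches (sensitivity).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of Pneumonia predictions that are actually Pneumonia.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that when the labels themselves are noisy, these figures measure agreement with the labels rather than with the radiological ground truth.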
As with any robust scientific study, this project acknowledges the following limits:
- Sample Size: The clinical audit was limited to 100 cases; it may not represent the full noise distribution of all 5,800 images.
- Geographical Bias: Data originates from a single pediatric center in Guangzhou, which may limit generalizability to adult or international cohorts.
- Binary Simplification: The model currently focuses on Normal vs. Pneumonia; multi-pathology detection (effusion, atelectasis) remains a future milestone.
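The sample-size limitation can be made concrete with a confidence interval on the audited noise rate. The sketch below uses the Wilson score interval, a standard method for proportions; it is not something the audit itself reports. The input of 17 mislabeled cases out of 100 is an assumption that combines the ~17% noise figure with the 100-case audit size.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Assumed input: 17 mislabeled images among 100 audited (illustrative only).
low, high = wilson_interval(17, 100)
print(f"Estimated label-noise rate: {low:.1%} to {high:.1%}")
```

Under that assumption the interval spans roughly 11% to 26%, which shows how much uncertainty a 100-case audit leaves about the noise rate in the full dataset.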
This study demonstrates that when AI development is led by Domain Experts with clinical vision, the resulting models are significantly more productive, reliable, and scientifically honest than standard "engineering-only" approaches. This project serves as a concrete example of the Expert-in-the-Loop paradigm in modern radiology.
The entire workflow is contained within a Jupyter Notebook.
- Open the `.ipynb` notebook in Google Colab or your local Jupyter environment.
- Upload the respective dataset.
- Run all cells sequentially to train and evaluate the model.
For a deep dive into the clinical audit methodology, label noise discovery, and the transition to v2.0, please refer to our Detailed Clinical Audit Report.
This project was developed and audited by [Emrah Seker], a Medical Doctor specializing in Clinical AI Governance.
- LinkedIn: [Connect on LinkedIn]
- Email: [exhume19@gmail.com]
- Project Goal: Bridge the gap between engineering and clinical reality.
