
BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining [article]

Zachariah Zhang, Jingshu Liu, Narges Razavian
2020 arXiv   pre-print
Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, and it is extremely time-consuming and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining techniques that have achieved state of the art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning with a vocabulary better suited to EHR tasks and thus outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large scale ICD-10 classification model using millions of EHR notes to predict thousands of unique ICD codes.
arXiv:2006.03685v1 fatcat:c3plcv7lvzcrbdv2awfry7isei
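
The abstract describes adapting BERT for ICD coding with multi-label attention, i.e. a label-wise attention head placed on top of the encoder's token representations so that each ICD code attends to different parts of the note. The sketch below is a minimal, generic illustration of that idea in PyTorch; the class name `LabelWiseAttention` and all parameter shapes are assumptions for illustration, not the authors' released code or exact architecture.

```python
import torch
import torch.nn as nn


class LabelWiseAttention(nn.Module):
    """Per-label attention over token hidden states, one logit per ICD code.

    Illustrative sketch of multi-label (label-wise) attention; not the
    authors' implementation.
    """

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One attention query vector per ICD code.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size) * 0.02)
        # One output weight vector per ICD code for the final logit.
        self.label_outputs = nn.Parameter(torch.randn(num_labels, hidden_size) * 0.02)
        self.label_bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) from a BERT-style encoder
        # attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
        scores = torch.einsum("bsh,lh->bls", hidden_states, self.label_queries)
        scores = scores.masked_fill(attention_mask[:, None, :] == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)                       # (batch, labels, seq)
        # Label-specific document representations.
        label_repr = torch.einsum("bls,bsh->blh", attn, hidden_states)
        logits = (label_repr * self.label_outputs).sum(-1) + self.label_bias
        return logits  # use sigmoid + BCE loss for multi-label ICD prediction
```

In practice this head would sit on top of the EHR-pretrained BERT encoder's final hidden states, with a binary cross-entropy loss over the thousands of ICD-10 codes treated as independent labels.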