Predicting pulmonary embolism among hospitalized patients with machine learning algorithms

ABSTRACT

Background

Pulmonary embolisms (PE) are life-threatening medical events, and early identification of patients experiencing a PE is essential to optimizing patient outcomes. Current tools for risk stratification of PE patients are limited and unable to predict PE events before their occurrence.

Objective

We developed a machine learning algorithm (MLA) to identify inpatients at risk of PE before clinical detection of PE onset.

Materials and Methods

Three machine learning (ML) models were developed using electronic health record data from 63,798 medical and surgical inpatients at a large US medical center: a logistic regression model, a neural network, and a gradient boosted tree (XGBoost) model. All models used only routinely collected demographic, clinical, and laboratory information as inputs. Each model was evaluated on its ability to predict PE at the first time point at which the vital signs and laboratory measurements required by the MLA were available. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.
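The three evaluation metrics can be made concrete with a short sketch. It assumes each model outputs a risk score per patient and that a patient is flagged as at risk when the score meets a decision threshold; the function names and the toy data below are hypothetical illustrations, not the study's code or data. AUROC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen PE case scores higher than a randomly chosen non-PE case.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive (PE) case outscores a
    random negative case; ties count as one half. O(P*N), written for clarity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when patients with score >= threshold are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy example: 2 PE cases (label 1) and 3 non-PE cases (label 0).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(round(auroc(labels, scores), 3))        # 0.833
sens, spec = sens_spec(labels, scores, 0.45)
print(sens, round(spec, 3))                   # 0.5 0.667
```

Unlike sensitivity and specificity, AUROC does not depend on any single threshold, which is why it is a natural headline metric for comparing the three models.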

Results

The XGBoost model demonstrated the strongest performance for predicting PE, with an AUROC of 0.85, a sensitivity of 81%, and a specificity of 70%. The neural network and logistic regression models obtained AUROCs of 0.74 and 0.67, respectively; both had a sensitivity of 81%, with specificities of 44% and 35%, respectively.
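All three models report the same sensitivity (81%) but different specificities. The abstract does not state how decision thresholds were set; one common procedure consistent with these results (an assumption here) is to choose, for each model, the threshold that reaches a target sensitivity and then report specificity at that threshold. A minimal sketch of that procedure follows; the function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def threshold_for_sensitivity(labels: Sequence[int], scores: Sequence[float],
                              target: float) -> float:
    """Highest threshold t whose sensitivity (share of positive cases with
    score >= t) is at least `target`."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive case")
    # Number of positives that must be flagged; the small epsilon guards against
    # floating-point error pushing target * n just above an integer.
    k = min(len(pos), max(1, math.ceil(target * len(pos) - 1e-9)))
    # Any higher threshold would flag fewer than k positives.
    return pos[k - 1]


def sensitivity_at(labels: Sequence[int], scores: Sequence[float], t: float) -> float:
    pos = [s for y, s in zip(labels, scores) if y == 1]
    return sum(1 for s in pos if s >= t) / len(pos)


def specificity_at(labels: Sequence[int], scores: Sequence[float], t: float) -> float:
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(1 for s in neg if s < t) / len(neg)


# Hypothetical toy scores for one model: 4 PE cases, 5 non-PE cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1]
t = threshold_for_sensitivity(labels, scores, 0.75)
print(t)                                   # 0.6
print(sensitivity_at(labels, scores, t))   # 0.75
print(specificity_at(labels, scores, t))   # 0.8
```

Fixing sensitivity this way makes specificity directly comparable across models: at the same rate of detected PE cases, a higher specificity means fewer false alarms.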

Conclusions

By identifying patients at risk of PE before clinical detection, this algorithm may enable earlier diagnosis and treatment of PE and thereby improve patient outcomes.