The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to spread rapidly, with new variants emerging month by month that resist neutralization by antibodies to earlier strains. Following infection, the high numbers of patients who become seriously and critically sick greatly strain healthcare services.
Study: An ensemble prediction model for COVID-19 mortality risk.
Predicting which of the flood of people who test positive for the virus will become seriously ill would be invaluable for directing necessary medical and supportive services where they are needed. Many models have been explored to provide this ability, but with limited success.
A new preprint reports a novel machine learning model that purportedly predicts the mortality risk of coronavirus disease 2019 (COVID-19) in patients accurately and at an early stage of infection.
A preprint version of the study is available on the medRxiv* server while the article undergoes peer review.
How did the model work?
In this study, the scientists set up a novel system in which data were preprocessed to deal with complicated clinical details before being fed into an ensemble model (EM) designed to yield a risk prediction for COVID-19 patients. Such a model exploits the strengths of several base models: Gradient Boosted Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM).
Subsequently, these models were trained on a large cohort. Testing them in another large cohort successfully validated the predictive models. This is the first time that a model predicting high-risk outcomes for COVID-19 has been shown to be reliable in a large independent group, which encourages hope that such models could be of use in clinical situations.
The researchers used 14 of 20 clinical features, along with preprocessing steps such as imputation of missing clinical features and the EM, to increase the accuracy of mortality prediction. This was shown by comparing the results from this model with those of the conventional scoring systems used to assess COVID-19 severity.
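To make the idea of imputing missing clinical features concrete, here is a minimal sketch. The article does not say which imputation method the study used, so column-wise median imputation is an assumption chosen for simplicity; the function name `impute_median` and the patient values are made up for illustration.

```python
from statistics import median

def impute_median(rows):
    """Replace None entries in each column with the median of that column's observed values.

    Assumes every column has at least one observed value; otherwise
    statistics.median raises StatisticsError.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    # Build new rows so the input is left unmodified.
    return [[medians[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]

# Toy records; columns are (age, MAP, glucose). Values are illustrative, not study data.
patients = [
    [64, 92.0, None],
    [71, None, 7.1],
    [58, 88.0, 5.4],
    [80, 70.0, 9.8],
]
filled = impute_median(patients)
```

Rows without gaps pass through unchanged, so the step can be applied to the whole table before any model sees it.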
In addition, they used a genetic algorithm (GA) to select the most appropriate subset of the clinical features, since including redundant features reduces the accuracy of prediction. They were able to demonstrate a significant enhancement of predictive value after removing such features.
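The following toy sketch shows how a genetic algorithm can search over feature subsets. It is not the study's implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the function names are all assumptions. The fitness function here is synthetic; in practice it would more plausibly be a validated measure of model performance on the selected features.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=30, generations=40,
                              mutation_rate=0.05, seed=0):
    """Search for a 0/1 mask over features that maximizes `fitness` (needs n_features >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of 3 random candidates becomes a parent.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return best

# Synthetic fitness (hypothetical): reward informative features, penalize redundant ones.
INFORMATIVE = {0, 2, 5, 7}

def toy_fitness(mask):
    useful = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    redundant = sum(1 for i, g in enumerate(mask) if g and i not in INFORMATIVE)
    return useful - 0.5 * redundant

best_mask = genetic_feature_selection(10, toy_fitness)
selected = [i for i, g in enumerate(best_mask) if g]
```

The penalty term in `toy_fitness` mirrors the point made above: a mask that keeps redundant features scores lower than one that drops them.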
When the features were analyzed for importance in predicting the risk of death, the most important were determined to be mean arterial pressure (MAP), interleukin-6 (IL-6), procalcitonin, D-dimer, age, and glucose levels. The GA also yielded the optimal combination coefficients for weighting the five base models within the EM.
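One common way to combine base models with a set of coefficients is a weighted average of their predicted probabilities. The sketch below assumes that form, which the article does not confirm; the probabilities and coefficients are invented for illustration, and `ensemble_probability` is a hypothetical helper name.

```python
def ensemble_probability(base_probs, weights):
    """Weighted average of base-model probabilities; weights are normalized to sum to 1."""
    if base_probs.keys() != weights.keys():
        raise ValueError("base_probs and weights must cover the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * base_probs[name] for name in base_probs) / total

# Made-up mortality probabilities from the five base models for one patient.
probs = {"GBDT": 0.80, "XGBoost": 0.70, "RF": 0.60, "LR": 0.50, "SVM": 0.40}
# Made-up combination coefficients; the study reports obtaining its own via the GA.
coeffs = {"GBDT": 2.0, "XGBoost": 2.0, "RF": 1.0, "LR": 1.0, "SVM": 1.0}

risk = ensemble_probability(probs, coeffs)
equal_weight_risk = ensemble_probability(probs, {name: 1.0 for name in probs})
```

With equal coefficients the result reduces to a plain mean of the five probabilities, so the coefficients only matter when some base models deserve more trust than others.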
Following this, there were a hundred rounds of validation using a half-half cross-validation technique, in all of which the EM proved best at predicting the outcome across a range of indicators. Thus, provided that certain key physiological parameters, such as inflammatory markers, hepatic and renal function tests, and cardiovascular function indicators, are available, an early prediction of high mortality risk can be made when a patient presents with COVID-19.
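Repeated half-half validation can be sketched as follows: in each round the data are randomly split 50/50, a model is fitted on one half, and it is scored on the other. The data, the nearest-centroid stand-in for a trained model, and the use of the area under the ROC curve (AUC) as the indicator are all assumptions made for illustration.

```python
import random
from statistics import mean

def half_half_splits(n_samples, rounds=100, seed=0):
    """Yield (train, test) index lists from repeated random 50/50 splits."""
    rng = random.Random(seed)
    mid = n_samples // 2
    for _ in range(rounds):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield shuffled[:mid], shuffled[mid:]

def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scores(x_train, y_train, x_test):
    """Toy stand-in for a trained model: higher score = closer to the positive-class mean."""
    m0 = mean([x for x, y in zip(x_train, y_train) if y == 0])
    m1 = mean([x for x, y in zip(x_train, y_train) if y == 1])
    return [abs(x - m0) - abs(x - m1) for x in x_test]

# Synthetic data: one feature that tends to be higher for label 1.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
feature = [2.0 * y + data_rng.gauss(0, 1) for y in labels]

aucs = []
for train, test in half_half_splits(len(labels), rounds=100):
    scores = centroid_scores([feature[i] for i in train],
                             [labels[i] for i in train],
                             [feature[i] for i in test])
    aucs.append(auc([labels[i] for i in test], scores))
mean_auc = sum(aucs) / len(aucs)
```

Collecting one score per round, rather than a single score, shows how stable a model's performance is across different random splits.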
This would be a simple way to redirect resources where they are most needed.
In addition to the most valuable features, such as age, MAP, and physiological markers, others are also closely related to mortality. These include markers of coagulation such as D-dimer, glucose levels (which are related to hepatic and renal function), and cardiac function as indicated by troponin levels.
To make the model more useful, the researchers defined the reference ranges for each of the clinical features used here to enable rapid risk stratification for patients.
The testing results show the appropriateness of the feature selection and the value of this model, which provided robust results in different populations, with a range of ages and ethnicities, and with differences in the type of features included. This indicates that, despite such variations, age, MAP, and markers of inflammation, clotting, impaired liver and renal function, and poor cardiovascular function are useful predictive features for mortality risk stratification in COVID-19 patients.
The results corroborate earlier studies that contributed to an awareness of the predictive importance of these markers while emphasizing the efficiency of feature selection as used in this model. The EM approach combined multiple models that have been shown to have good predictive performance with other models, such as logistic regression and the support vector machine.
When the performance of different models was compared, excellent discrimination was shown by this model.
“In general, our predictive model (EM) is effective in predicting COVID-19 mortality risk.”
*Important notice
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.