Background: Hospital-acquired pressure injuries (HAPIs) have a major impact on patient outcomes in intensive care units (ICUs). Effective prevention relies on early and accurate risk assessment. Traditional risk-assessment tools, such as the Braden Scale, often fail to capture ICU-specific factors, limiting their predictive accuracy. Although artificial intelligence models offer improved accuracy, their “black box” nature poses a barrier to clinical adoption.
Objective: To develop an artificial intelligence–based HAPI risk-assessment model enhanced with an explainable artificial intelligence dashboard to improve interpretability at both the global and individual patient levels.
Methods: An explainable artificial intelligence approach was used to analyze ICU patient data from the Medical Information Mart for Intensive Care. Predictor variables were restricted to the first 48 hours after ICU admission. Various machine-learning algorithms were evaluated, culminating in an ensemble “super learner” model. The model’s performance was quantified using the area under the receiver operating characteristic curve through 5-fold cross-validation. An explainer dashboard was developed (using synthetic data for patient privacy), featuring interactive visualizations for in-depth model interpretation at the global and local levels.
Results: The final sample comprised 28 395 patients with a 4.9% incidence of HAPIs. The ensemble super learner model performed well (area under curve = 0.80). The explainer dashboard provided global and patient-level interactive visualizations of model predictions, showing each variable’s influence on the risk-assessment outcome.
Conclusions: The model and its dashboard provide clinicians with a transparent, interpretable artificial intelligence–based risk-assessment system for HAPIs that may enable more effective and timely preventive interventions.
Notice to CE enrollees
This article has been designated for CE contact hour(s). The evaluation demonstrates your knowledge of the following objectives:
Describe how the explainable artificial intelligence (AI) model for assessing hospital-acquired pressure injury (HAPI) risk was developed.
Describe the explainable AI dashboard’s role in improving HAPI risk assessment.
Analyze the benefits of using explainable AI for transparency in clinical decision-making.
To complete the evaluation for CE contact hour(s) for activity A2493, visit https://aacnjournals.org/ajcconline/ce-articles. No CE fee for AACN members. See CE activity page for expiration date.
The American Association of Critical-Care Nurses is accredited as a provider of nursing continuing professional development by the American Nurses Credentialing Center’s Commission on Accreditation, ANCC Provider Number 0012. AACN has been approved as a provider of continuing education in nursing by the California Board of Registered Nursing (CA BRN), CA Provider Number CEP1036, for 1.0 contact hour.
Pressure injury, an area of damage to the skin, underlying tissue, or both that is caused by pressure or pressure combined with shear,1 occurs during hospitalization in 4% to 6% of critical care patients.2 It leads to considerable human suffering, extended hospital stays, and increased overall costs.3 Hospital-acquired pressure injuries (HAPIs) are thought to be mostly preventable if appropriate interventions are performed in a timely manner.
Crucial to the prevention of HAPIs is the exercise of clinical judgment by nurses, who must decide which interventions are most appropriate for each individual patient and when interventions should be implemented. Thus, early and accurate HAPI risk assessment is important for guiding clinical choices and enabling the initiation of prompt preventive measures. However, traditional HAPI risk–assessment methods such as the Braden Scale4 do not incorporate key risk factors unique to critical care patients, such as oxygenation and perfusion.2
Artificial intelligence models can predict pressure injury risk using data from the electronic health record, eliminating the need for manual documentation by nurses.
Artificial intelligence (AI) models can process and analyze extensive amounts of data to identify complex relationships among variables, including critical care–specific factors such as oxygenation and perfusion. Previous studies have shown that AI approaches outperform traditional methods, such as use of the Braden Scale, in accurate HAPI risk prediction.5–7 However, AI models’ “black box” nature poses a clinical challenge. Health care professionals are justifiably reluctant to trust a system whose reasoning is not transparent. Furthermore, “black box” algorithms offer no guidance in human decision-making or clinical judgment.8,9
A notable advance that makes decision-making more transparent has been the development of explainable AI (XAI)10,11 and its integration into interactive dashboards. Interactive dashboards may empower bedside nurses by facilitating their understanding of an AI model’s functionality, enabling insights into both its overall performance and the significance of individual variables, as well as detailed assessments at the patient-specific level. This ability to scrutinize individual patient-level decisions is essential, as models lack the contextual understanding that bedside clinicians possess.12,13 By offering this level of transparency, interactive dashboards allow clinicians to augment their clinical judgment with the model’s risk assessment—enhancing, rather than supplanting, their expertise before they take action.14 Thus, XAI-powered dashboards offer a promising avenue for elevating the precision and timeliness of HAPI risk assessments by augmenting nurses’ clinical expertise with AI.
Objective
The purpose of this study was to develop an AI-based approach for assessing the risk of HAPIs and to implement an explainable dashboard to enhance the transparency of the AI-generated risk assessment.
Methods
Study Design and Sample Selection
We used an explainable AI approach to examine data obtained from the Medical Information Mart for Intensive Care (MIMIC)–IV (version 2.2) dataset, a retrospective database consisting of data from patients admitted to intensive care units (ICUs) at Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2008 and 2019.15,16 The study was approved by the institutional review board at Boise State University. Eligible patients were aged 18 years or older, had a length of stay in the ICU of 48 hours or more, and had at least 1 complete set of vital signs recorded within 48 hours of admission. For patients with multiple ICU visits, data were obtained from the most proximal (last) stay.
The pressure injury outcome variable was defined as a HAPI of any stage (1–4, deep tissue injury, unstageable).1 Stage 1 HAPIs were included because studies show that these injuries frequently progress to more severe stages and are therefore considered clinically relevant.17,18
Potential predictor variables, outlined in Table 1, were identified using appropriate conceptual frameworks,19,20 other relevant literature,2,21 and their extractability (with varying levels of difficulty) from the electronic health record (EHR).22 Data were limited to structured fields likely to be present in all EHRs, regardless of vendor, to ease future implementation. Also, data for potential predictor variables were limited to values obtained within the first 48 hours after ICU admission to enhance the model’s utility for early identification of HAPI risk and, eventually, earlier preventive intervention.
Data Formatting and Feature Creation
Feature (variable) selection is essential in machine-learning methodology to avoid common pitfalls such as overfitting, contamination, and target leakage.23 We prioritized data quality to ensure accurate features. The MIMIC-IV database’s characteristics already ensure some aspects of data quality, such as conformance.24 As a relational database, MIMIC-IV has relational conformance (a primary key ties the data together); it also has value conformance, evidenced by the syntactic and structural constraints in its data dictionary.24 When data quality was not explicitly ensured, we investigated it directly. When we encountered missing data, such as arterial blood gas (ABG) values, domain experts determined their relevance based on clinical expectations; for instance, missing ABG values were considered contextually appropriate for patients whose conditions did not necessitate routine ABG testing. In terms of data plausibility, we rigorously reviewed the distributions of variables, and implausible biological values led to a detailed examination of the data source. For example, encountering a pH value of 5.0 in the ABG data prompted a review of the specimen source column, which revealed that these were urine pH values mistakenly included. Consequently, such implausible values were excluded from the analysis.
Values for predictive features were restricted to biologically realistic ranges and were otherwise set to missing; for example, a pH value of 5.0 would be set to missing. Simple feature engineering was applied to create the analytic dataset from values measured within the first 48 hours of the last visit, such as minimum oxygen saturation and minimum and maximum heart rate. Indicator variables were created to identify whether an observation was available for a patient in the first 48 hours. A case was included in the dataset for prediction if it had at least 1 complete set of vital signs, defined as systolic and diastolic blood pressure and heart rate observed in the first 48 hours.
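The plausibility filtering and simple feature engineering described above can be sketched as follows. This is a minimal illustration only: the study pipeline was written in R, and the variable names and plausible ranges below are hypothetical examples, not the study’s exact bounds.

```python
# Illustrative sketch of plausibility filtering plus first-48-hour feature
# engineering (min/max summaries and availability indicators). All names
# and ranges are hypothetical; the study's actual pipeline was in R.

# Plausible biological ranges (illustrative only)
PLAUSIBLE = {"abg_ph": (6.8, 7.8), "heart_rate": (20, 250), "spo2": (50, 100)}

def clean(value, feature):
    """Return the value if biologically plausible, else None (missing)."""
    lo, hi = PLAUSIBLE[feature]
    return value if lo <= value <= hi else None

def engineer(observations):
    """Summarize first-48-hour observations into min/max features plus an
    indicator of whether any observation was available for the patient."""
    features = {}
    for name, values in observations.items():
        kept = [v for v in (clean(v, name) for v in values) if v is not None]
        features[f"{name}_observed"] = int(bool(kept))
        features[f"{name}_min"] = min(kept) if kept else None
        features[f"{name}_max"] = max(kept) if kept else None
    return features

row = engineer({"abg_ph": [7.35, 5.0, 7.41],  # 5.0: urine pH miscoded as ABG
                "heart_rate": [88, 112, 64],
                "spo2": []})                   # no SpO2 recorded in first 48 h
print(row)
```

The implausible pH of 5.0 is set to missing rather than dropped with the whole record, and the empty SpO2 series yields an availability indicator of 0, mirroring the indicator variables described above.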
Synthetic (exemplar) training and test sets were generated using the R synthpop package to protect data privacy while providing complete and examinable cases to the explainer dashboards. Figure 1 shows the study data schematic.
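The idea behind the synthetic exemplar data can be illustrated with a toy sketch. Note the hedge: the study used R’s synthpop, which models each column conditionally on the others to preserve joint structure; the Python sketch below only resamples columns independently, so it preserves marginal distributions but not correlations, and all column names and values are invented.

```python
import random

# Toy illustration of generating a privacy-preserving exemplar dataset.
# Unlike synthpop (used in the study), this resamples each column
# independently, so correlations between columns are not preserved.
random.seed(42)

real = [
    {"age": 71, "min_spo2": 88, "hapi": 1},   # invented example rows
    {"age": 54, "min_spo2": 95, "hapi": 0},
    {"age": 63, "min_spo2": 92, "hapi": 0},
]

def synthesize(rows, n):
    """Draw each synthetic value from the observed pool of its column."""
    cols = rows[0].keys()
    return [{c: random.choice(rows)[c] for c in cols} for _ in range(n)]

synthetic = synthesize(real, 5)
# Every synthetic value occurs somewhere in the real data, but no
# synthetic row need correspond to any actual patient record.
```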
Prediction
The H2O.ai25 machine-learning platform provides parallelized implementations of many supervised and unsupervised machine-learning algorithms, accessible from open-source environments such as Python and R. In addition to training and testing candidate models ([1] deep neural network, [2] gradient-boosted classification trees, [3] lasso [least absolute shrinkage and selection operator; logistic regression with regularization parameter α = 1], and [4] random forest), we used the Automatic Machine Learning (AutoML) function (h2o.automl()) to automate the supervised model training process. AutoML finds the optimal model by evaluating various algorithms via cross-validation; the result is an “H2OAutoML” object containing a leaderboard of all models trained in the process, ranked by a default performance metric. The leading model is often an ensemble “super learner,” a weighted composite of candidate models,26 whereas the top-performing single model is often referred to as a “discrete super learner.” Because ensemble super learners often prove difficult to pass to explainer tools without modification, we also ran the AutoML algorithm restricted to tree-based models (eg, XGBoost and random forest) to ensure the ability to interface with explainer dashboards. All models were evaluated using cross-validation on training data (metrics computed for combined holdout predictions).
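The ensemble “super learner” idea—combining candidate models’ cross-validated holdout predictions with weights chosen to maximize performance—can be sketched with a toy example. This is only an illustration of the principle: H2O’s AutoML fits its stacked ensemble with an internal metalearner rather than the naive grid search below, and the outcomes and predictions here are invented.

```python
# Toy sketch of a super learner: find convex weights over candidate
# models' holdout predictions that maximize AUC. Data are invented.

y = [1, 0, 1, 0, 0, 1, 0, 0]            # toy HAPI outcomes
preds = {                                # toy holdout predictions, 3 models
    "gbm":    [.8, .3, .6, .4, .2, .7, .5, .1],
    "lasso":  [.7, .2, .7, .5, .3, .6, .4, .2],
    "forest": [.9, .4, .5, .3, .1, .8, .6, .2],
}

def auc(y_true, scores):
    """AUC via the Mann-Whitney rank statistic."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blend(weights):
    """Weighted composite of the candidate models' predictions."""
    return [sum(w * preds[m][i] for w, m in zip(weights, preds))
            for i in range(len(y))]

# Grid search over convex weight vectors in steps of 0.1
grid = [(a / 10, b / 10, 1 - a / 10 - b / 10)
        for a in range(11) for b in range(11 - a)]
best = max(grid, key=lambda w: auc(y, blend(w)))
print("best weights:", best, "AUC:", round(auc(y, blend(best)), 3))
```

Because the grid includes the pure weight vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1), the selected composite can never score worse than the best single candidate on the holdout data—the “discrete super learner” is a special case of the ensemble.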
Global and Local Model Explanations
Average feature importance was calculated for all candidate models and the super learner model. All models, along with the synthetic (exemplar) data, were then passed to an explainer dashboard (via R DALEX27 and modelStudio28 packages) to better understand global and local (individual patient–level) predictions. Data formatting, machine-learning predictions, and evaluations were performed using R, version 4.3.1.
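One of the global explanation techniques surfaced by DALEX-style explainers, permutation feature importance, can be sketched in a few lines: shuffle a single feature and measure how much model performance degrades. The toy “model,” data, and feature names below are invented for illustration and bear no relation to the study’s fitted model.

```python
import random

# Minimal sketch of permutation feature importance. The toy model scores
# risk from SpO2 alone, so shuffling the irrelevant "age" column should
# leave performance unchanged while shuffling "min_spo2" degrades it.
random.seed(0)

def model(row):
    return 1.0 - row["min_spo2"] / 100    # toy risk score (illustrative)

data = [{"min_spo2": random.randint(80, 99), "age": random.randint(40, 90)}
        for _ in range(200)]
labels = [int(model(r) > 0.10) for r in data]   # toy outcome tied to SpO2

def accuracy(rows):
    return sum(int(model(r) > 0.10) == t
               for r, t in zip(rows, labels)) / len(rows)

def permutation_importance(feature):
    """Drop in accuracy after shuffling one feature across patients."""
    shuffled = [dict(r) for r in data]
    values = [r[feature] for r in shuffled]
    random.shuffle(values)
    for r, v in zip(shuffled, values):
        r[feature] = v
    return accuracy(data) - accuracy(shuffled)

print("min_spo2 importance:", permutation_importance("min_spo2"))
print("age importance:", permutation_importance("age"))  # exactly 0 here
```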
An Open Science Framework page was created to enhance transparency and reproducibility and can be accessed at https://tinyurl.com/2c3hda7s. The Open Science Framework page includes a data schematic, all study code (“compile_data.RMD” and “superlearner.RMD”), the synthetic (exemplar) dataset, and model performance summaries.
Results
Sample
A total of 28 395 patients met the inclusion criteria (age ≥ 18 years, ICU stay ≥ 48 hours, and at least 1 complete set of vital signs recorded within 48 hours of ICU admission) and were included in the sample. Most patients were White (n = 18 425, 64.9%), and slightly more than half were male (n = 15 881, 56%). The mean (SD) age was 63 (18) years. Characteristics of the sample are shown in Table 1.
Pressure Injury Outcome
A total of 1947 HAPIs were recorded among 1395 patients, for a HAPI incidence of 4.9%. The most common HAPI stage was stage 2 (n = 637, 33%), followed by deep tissue injury (n = 528, 27%), stage 1 (n = 453, 23%), unstageable (n = 301, 15%), stage 3 (n = 44, 2%), and stage 4 (n = 11, 1%). The most common anatomical location was the sacrococcygeal area (n = 740, 38%), followed by the heel (n = 428, 22%) and buttock (n = 312, 16%).
Predictive Models
The prediction probabilities versus actual event status are shown for each candidate model as receiver operating characteristic curves, with summary area under the curve (AUC) scores (Figure 1). All models performed well in cross-validation, with the composite ensemble super learner (from the AutoML ensemble leaderboard) demonstrating the best performance (AUC = 0.80, F1 score = 0.26) (Table 2). The 5 most important variables in the ensemble super learner based on average aggregate base model performance were lowest PaO2, number of medications administered (an indirect measure of the intensity of care), number of prior ICU admissions, lowest serum albumin level, and age (Figure 2).
Interactive Dashboard
R’s modelStudio28 package was used to create comprehensive explainer dashboards. Synthetic data were used in the dashboards to ensure patient privacy. These dashboards provide several interactive plots for global (feature importance, partial dependence, accumulated dependence, and residuals vs features) and local (Shapley Additive Explanations, ceteris paribus, and individual breakdown) model explanations (Figure 3 shows a static screenshot).
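The “ceteris paribus” (what-if) profile in the dashboard’s local explanations holds one patient’s features fixed and varies a single variable to show how the predicted risk responds. A minimal sketch follows; the toy risk model, coefficients, and feature names are invented for illustration, not taken from the study.

```python
# Illustrative ceteris paribus profile: vary one feature for one patient
# while holding all other features constant. The model below is a toy.

def risk_model(features):
    """Toy risk score: low oxygenation and older age raise risk."""
    score = 0.02 * max(0.0, 90 - features["min_pao2"]) + 0.004 * features["age"]
    return min(1.0, score)

patient = {"min_pao2": 72, "age": 67}       # hypothetical example patient

def ceteris_paribus(features, vary, grid):
    """Prediction profile for one patient as a single feature varies."""
    profile = []
    for value in grid:
        what_if = dict(features)
        what_if[vary] = value               # change only the varied feature
        profile.append((value, round(risk_model(what_if), 3)))
    return profile

for pao2, risk in ceteris_paribus(patient, "min_pao2", range(60, 101, 10)):
    print(f"min PaO2 {pao2} mm Hg -> predicted risk {risk}")
```

Plotted, such a profile lets a nurse see directly that, for this patient, predicted risk falls as the lowest PaO2 rises and plateaus once oxygenation is adequate under the toy model.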
Discussion
Machine learning was instrumental in developing a sensitive and specific model for predicting HAPI risk, along with a comprehensive dashboard designed to enhance the transparency of AI-generated risk assessments. Limiting EHR data to the first 48 hours after ICU admission yielded robust predictive performance while enabling early identification of pressure injury risk, so that earlier intervention is possible, with potential reductions in HAPI severity and improved patient outcomes. The dashboard’s enhanced transparency improved the capacity to explain how the model works as a composite and which variables were most influential in the model’s decision for a specific patient.
Early identification of HAPI risk is a pressing clinical challenge, because preventive measures are most effective when implemented before skin and tissue damage begins.1,29 The Braden Scale is standard in the United States for HAPI risk assessment, but Braden Scale scores obtained near the time of ICU admission are generally poor predictors of subsequent HAPI development.30 For example, a previous study involving 7790 critical care patients indicated that Braden Scale scores obtained within the first 48 hours of ICU admission were only modest predictors of impending pressure injuries (AUC = 0.67). The authors speculated that the Braden Scale may not sufficiently capture risk characteristics in ICU patients, particularly those characteristics that reflect physiological parameters relevant to serious illness or the intensity of clinical care.31
The most important features (variables) in the super learner algorithm, as depicted in Figure 2, were all measures that reflect either the physiological parameters of critical illness or the intensity of care. The finding that PaO2 was the most important risk factor aligns with Coleman and colleagues’ conceptual framework19 and Cox and colleagues’ framework,20 both of which indicate that oxygen delivery to tissues is an essential factor in pressure injury formation. The finding that PaO2 is an important risk characteristic is consistent with a recent study conducted among 23 806 critical care patients.21 Decreased PaO2 may also be indicative of disease states known to be associated with increased risk for HAPI formation due in part to reduced oxygen delivery, including sepsis and COVID-19.10,32
Explainable AI dashboards may help nurses understand model functions, offering insights into overall performance and individual patient decisions.
The study’s use of a super learner algorithm is a strength, as it creates a weighted combination of many candidate models, each of which processes data in unique ways.26 The composite nature of the super learner algorithm ensures that it capitalizes on the strengths of each candidate model while compensating for their individual weaknesses. By amalgamating the insights from diverse models, the super learner offers a composite view, often unattainable by any single model. This composite approach enhances the model’s generalizability across varied clinical scenarios and fortifies its resilience against potential outliers or unforeseen data patterns.
The super learner algorithm’s cross-validation (and composite) approach significantly enhances its predictive abilities; however, the utility of any AI-based HAPI risk–assessment model in clinical settings hinges on its transparency. Nurses will not, and should not, trust a model they do not understand. This study introduced a dashboard that provides insight into patient-specific predictions to address the need for transparency, given that many AI models function as “black boxes” without revealing their decision-making logic. The dashboard offered a critical mechanism for demystifying these processes. Although AI models are powerful, they may not always reflect the nuanced clinical understanding that nurses have. For example, the super learner algorithm might identify a patient as “low risk” for HAPI; yet, upon examining the dashboard, a nurse might question this assessment, focusing on the factors the AI deemed significant. The nurse’s judgment remains critical. The AI model is a beneficial tool when it is used as an adjunct to care that optimizes the application of the deep clinical expertise of nursing professionals. Our model’s dashboard is intended to support and inform clinical decisions, not dictate them.
Limitations
This study has several limitations, including its single-site design, which raises concerns about its broader applicability. The MIMIC-IV dataset, which includes EHR data only up to 2019, contains no patients with COVID-19. This is a notable drawback, as those patients have a heightened risk for HAPI and present distinct risk factors.33 The dataset’s time frame also poses challenges regarding HAPI definitions. In 2016, the National Pressure Injury Advisory Panel, previously known as the National Pressure Ulcer Advisory Panel, revised the pressure injury definition and its classification system.34 Given that the MIMIC data cover 2008 to 2019, HAPI classifications encompass both the old and new definitions, which are similar but not the same. Finally, all real-world EHR data carry inherent biases that become part of algorithms developed from them. For instance, EHR data often underrepresent uninsured patients, and past research indicates that AI models tend to produce less accurate recommendations for patients from groups that are underrepresented in the data used to train the model.35
Conclusion
This study introduced a novel XAI approach for early assessment of pressure injury risk in the ICU, complemented by an explainer dashboard to clarify the decision-making processes. This approach provides clinicians with an improved understanding of predictive outcomes. Although the promise of XAI in enhancing clinical decision-making is evident, future research is needed to determine the practicality of integration and effectiveness in real-world clinical settings.
REFERENCES
Footnotes
FINANCIAL DISCLOSURES
This study was funded by the American Association of Critical-Care Nurses 2022 Impact Award.
SEE ALSO
For more about artificial intelligence, visit the Critical Care Nurse website, www.ccnonline.org, and read the guest editorial by Alderden and Johnny, “Artificial Intelligence and the Critical Care Nurse” (October 2023).
To purchase electronic or print reprints, contact American Association of Critical-Care Nurses, 27071 Aliso Creek Road, Aliso Viejo, CA 92656. Phone, (800) 899-1712 or (949) 362-2050 (ext 532); fax, (949) 362-2049; email, [email protected].