Background
Pressure injuries are an important problem in hospital care. Detecting the population at risk for pressure injuries is the first step in any preventive strategy. Available tools such as the Norton and Braden scales do not take into account all of the relevant risk factors. Data mining and machine learning techniques have the potential to overcome this limitation.
Objective
To build a model to detect pressure injury risk in intensive care unit patients and to put the model into production in a real environment.
Methods
The sample comprised adult patients admitted to an intensive care unit (N = 6694) at University Hospital of Torrevieja and University Hospital of Vinalopó. A retrospective design was used to train (n = 2508) and test (n = 1769) the model, and a prospective design was then used to test the model in a real environment (n = 2417). Data mining was used to extract variables from electronic medical records, and a predictive model was built with machine learning techniques. The sensitivity, specificity, area under the curve, and accuracy of the model were evaluated.
Results
The final model used logistic regression and incorporated 23 variables. The model had sensitivity of 0.90, specificity of 0.74, and area under the curve of 0.89 during the initial test, and thus it outperformed the Norton scale. The model performed well 1 year later in a real environment.
Conclusions
The model effectively predicts risk of pressure injury. This allows nurses to focus on patients at high risk for pressure injury without increasing workload.
Pressure injuries (PIs) are localized injuries of the skin or underlying tissue, usually over a bony prominence, that result from pressure or pressure in combination with shear.1 Most PIs are avoidable,2,3 and thus PIs represent a problem in the quality of health care. These injuries can have a profound impact on patients, their families, professionals, and institutions. Pressure injuries develop in 0.3% to 20% of hospitalized patients4,5 and in 3.3% to 53.4% of patients in intensive care units (ICUs).6,7 The incidence is higher in the ICU because patients there are more vulnerable. The cost of hospital-acquired PIs in the United States could exceed $26.8 billion annually.8
The first step in any strategy to prevent PIs is to detect the population at risk. Tools have been developed to detect PI risk in patients, including the Norton, Braden, and Waterlow scales.9–11 These scales take into account basic dimensions of PI risk; however, they fail to address some variables that have been identified as risk factors for PIs, including hematological values,12–14 oxygenation and perfusion,15 and the presence of diabetes11 or vascular disease.16 In our context, we used the Norton scale, which nurses complete on the basis of observations and interviews within the first 24 hours after admission or after a significant change in health state. The Norton scale measures 5 variables: type of activity, physical condition, mental state, type of incontinence, and mobility type.11
Electronic medical records (EMR) facilitate comparison and analysis of the characteristics of patients in whom PIs develop. Data mining and machine learning techniques can reveal complex and meaningful patterns in the large volume of data contained in EMRs, and may allow us to predict future events such as the development of a PI. Researchers in the health sciences have used data mining and machine learning extensively, but few have applied the techniques to the field of nursing. Some researchers have used data mining or machine learning to build models to study risk for PIs,17–24 but few of those models have progressed to production (ie, availability for real-time use),25,26 which is a common problem in any field where machine learning techniques are applied.27 Data mining and machine learning models can automatically integrate and analyze the characteristics of each individual case, which makes it easier to manage the risk of PIs in individual patients in real time. In addition, machine learning systems can continuously learn as new cases emerge and thus adapt a model to new situations.28
We believe that the application of data mining and machine learning techniques can complement and improve upon the predictive power of the Norton risk assessment scale. The resulting model could help nurses to improve the care that patients receive throughout their hospital stay. In the present study, we built a model to detect PI risk in patients admitted to an ICU and put the model into production in a real environment.
Methods
Design
The study was divided into 2 phases: In the first phase, we used a retrospective design to train and test the model; in the second phase, we used a sequential prospective design to test the model in a real environment (Figure 1). We followed the cross-industry standard process for data mining (CRISP-DM) to develop the predictive model24 and subsequently apply it in a real environment. CRISP-DM, which is widely used in data mining and machine learning studies, is a comprehensive process model that breaks down the life cycle of a data mining project into 6 phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.29
Setting and Population
The research took place at 2 university hospitals in Spain. Both are public health centers: University Hospital of Vinalopó has 230 beds, including a medical-surgical ICU with 16 beds, and University Hospital of Torrevieja has 277 beds, including a medical-surgical ICU with 15 beds. The study population comprised adult patients admitted at least once to the ICU; patients were followed during their ICU stay and any subsequent acute hospitalization.
Sample
The total sample (N = 6694) comprised all adult patients admitted to the ICU during a hospital stay from January 1, 2016, through September 30, 2018. Patients admitted for less than 72 hours, patients under 16 years of age, and obstetric patients were excluded. The sample for the first phase (n = 4277) comprised patients admitted from January 1, 2016, through September 30, 2017, and the sample for the second phase (n = 2417) comprised patients admitted from October 1, 2017, through September 30, 2018. During the first phase, the sample was divided into 2 subsamples, one for training the model (n = 2508) and another for testing the model (n = 1769). In the second phase, the entire sample was used to test the model. The comparability of the subsamples used for model training and testing was checked with the χ2 test of independence and the Student t test (Table 1).
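As a minimal illustration of this kind of homogeneity check (the study used Azure Machine Learning and R; this sketch uses Python with hypothetical column names such as "sex" and "age"):

```python
import pandas as pd
from scipy import stats

def compare_subsamples(train: pd.DataFrame, test: pd.DataFrame) -> None:
    """Check that training and testing subsamples are comparable."""
    # Chi-square test of independence for a categorical characteristic
    # ("sex" is a hypothetical column name, not the study's schema).
    counts = pd.DataFrame({
        "train": train["sex"].value_counts(),
        "test": test["sex"].value_counts(),
    }).fillna(0)
    chi2, p_cat, _, _ = stats.chi2_contingency(counts)
    print(f"sex: chi2 = {chi2:.2f}, P = {p_cat:.3f}")

    # Student t test for a continuous characteristic ("age" is hypothetical).
    t, p_num = stats.ttest_ind(train["age"].dropna(), test["age"].dropna())
    print(f"age: t = {t:.2f}, P = {p_num:.3f}")
```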
Data Preparation
The group of cases included patients whose EMR showed a “hospital-acquired PI type wound” in the Wound Tracking Form during their ICU stay or their subsequent acute hospitalization during the same admission. If a patient had more than 1 PI develop, only the first one was included in our analysis. Stage 1 through 4 PIs and unstageable PIs were included.30 The number of PIs identified in this way was smaller than we expected in light of the existing literature on the incidence of PI in hospitalized patients.6,7 Therefore, we recovered cases of PIs that were not designated as “hospital-acquired PI type wound” in the EMR by using a search algorithm to locate free-text mentions of concepts related to PI treatments in the nursing records. One reviewer checked and confirmed the cases located by the algorithm. A corrective action was taken in the EMR to register the recovered cases of PI on the Wound Tracking Form and thus incorporate them into the group of patients with PIs. The cases identified with the recovery algorithm accounted for 47% of the total cases in the first phase.
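The exact search algorithm is not published; a minimal sketch of this kind of free-text screen, with an illustrative (hypothetical) term list, might look like the following:

```python
import re

# Hypothetical examples of PI-related treatment concepts; the study's
# actual term list is not reported.
PI_TERMS = [
    r"pressure (injury|ulcer|sore)",
    r"sacral (wound|ulcer)",
    r"hydrocolloid dressing",
    r"unstageable",
]
PI_PATTERN = re.compile("|".join(PI_TERMS), re.IGNORECASE)

def flag_possible_pi(nursing_notes: list[str]) -> bool:
    """Flag a patient whose free-text nursing notes mention a PI-related
    concept; in the study, flagged cases were confirmed by a reviewer."""
    return any(PI_PATTERN.search(note) for note in nursing_notes)
```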
On the basis of a literature review,13,23 we identified 93 variables as possible predictors of PIs. We then evaluated our ability to obtain data on these variables. Out of the 93 variables, we discarded 26 because of a high number of missing values or inability to recover the data from the EMRs. We selected the remaining 67 variables (listed in a Supplement to this article) to train and test the model and we determined criteria for extracting the variables from the EMRs.
The data were processed to generate an initial database that we used to select the algorithm. The variables were categorized according to the nature of the data and previously defined ranges and were grouped into the following domains (some previously used by Coleman et al13): activity/mobility, age, care process, gender, general health status, hematological measures, medication, mental status, nutrition, moisture, place of birth, scales of risk, skin status, and surgical intervention. The diseases described in the abbreviated Charlson comorbidity index31 were included in the study in the domain of general health status. The diagnoses were codified per the International Classification of Diseases, Ninth Revision (ICD-9) for 2016 and the Tenth Revision (ICD-10) for 2017. The Supplement to this article shows the timing of data collection from the EMR for each variable. Missing values were processed and the sample was normalized. Different techniques for handling missing values were applied depending on the characteristics of the variable, mainly averaging techniques that substituted means for missing values of continuous variables and modes for missing values of categorical variables. Although this approach decreases the variance in the data set, it was the most feasible way to handle the missing data because it provided the best accuracy for the effort required.
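A minimal sketch of the mean/mode substitution and normalization steps described above, assuming the extracted variables sit in a pandas DataFrame (the study's actual pipeline ran on Azure, and min-max scaling is an assumption because the normalization method is not specified):

```python
import pandas as pd

def impute_and_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Substitute means (continuous) and modes (categorical) for missing
    values, then normalize the continuous variables."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())          # mean substitution
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])  # mode substitution

    # Min-max normalization (one common choice; not confirmed by the article).
    num = out.select_dtypes("number").columns
    span = out[num].max() - out[num].min()
    out[num] = (out[num] - out[num].min()) / span.replace(0, 1)  # guard constants
    return out
```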
To select a machine learning algorithm, we compared the performance of 9 classification algorithms available in Microsoft Azure Machine Learning (Table 2) with the testing subsample of phase 1 and the 67 variables. We calculated the sensitivity (effectiveness of the algorithm on a positive class); specificity (effectiveness of the algorithm on a negative class); accuracy (overall effectiveness of the algorithm); and area under the receiver operating characteristic curve, which shows the relationship between the sensitivity and the specificity of the algorithm. After we selected the algorithm with the best measures, we identified the most significant variables and performed data cleansing. We used the synthetic minority oversampling technique to balance the classification of patient groups with and without PIs. In the second phase, during which the model was put into production and tested in a real environment, the data preparation process was carried out automatically by the model itself.
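The comparison ran on Microsoft Azure Machine Learning; a rough equivalent in scikit-learn/imbalanced-learn terms might look like the sketch below (candidate list abbreviated, data loading omitted; note that in the study SMOTE was applied after the algorithm had been selected, whereas this sketch folds both steps together for brevity):

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(clf, X_train, y_train, X_test, y_test):
    """Fit one candidate and report the four metrics used in the study."""
    # Balance the PI / no-PI classes with SMOTE.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
    clf.fit(X_bal, y_bal)
    tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # effectiveness on the positive class
        "specificity": tn / (tn + fp),  # effectiveness on the negative class
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "auc": roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
    }

# Two of the 9 candidates, for illustration; Table 2 lists the full set.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
```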
Data Analysis and Machine Learning
In the first phase, after the algorithm was selected, we used permutation functions to calculate the individual contribution of each of the 67 variables to the discriminative capacity of the model (see the Supplement to this article). We performed these calculations to elucidate the relationship between the independent variables and the dependent variable within the model.32 We then used these values to eliminate variables that contributed little or nothing and repeated the process (calculate-eliminate) iteratively to improve the metrics and achieve greater accuracy and content validity.
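A sketch of one calculate-eliminate pass using permutation importance (scikit-learn is shown for illustration; the study ran on Azure, and the retention threshold here is an arbitrary assumption):

```python
from sklearn.inspection import permutation_importance

def prune_variables(model, X_val, y_val, threshold=0.0):
    """Score each variable's contribution by permutation and keep the
    columns contributing more than `threshold` to the AUC.

    Assumes `model` is already fitted and `X_val` is a pandas DataFrame.
    """
    result = permutation_importance(
        model, X_val, y_val, scoring="roc_auc", n_repeats=10, random_state=0
    )
    return [
        col
        for col, importance in zip(X_val.columns, result.importances_mean)
        if importance > threshold
    ]

# The study repeated this calculate-eliminate cycle, refitting the model
# on the retained variables, until the metrics stopped improving.
```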
To determine whether the model represented an improvement over standard practice in the field, we compared the results obtained by the selected algorithm (risk indicated at a predicted probability of 0.50 or higher) with those obtained by the Norton scale (risk indicated at 15 points or less) on the testing sample from the first phase, reporting sensitivity, specificity, area under the curve (AUC), positive predictive value, negative predictive value, and accuracy with their 95% confidence intervals. We calculated χ2 and Student t test values for each variable in the phase 1 test sample to explore whether the model's variables differed between the group of patients in whom PIs developed and the group in whom they did not. In phase 2, the same measures (sensitivity, specificity, etc) were calculated as the model was used on patient data obtained in the real environment after the model was put into production.
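The article does not state how the 95% CIs were computed; bootstrapping is one common approach, sketched here for the AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, seed=0):
    """95% bootstrap percentile CI for the AUC (one possible CI method;
    the article does not specify which was used)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample must contain both classes
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    # Risk thresholds in the comparison: model flags risk at a predicted
    # probability >= 0.50; the Norton scale flags risk at <= 15 points.
    return np.percentile(aucs, [2.5, 97.5])
```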
The Microsoft Azure cloud platform and R software (version 3.4.2) were used for statistical analysis during the development of the project.
Ethical Considerations
This study was approved by the Research Committee (University Hospital of Torrevieja and University Hospital of Vinalopó). Patients’ data were anonymized.
Results
Description of the Sample
The total sample consisted of 6694 patients. The accumulated incidence rate of patients with PIs was 4.12%, or 4.25 patients with PIs per 1000 days of hospital stay. The accumulated incidence rate of PIs that developed in the ICU was 2.83%, or 3.10 per 1000 days of stay. The sample had more men (66%) than women, and the age range with the highest percentage of patients was 65 to 84 years.
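For reference, the two incidence measures reported here follow the standard epidemiological definitions (patient-day totals are not reported in the article, so no numbers are substituted):

```latex
\[
\text{accumulated incidence} = \frac{\text{patients with PIs}}{\text{patients in sample}} \times 100\%,
\qquad
\text{incidence density} = \frac{\text{patients with PIs}}{\text{patient-days of stay}} \times 1000.
\]
```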
We compared the samples used for testing with the training sample. The subsamples used for initial testing (phase 1; n = 1769) and testing in a real environment (phase 2; n = 2417) were similar to the training sample (phase 1; n = 2508) in all characteristics except place of birth (phase 1, P = .03; phase 2, P = .02) and hemoglobin level (phase 2, P = .001). These differences were statistically significant but not clinically meaningful (Table 1).
Phase 1 Data Mining and Machine Learning
To select a machine learning algorithm, we compared the performance of the 9 algorithms in predicting PI incidence in the testing data set (Table 2). Of these 9 algorithms, we selected logistic regression because it had the highest AUC (0.71) and the second-highest sensitivity (0.91).
Twenty-three variables were definitively part of the model. The importance of each variable is shown by the size of the horizontal bars in Figure 2. The variables that contributed most to the discriminative capacity of the model were medical service, days of oral antidiabetic agent or insulin therapy, ability to eat (Barthel scale), number of red blood cell units transfused, hemoglobin range, PI present on admission, and illness severity (APACHE [Acute Physiology and Chronic Health Evaluation] II score). In general, compared with patients in whom PIs did not develop, patients who had a PI develop were more likely to be in the ICU (P = .03), had been treated for more days with an oral antidiabetic agent or insulin (P < .001), were less able to eat independently (P < .001), had undergone transfusion of more red blood cell units (P < .001), were more likely to have a low hemoglobin level (P < .001), were less likely to have had a PI at admission (P = .09), and had higher APACHE II scores (P < .001).
The receiver operating characteristic curve produced by the logistic regression model in phase 1 is shown in Figure 3A. Data from phase 1 show that the logistic regression model performed better than the Norton scale in sensitivity (0.90 vs 0.85), specificity (0.74 vs 0.64), AUC (0.89 vs 0.75), positive predictive value (11.98% vs 8.76%), negative predictive value (99.44% vs 99.09%), and accuracy (0.74 vs 0.65). The CIs for specificity, AUC, and accuracy from phase 1 do not overlap (Table 3).
Phase 2 Data Mining and Machine Learning
The receiver operating characteristic curve produced by the logistic regression model in phase 2 is shown in Figure 3B. The results obtained by applying the Norton scale and the logistic regression model to the test sample in phase 2 confirm that the model outperformed the Norton scale in specificity (0.88 vs 0.67), AUC (0.88 vs 0.77), positive predictive value (21.95% vs 10.87%), and accuracy (0.87 vs 0.68) but had lower values for sensitivity (0.75 vs 0.87) and negative predictive value (98.68% vs 99.10%). The CIs for specificity, AUC, positive predictive value, and accuracy from phase 2 do not overlap (Table 3). Overall, these data demonstrate that the logistic regression model has high discriminative capacity.
Discussion
We used data mining and machine learning techniques to construct a model to detect PI risk in patients admitted to an ICU and put the model into production in a real environment. Our sample of 6694 patients had an accumulated incidence rate of PI of 4.12% and a rate of 2.83% for PIs that developed while the patients were in the ICU. This incidence rate is slightly lower than incidence rates reported in previous studies,6,7 which range from 3.3% to 53.4%. The main reasons for these differences could be the type of ICU (medical-surgical), the median stay (7 days), and possibly methodological differences across studies.
The model, a logistic regression algorithm, consisted of 23 variables. The 7 variables that most contributed to the model were as follows:
Medical service (care process domain)
Days of oral antidiabetic agent or insulin treatment (medication domain)
Ability to eat, Barthel scale (activity/mobility domain)
Number of red blood cell units transfused (hematological measures domain)
Hemoglobin range (hematological measures domain)
PI present on admission (skin status domain)
Illness severity, APACHE II (general health status domain)
All domains included in the model (under the same name or similar) had been identified as significant in previous studies.13,33
Regarding the characteristics of patients who had a PI develop, every variable denotes their vulnerability, with the exception of “PI present on admission,” which was not statistically significant. This result could mean that patients who have a PI on admission receive specific nursing interventions (regardless of the risk score), which could mask the relationship being studied.
In both phase 1 and phase 2, performance metrics showed that the logistic regression model was better at detecting risk of PI than the Norton scale on every statistic except sensitivity in phase 2. These data suggest that the discriminative capacity of the logistic regression model is better than that of the Norton scale alone. The results of our model compare favorably with results from scale evaluation studies10,34 and predictive models.18,19,23,24 We found an AUC similar to that of our model (0.90 vs 0.89) in a Braden scale meta-analysis,8 but we did not find better results for the other measures (sensitivity, specificity, and accuracy) in any study. Thus it appears that the logistic regression model produces a better overall result than other methods. Furthermore, these positive outcomes continued after the model was put into production and tested with a sample in a real environment in phase 2.
This study has some limitations. First, some reported risk factors, such as body temperature,35,36 could not be included in the model because of excessive missing values or inability to extract the data from the EMR. Second, although we did recover a significant number of PIs from the EMRs with an algorithm that searched free-text records, we cannot ensure that all PIs that developed during the study period were accounted for (ie, the number of PI cases may have been underreported). Third, the model is a “black box”32; we cannot clearly see how each variable affects the risk of PI development. Fourth, PI prevention interventions provided by nurses were not considered in this study because this variable is not accurately documented in the EMR. Fifth, although the sample size was large, the usefulness of the predictive model in other hospital centers is unknown because the model depends on the data that feed it (although we expect that the variables included in the model could be extracted from EMRs in other settings).
The model has been put into production in a real environment and integrated into the EMR, and it allows nurses to identify risk of PI objectively and accurately from admission to discharge because it provides an automatic and continuous prediction based on real-time clinical data. Unlike other risk scales, the model recognizes changes in the patient's condition over time. This helps caregivers focus on preventive care for the patients who need it most, without burdening nurses with the need to gather new information.
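The production architecture is not described in detail in the article; as a minimal sketch, continuous EMR-driven scoring could look like the following (the snapshot structure and the fitted scikit-learn-style `model` object are assumptions):

```python
RISK_THRESHOLD = 0.50  # risk is flagged at a predicted probability of 0.50 or higher

def score_patient(model, emr_snapshot: dict) -> dict:
    """Score one patient from a real-time EMR snapshot.

    `emr_snapshot` maps the model's 23 input variables to their current
    values; in production it would be assembled automatically from the
    EMR and re-scored whenever new clinical data arrive, so nurses enter
    nothing. `feature_names_in_` is the scikit-learn attribute holding
    the training column names (assumes the model was fitted on a
    DataFrame).
    """
    row = [[emr_snapshot[name] for name in model.feature_names_in_]]
    probability = float(model.predict_proba(row)[0, 1])
    return {"pi_risk": probability, "at_risk": probability >= RISK_THRESHOLD}
```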
Conclusion
The model, developed using data mining and machine learning techniques, offers very good results and provides greater predictive power than the Norton scale alone, or other models, in our context. Integrating such models into usual practice will make it easier for hospitals to direct preventive care toward the patients who need it most without unnecessarily increasing the workload of care providers. Important challenges that remain include evaluating the model's performance solely for the period that patients stay in the ICU and validating the model in other hospital settings.
FINANCIAL DISCLOSURES
None reported.
SEE ALSO
For more about pressure injuries, visit the Critical Care Nurse website, www.ccnonline.org, and read the article by McGee et al, “Pressure Injuries at Intensive Care Unit Admission as a Prognostic Indicator of Patient Outcomes” (June 2019).