Intensive care units frequently use the Glasgow Coma Scale to objectively assess patients’ levels of consciousness. Interobserver reliability of Glasgow Coma Scale scores is critical in determining the degree of impairment.
To evaluate interobserver reliability of intensive care unit patients’ Glasgow Coma Scale scores.
Methods
This prospective observational study evaluated Glasgow Coma Scale scoring agreement among 21 intensive care unit nurses and 2 independent researchers who assessed 202 patients with neurosurgical or neurological diseases. Each assessment was completed independently and within 1 minute. Participants had no knowledge of the others’ assessments.
Agreement between Glasgow Coma Scale component and sum scores recorded by the 2 researchers ranged from 89.5% to 95.9% (P = .001). Significant agreement among nurses and the 2 researchers was found for eye response (73.8%), motor response (75.0%), verbal response (68.1%), and sum scores (62.4%) (all P = .001). Significant agreement among nurses and the 2 researchers (55.2%) was also found for sum scores of patients with sum scores of 10 or less (P = .03).
Although the study showed near-perfect agreement between the 2 researchers’ Glasgow Coma Scale scores, agreement among nurses and the 2 researchers for subcomponent and sum scores was lower (62.4% to 75.0%) and not near perfect. Accurate Glasgow Coma Scale evaluation requires that intensive care unit nurses have adequate knowledge and skills. Educational strategies such as simulations or orientation practice with a preceptor nurse can help develop such skills.
For the last 40 years, the Glasgow Coma Scale (GCS) has been used worldwide, especially in emergency care and intensive care units (ICUs), to assess patients’ levels of consciousness.1 The GCS consists of 3 subscales: eye response (scored 1-4), motor response (scored 1-6), and verbal response (scored 1-5). The GCS sum score (range, 3-15) indicates the patient’s level of consciousness.2 Because the sum score is associated with patient outcome, accurate scoring plays a significant role in efficient patient assessment and is essential in planning the best treatment modalities and patient care.3,4 However, many factors influence the reliability of GCS evaluations.3
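To make the arithmetic concrete, here is a minimal Python sketch of how the 3 subscale scores combine into the sum score. The class name `GCS`, its range checks, and the "E3V4M5" display format are illustrative assumptions, not part of any clinical system described in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GCS:
    """Hypothetical record of one GCS assessment."""
    eye: int     # eye response, 1-4
    verbal: int  # verbal response, 1-5
    motor: int   # motor response, 1-6

    def __post_init__(self) -> None:
        # Reject scores outside each subscale's published range.
        for name, value, top in (("eye", self.eye, 4),
                                 ("verbal", self.verbal, 5),
                                 ("motor", self.motor, 6)):
            if not 1 <= value <= top:
                raise ValueError(f"{name} score {value} outside 1-{top}")

    @property
    def total(self) -> int:
        """Sum score, from 3 (deepest coma) to 15 (fully alert)."""
        return self.eye + self.verbal + self.motor

    def __str__(self) -> str:
        return f"E{self.eye}V{self.verbal}M{self.motor} = {self.total}"
```

For example, `str(GCS(eye=3, verbal=4, motor=5))` gives "E3V4M5 = 12". Keeping the components alongside the total matters because, as discussed later, different component combinations can share the same sum.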
The GCS is a validated and universal scale used to evaluate patients’ level of consciousness. Variability in interobserver reliability results, however, has led to doubts about the usefulness of the scale.5,6 Previous studies have assessed the interobserver reliability of GCS scores among health care providers.3-5,7,8 A video-based study reported GCS score accuracy rates of 29% and 33.1% among nurses and other health care providers, respectively. The study also found that nurses had the lowest GCS scoring accuracy rate.3 A randomized controlled trial determined that GCS sum scores were statistically more accurate when health care providers evaluated scores while holding a GCS reference card.4 In some studies, emergency department health care providers scored eye, verbal, and motor responses most accurately.3–5 Gill et al,5 however, found that only 32% of health care providers agreed on the GCS scores of adult patients.
The literature shows that GCS assessment standardization, in addition to knowledge and education, is extremely important in ensuring assessment accuracy.9,10 Most studies of interobserver reliability among health care providers have used simulated patients or taken place in emergency departments. Patients in ICUs require rapid and objective assessment of level of consciousness. The level of consciousness, especially in patients with neurological or neurosurgical diseases, may change within minutes of arrival in the ICU. To prevent irreversible damage in patients, ICU nurses must be able to accurately assess each patient’s level of consciousness so the health care team can make decisions about necessary treatment and interventions without wasting time. An early and accurate assessment of a patient’s GCS score is essential for planning and implementing effective patient care and achieving targeted patient outcomes.6
Many reliability studies of the GCS in critical care nursing conducted in the 1990s found a gap between knowledge and clinical practice and suggested additional training for nurses. Because the wording of the scale was revised in 2014, ICU nurses’ accuracy in GCS evaluation needed to be reevaluated. Few prospective observational studies of ICU nurses’ accuracy with the revised GCS have been conducted, especially in real patients. Therefore, the aim of this study was to evaluate the interobserver reliability of GCS scores assigned by ICU nurses in real patients.
Study Design and Participants
This prospective observational study evaluated GCS scoring agreement among ICU nurses. We designed the study according to the Strengthening the Reporting of Observational Studies in Epidemiology tool.11
The study was conducted from November 2017 through February 2018 in an 18-bed medical-surgical ICU in which 35 nurses work in 2 shifts. We used a convenience sample of 21 nurses from the day shift. Inclusion criteria for nurses were having graduated from nursing school at least 6 months before the study and agreeing to participate. Demographic information included education level, professional experience, academic background, and postgraduate qualifications in critical care.
Patients’ GCS scores were recorded by each patient’s primary nurse and by 2 independent researchers. Both researchers had at least 3 years of experience working as ICU nurses and had been working as instructors for 8 years. Both researchers had given ICU nursing lectures in an undergraduate nursing program and a postgraduate critical care nursing certification program.
Patients were selected according to their diagnoses. Patients were included if they were over 18 years of age, had altered levels of consciousness, and had undergone neurosurgery or had a neurological disease that changed their GCS scores and led to admission to the ICU. Patients who received paralytic agents or had a Richmond Agitation-Sedation Scale score of –4 (deep sedation) or –5 (unarousable level of sedation) were excluded.
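The inclusion and exclusion criteria can be expressed as a single screening predicate. This is a hypothetical sketch: the `Patient` fields are illustrative names rather than the study’s data collection form, and “over 18 years of age” is read as strictly older than 18.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical screening record; field names are illustrative only."""
    age_years: int
    neurosurgical_or_neurological: bool  # underwent neurosurgery or has a neurological disease
    altered_consciousness: bool          # condition changed GCS and led to ICU admission
    received_paralytic: bool             # neuromuscular blocking agent given
    rass: int                            # Richmond Agitation-Sedation Scale score (-5 to +4)


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = (
        p.age_years > 18
        and p.neurosurgical_or_neurological
        and p.altered_consciousness
    )
    # Exclude paralyzed patients and those deeply sedated (-4) or unarousable (-5).
    excluded = p.received_paralytic or p.rass in (-4, -5)
    return included and not excluded
```

Writing the rule this way makes explicit that age and altered consciousness are required in every case, whereas neurosurgery and neurological disease are alternatives.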
The researchers designed the data collection form used in the study. Participating nurses’ experience levels and patients’ medical diagnoses were recorded. Each patient’s primary ICU nurse and the 2 researchers were given 1 minute to independently record the patient’s total GCS score and the eye, verbal, and motor subcomponent scores. Because a patient’s GCS score might change significantly during observation, each observation was not allowed to exceed 5 minutes. The researchers did not train participating nurses in GCS evaluation. Nurses were given GCS scoring reference cards, with real-time use being optional, and some nurses scored patients’ GCS without using the card. This approach allowed nurses to participate regardless of whether they had memorized the GCS. Eye, motor, and verbal responses, as well as total GCS scores, were recorded after each evaluation. The nurses and 2 researchers returned their individually completed forms while remaining blind to each other’s assessments.
The institutional review board of the study institution approved the study (2017.228.IRB2.068), and the hospital approved the study protocol. Written and oral consent was obtained from each registered nurse and from each patient or, for patients with altered levels of consciousness, from the next of kin.
Data were processed with NCSS statistical software (NCSS, LLC). We used the Krippendorff α statistic to assess interrater agreement in GCS scores among the researchers and nurses. Krippendorff α values of 0.20 or lower were considered poor agreement; from 0.21 to 0.40, fair agreement; from 0.41 to 0.60, moderate agreement; from 0.61 to 0.80, substantial agreement; and from 0.81 to 1.00, near-perfect agreement. We used descriptive statistics (percentages, means, SDs, and ranges) to summarize total GCS and subcomponent scores. P values of less than .01 were considered significant.1
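The article does not state which Krippendorff α metric (nominal, ordinal, or interval) was used or how NCSS computes it. The sketch below is an independent, minimal Python implementation of α built from the coincidence matrix, assuming one row per assessed patient and one entry per observer, with `None` marking a missing rating; the function names are hypothetical, and this is not the NCSS routine. It also encodes the agreement bands defined above.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Optional, Sequence

Delta = Callable[[int, int], float]


def nominal(c: int, k: int) -> float:
    """Nominal metric: any disagreement counts fully."""
    return 0.0 if c == k else 1.0


def interval(c: int, k: int) -> float:
    """Interval metric: squared difference between scores."""
    return float((c - k) ** 2)


def krippendorff_alpha(units: Sequence[Sequence[Optional[int]]],
                       delta: Delta = nominal) -> float:
    """Krippendorff's alpha; each unit lists one rating per observer (None = missing)."""
    coincidences: Counter = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than 2 observers has no pairable values
        for c, k in permutations(values, 2):  # ordered pairs from different observers
            coincidences[(c, k)] += 1.0 / (m - 1)
    n_c: Counter = Counter()
    for (c, _), weight in coincidences.items():
        n_c[c] += weight
    n = sum(n_c.values())
    observed = sum(w * delta(c, k) for (c, k), w in coincidences.items())
    expected = sum(n_c[c] * n_c[k] * delta(c, k) for c in n_c for k in n_c)
    if expected == 0:
        raise ValueError("alpha is undefined when all pairable ratings are identical")
    return 1.0 - (n - 1) * observed / expected


def agreement_label(alpha: float) -> str:
    """Map alpha onto the agreement bands used in this study."""
    if alpha <= 0.20:
        return "poor"
    if alpha <= 0.40:
        return "fair"
    if alpha <= 0.60:
        return "moderate"
    if alpha <= 0.80:
        return "substantial"
    return "near-perfect"
```

For example, `agreement_label(0.946)` returns "near-perfect", the band the study assigns to the 2 researchers’ agreement on eye response. Because GCS scores are ordinal, an ordinal distance function could be substituted for `nominal`; which one the authors used is not reported.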
Twenty-one ICU nurses took part in the study. Most had been working in the ICU in which the study was conducted for 1 to 2 years, and almost half had 3 to 5 years of ICU nursing experience. Only 2 nurses (10%) had critical care certification. The 21 ICU nurses and 2 researchers completed a total of 202 GCS observations. The most common patient diagnoses were subarachnoid hemorrhage and brain tumor (Table 1).
In eye response scores, we found significant agreement between ICU nurses and the first researcher (64.1%), between ICU nurses and the second researcher (62.9%), between the 2 researchers (94.6%), and among all 3 observers (73.8%; Table 2). To evaluate motor response, both researchers used the pain stimulation sites (nail bed, trapezius, and supraorbital site) recommended by Teasdale.1 At the end of each evaluation, nurses specified the sites of pain stimulation they preferred. The most preferred sites were nail bed, trapezius, nipple, sole of the foot, earlobe, and supraorbital site. In motor response scores, we found significant agreement between ICU nurses and the first researcher (64.9%), between ICU nurses and the second researcher (66.6%), between the 2 researchers (94.3%), and among all 3 observers (75.0%; Table 2).
In verbal response scores, we found significant agreement between nurses and the first researcher (55.2%), between nurses and the second researcher (56.9%), between the 2 researchers (95.9%), and among all 3 observers (68.1%; Table 2). We also determined that 16.3% of the nurses recorded the verbal responses of intubated patients as “none.”
In GCS sum scores, we found significant agreement between ICU nurses and the first researcher (49.1%), between ICU nurses and the second researcher (49.0%), between the 2 researchers (89.5%), and among all 3 observers (62.4%). We also found moderate agreement among the 3 observers for patients with GCS sum scores of 10 or less (Table 2).
The 2 researchers’ eye, motor, verbal, and sum GCS scores, which were based on evaluation of patients’ clinical manifestations, were in near-perfect agreement. We believe that the factors that enabled this near-perfect level of interrater reliability between the researchers could be used to improve the standardization and accuracy of GCS scoring. These factors include knowledge of current theory and practice, agreement on appropriate sites for pain stimulation, use of the proper amount of time to accurately evaluate the GCS, an understanding of the importance of GCS assessment, and previous experience in using the GCS for patients with altered levels of consciousness.
Although agreement between the 2 researchers was nearly perfect, agreement among all 3 observers (ICU nurses and the 2 researchers) was lower, ranging from 62.4% for sum scores to 75.0% for motor scores, and was not near perfect for the sum score or any subcomponent. Because interobserver reliability for GCS sum scores and each subcomponent may be affected by different factors, we discuss agreement levels and their rationales separately.
The eye response component of the GCS is limited to the opening or closing of the eye in response to stimuli. Eye response is therefore not considered a reliable indicator of consciousness because arousal does not equate to consciousness.12,13 Holdgate et al6 found intermediate agreement in eye response scores, with lower agreement among less experienced nurses. Heron et al7 found the highest levels of interrater reliability for eye response scores. A study by Jaddoua et al14 found that 55% of nurses in neurosurgical wards did not know that the eye response component ranges from 1 to 4, indicating that nurses have inadequate knowledge of GCS eye response scoring. Similarly, Waterhouse15 reported that only 38% of nurses identified eye response accurately. Our study found substantial (not near-perfect) interobserver agreement in eye response scores among nurses and the 2 researchers, suggesting that GCS sum scores may be affected by inaccurate eye scoring, which in turn could affect nursing interventions and patient prognosis.
Published studies have reported varying results for motor response assessments. Heron et al7 found the lowest interrater reliability in the motor component, which affected overall score accuracy. Their results are consistent with those of previous studies demonstrating that, regardless of the site of stimulation, raters had difficulty distinguishing localization, abnormal flexion, and extension responses, all of which are major sources of interrater disagreement.15,16 Teasdale et al16 updated guidance on the GCS and recommended stimulating the nail bed, trapezius, and supraorbital site to elicit the best response. However, varying assessment techniques and inconsistent recording of GCS scores indicate a lack of standardization in clinical practice.17 Because different sites are used for pain stimuli, the reliability of the GCS has been questioned.12 Reith et al17 reported that the most common pain stimuli used were nail bed pressure, trapezius pinch, finger pinch, sternal rub, earlobe stimulation, supraorbital nerve pressure, and retromandibular stimulation. In another study, Reith et al18 found that agreement for motor response scores was sufficient and higher than for other subcomponents of the GCS, similar to the findings of Teasdale et al.16 Middleton19 stated that in clinical practice the most preferred method of pain stimulation is the trapezius pinch. Waterhouse15 indicated that only 47% of nurses used approved methods.
Our study found a substantial level of agreement between ICU nurses and the 2 researchers for motor response scores. That this agreement was substantial rather than near perfect may be explained by the use of different or improper sites of pain stimulation. In our study, nurses preferred the nail bed, trapezius, nipple, sole of the foot, earlobe, and supraorbital site, whereas the researchers used the nail bed, trapezius, and supraorbital site, which are the sites for painful stimuli recommended in the literature.16 The results may also be explained by confusion between normal and abnormal flexion responses when different sites of pain stimulation are used.
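The comparison above amounts to a set difference between the sites nurses reported and the 3 sites recommended by Teasdale et al. A minimal Python sketch, with site names encoded as plain strings (a hypothetical encoding, not the study’s form), could flag nonrecommended sites during a practice audit:

```python
from typing import Iterable

# Sites recommended in the literature, as listed in the text.
RECOMMENDED_SITES = frozenset({"nail bed", "trapezius", "supraorbital"})


def nonrecommended_sites(reported: Iterable[str]) -> list[str]:
    """Return reported stimulation sites that are not recommended, sorted."""
    return sorted(set(reported) - RECOMMENDED_SITES)


# Sites the nurses in this study reported using.
nurse_sites = {"nail bed", "trapezius", "nipple", "sole of the foot",
               "earlobe", "supraorbital"}
```

Applied to `nurse_sites`, the function returns `["earlobe", "nipple", "sole of the foot"]`, the 3 sites that may account for part of the disagreement in motor scores.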
Healey et al20 found that the motor score alone is a better predictor of patient outcome than the total GCS score. However, accurate prediction of mortality and patient outcome requires a high level of agreement among observers. As previous studies also suggest,7,21 we recommend eliminating confounding variables such as the use of different sites of pain stimuli and inconsistent scoring of motor response, both of which may compromise nursing care and prediction of patient outcome.
Over the years, health care providers have used different methods of resolving conflicts in the scoring of verbal response. One of the greatest obstacles in the assessment of verbal response is endotracheal intubation.22-26 New methods include assigning the lowest possible score to untestable components and pseudoscoring missing values on the basis of testable features.24 Our study found a substantial (not near-perfect) level of agreement among observers for patients’ verbal responses, possibly because of the challenges associated with assigning scores for intubated patients. Of the participating nurses, 16.8% assigned a score of 1 (none) to the verbal component for intubated patients, suggesting that nurses require further training in assessing this component for such patients. Similarly, Reith et al17 found that 31% of nurses assigned a score of 1 to the verbal component for intubated patients.
A score of 1 is typically used to indicate an absence of response.17 However, there is a significant difference between a valid score of 1 and a score of 1 assigned to an untestable patient.1 Assigning a score of 1 to the verbal component of an untestable patient lowers the GCS sum score, which may contribute to poor patient outcomes.27 Therefore, Teasdale et al16 recommended assigning a value of V(tube) for verbal response in intubated patients or patients with a tracheostomy. They also discouraged assigning a score of 1 for verbal response in sedated and untestable patients. However, assigning a nonnumeric value for the verbal response of intubated patients may lead to inaccurate GCS sum scores. The resulting unreliable GCS scores may affect prognostication, treatment, and decision-making by health care providers.25 In our study, we advised observers to evaluate nonsedated, intubated patients’ verbal responses by having patients write or point to letters if they were able to obey commands. Health care providers need precise guidance to assign accurate values to untestable verbal components in patients receiving mechanical ventilation. We also recommend discontinuing the use of the V(tube) assignment for intubated patients.
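The tradeoff described in this paragraph can be made concrete with a short Python sketch. It assumes a hypothetical encoding in which an untestable verbal response is stored as the marker `"T"` instead of a number; the marker and the function names are illustrative assumptions, not a published standard or the approach this study recommends.

```python
from typing import Optional, Union

# Hypothetical encoding: an int from 1 to 5, or "T" when the verbal response
# cannot be tested because of an endotracheal tube or tracheostomy.
Verbal = Union[int, str]


def gcs_profile(eye: int, verbal: Verbal, motor: int) -> str:
    """Component profile, e.g. 'E3VTM6', which keeps an untestable V visible."""
    return f"E{eye}V{verbal}M{motor}"


def gcs_sum(eye: int, verbal: Verbal, motor: int) -> Optional[int]:
    """3-15 sum, or None when the verbal component was not testable."""
    if not isinstance(verbal, int):
        return None  # no numeric sum rather than silently substituting 1
    return eye + verbal + motor
```

Recording V1 for an intubated patient yields `gcs_sum(3, 1, 6) == 10`, indistinguishable from a testable patient who makes no verbal sound. The marker keeps that distinction in the profile "E3VTM6" but leaves no numeric sum, which is the limitation of nonnumeric values noted above.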
GCS Sum Scores
To predict patient mortality and implement effective care interventions, each subcomponent score of the GCS should provide an objective and accurate measure of a patient’s level of consciousness. The literature shows a significant correlation between a low GCS sum score and high mortality/poor prognosis.12 Healey et al20 identified 120 possible motor, verbal, and eye response combinations and noted that these combine into only 13 different GCS sum scores (scores of 3 through 15), which are associated with different mortality rates. For example, the combination of a motor score of 1, verbal score of 2, and eye response score of 1 is associated with a 28% mortality rate, whereas the combination of a motor score of 2, verbal score of 1, and eye response score of 1 is associated with a 52% mortality rate, even though both combinations yield the same sum score of 4. Studies have shown that a patient’s level of consciousness influences the level of agreement in GCS scores.4,6,17,27-30
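The counts cited from Healey et al follow directly from the subscale ranges: 4 × 5 × 6 = 120 combinations, whose sums run from 3 to 15 (13 values). A short Python check, offered as an illustration rather than as the original authors’ analysis:

```python
from collections import Counter
from itertools import product


def sum_score_distribution() -> Counter:
    """Count how many (eye, verbal, motor) combinations give each GCS sum."""
    eye, verbal, motor = range(1, 5), range(1, 6), range(1, 7)
    return Counter(e + v + m for e, v, m in product(eye, verbal, motor))


dist = sum_score_distribution()
print(sum(dist.values()))  # 120 combinations
print(len(dist))           # 13 distinct sums (3 through 15)
print(dist[4])             # 3 combinations sum to 4
```

The 3 combinations that sum to 4 include both examples above (E1V2M1 and E1V1M2), which illustrates why a sum score alone can hide clinically different component profiles.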
Our study found a level of agreement (62.4%) in GCS sum scores assessed by ICU nurses and the 2 independent researchers that lies at the lower boundary of the substantial range. Many studies have found excellent interrater reliability for patients with GCS scores of 13 to 15.31-34 We found a moderate level of agreement for patients with GCS scores of 10 or less. Similarly, other studies have shown moderate or fair agreement for patients with GCS scores lower than 13.21,33 Although such differences may be considered minor, their clinical importance depends on patient diagnosis, outcome, and severity of illness. These differences also affect patient triage and treatment decisions.2
Interobserver agreement for GCS scores has been reported to range from high3,6,32 to low5 among various types of health care providers. The factors that may have the most influence on GCS score reliability are level of experience,30,35-37 especially with neurosurgical patients,6,7,16,21,30,38 and educational qualifications of nurses.39,40 Studies have shown that experienced nurses are more accurate in GCS assessment than younger, less experienced nurses, who may have inadequate knowledge of the GCS.6,10,14,26,41 Our study did not measure whether level of education or experience affects GCS assessment; however, most nurses had 1 to 2 years of experience in the ICU in which the study was conducted, and almost half had 3 to 5 years of experience in ICU nursing. Considering that most nurses in our study had only a few years of professional experience, the level of agreement we found suggests that less experienced nurses may have difficulty evaluating patients with GCS deterioration.
Although agreement between the 2 researchers in our study was near perfect, agreement among nurses and the 2 researchers was only substantial. Intensive care unit nurses may lack theoretical knowledge and skills in assessing the GCS. The 2 researchers’ extensive experience and knowledge are likely reasons for their near-perfect agreement.
Our study included only patients with neurological and neurosurgical diseases. Most patients had GCS scores below 10 (n = 146). Because assessment is easier in patients with GCS scores greater than 10, we believe that this case mix prevented agreement from being inflated by easily scored patients. Another strength of this study is that each observer evaluated patients’ GCS scores independently and within the designated time frame. Limiting each observation to 5 minutes minimized the possibility of changes in patients’ GCS scores during assessment. We believe that this study provided ICU nurses with constructive feedback and improved their awareness and knowledge regarding the evaluation of motor response.
One limitation of this study was its use of convenience sampling of patients as they passed through the unit. Because the study was conducted with 21 nurses working in only one 18-bed ICU, results should be cautiously interpreted. During data collection, nurses expressed doubts concerning their motor response assessments even though they were given the option of real-time use of a GCS reference card. Because they were aware that they were taking part in a study, nurses may have reviewed GCS assessment guidelines during data collection, which may have led to an improved level of agreement in motor response scores among the nurses and researchers. Another limitation is that we could not evaluate the link between GCS score accuracy and nurses’ levels of knowledge and years of experience.
Implications for Practice
Research indicates that standardization of assessment, in addition to knowledge and education, is extremely important in ensuring accuracy of assessment.9,10,42 We found, however, that lack of knowledge and education regarding the standardized use of the GCS is still an issue for ICUs. To improve GCS evaluation, nurses should acquire appropriate knowledge and skills through novel educational strategies such as high-fidelity simulation or objective structured clinical examinations with simulated patients. Neurocritical care scenarios such as treating patients with head trauma or progressively deteriorating acute intracerebral hemorrhage create a learning environment that may lead to more reliable GCS assessment. Because of the variety of patients requiring GCS assessment, refresher training programs and practical workshops on neurological assessment should be organized regularly for more experienced nurses. Such training programs may help prevent inaccuracies in GCS assessment. Furthermore, frequent exposure to patients with a wide variety of neurosurgical or neurological diseases who require GCS assessment may facilitate effective training and standardization of assessment. More time spent on bedside GCS scoring in the presence of a preceptor nurse during the first 6 months in the ICU might raise nurse awareness regarding the importance of GCS evaluation and its effect on patient prognosis.
We argue that more consistent use of the GCS can be achieved in both research and daily practice. To establish a high level of interrater reliability in GCS scoring, particularly in the sum and verbal scores in ICU settings, further blinded studies should focus on standardized methods. We strongly advise ICU nurses to place importance not only on knowledge and education but also on skills acquired by novel educational strategies. We also advise nurses to use GCS reference cards during GCS scoring rather than relying on memory alone.
To purchase electronic or print reprints, contact the American Association of Critical-Care Nurses, 27071 Aliso Creek Rd, Aliso Viejo, CA 92656. Phone, (800) 899-1712 or (949) 362-2050 (ext 532); fax, (949) 362-2049; email, firstname.lastname@example.org.
To learn more about caring for sedated patients, read “Stimulation of Critically Ill Patients: Relationship to Sedation” by Grap et al in the American Journal of Critical Care, 2016;25(3):e48-e55. Available at www.ajcconline.org.