Examination of the New ICD-11 Prolonged Grief Disorder Guidelines Across Five International Samples

Background: Prolonged grief disorder (PGD) is a new disorder included in the 11th edition of the International Classification of Diseases (ICD-11). An important remit of the new ICD-11 is the global applicability of the mental health disorder guidelines or definitions. Although previous definitions and descriptions of disordered grief have been assessed worldwide, this new definition has not yet been systematically validated. Method: Here we assess the validity and applicability of core items of the ICD-11 PGD across five international samples of bereaved persons from Switzerland (N = 214), China (N = 325), Israel (N = 544), Portugal (N = 218) and Ireland (N = 830). Results: The results confirm that variation in the diagnostic algorithm for PGD can greatly impact the rates of disorder within and between international samples. Different predictors of PGD severity may be related to sample differences. Finally, a threshold for diagnosis of clinically relevant PGD symptoms using a new scale, the International Prolonged Grief Disorder Scale (IPGDS), was confirmed in three samples. Conclusions: Although this study was limited by a lack of questionnaire data points across all five samples, the findings for the diagnostic threshold and algorithm iterations have implications for clinical use of the new ICD-11 PGD criteria worldwide.

In 2019, prolonged grief disorder (PGD) was included in the International Classification of Diseases (ICD-11) for the first time. The diagnostic criteria for a disorder of grief have a long history and there are several previous definitions and iterations (Prigerson et al., 2009; Shear, 2015; Wagner & Maercker, 2010). The current definition represents a new focus of the World Health Organization (WHO) on the clinical utility and global applicability of the disorder (Maercker et al., 2013). The rationale for the updated iteration in the new ICD-11 definition was to standardize this diagnosis internationally; however, the validity of the diagnostic criteria across different international samples has yet to be established. In this brief report, we test, for the first time, the core items of the PGD ICD-11 criteria in five international datasets.
The WHO working groups for the ICD-11 adopted a two-phase strategy to update disorder definitions. The first phase involved developing the structure of the definition based on a large international survey of psychologists and psychiatrists (Evans et al., 2013; Reed, Correia, Esparza, Saxena, & Maj, 2011). Respondents called for flexible diagnostic guidelines, recognition of cultural factors, and fewer disorder categories with no subtypes. The resulting PGD definition included two core symptoms (intense yearning or preoccupation with the deceased), examples of emotional pain (e.g., anger, sadness, guilt), a duration of at least 6 months since the loss, and an impairment criterion. For a full description see Killikelly and Maercker (2017). Importantly, the working group also included a cultural caveat whereby symptoms of grief must exceed expected socio-cultural norms. The second phase in the WHO's research approach was to evaluate the usability (clinical utility) of these guidelines in diagnostic decision making. Recent field studies have explored the clinical utility and validity of PGD through clinicians' assessments of vignettes (Keeley et al., 2016; Reed et al., 2018), and proposals for further evaluation have been made (Gureje, Lewis-Fernandez, Hall, & Reed, 2019). These studies confirmed that, when compared with the ICD-10, the current ICD-11 including PGD improved the diagnostic sensitivity for grief-related psychopathology, especially once the duration since loss criterion was included. However, this evaluative phase has so far been limited and there are large scientific gaps in establishing the validity of the new ICD-11 PGD, particularly in a global context (Boelen, Spuij, & Lenferink, 2019; Eisma & Lenferink, 2018).
Previous research has confirmed that PGD may have different prevalence rates in different samples. For example, worldwide rates of a disorder of grief may range from 1% to 10% (Kristensen, Weisaeth, & Heir, 2012; Lundorff, Holmgren, Zachariae, Farver-Vestergaard, & O'Connor, 2017). In a recent scoping review we found that the rates of disordered grief appear to be much higher in Asian countries compared to countries in Europe and North America (Stelzer, Zhou, Maercker, O'Connor, & Killikelly, 2020). This may depend on different factors including heterogeneity in the diagnostic criteria used, the sample characteristics, and, perhaps, specific cultural factors that may influence the assessment and reporting of grief symptoms. In this study, we sought to reduce the methodological variability of previous studies by directly comparing some of the same diagnostic criteria items across multiple national samples, as well as exploring the sample characteristics and their influence on PGD symptoms. This paper explores core items of the new ICD-11 PGD disorder criteria, along with some of the supplementary items indicating emotional distress, across five international samples. The aims are, first, to examine rates of possible PGD caseness using the same core items and diagnostic formulations in each country; second, to examine criterion validity through the identification of predictors of PGD across and between countries; and third, to find provisional cut-off scores and assess the thresholds for the best sensitivity and specificity in each country using receiver operating characteristic (ROC) analysis.

Method

Participants
Data from participants who experienced the loss of a loved one were analyzed. Data sets were obtained from five different countries: Switzerland (N = 214), China (N = 325), Israel (N = 544), Portugal (N = 218), and Ireland (N = 830). For demographic information see Table 1.1. For additional demographic characteristics for each sample please see Tables 1-4 in the Supplementary Materials.
Switzerland: Participants were recruited via online and in-person fliers posted at German-speaking grief and bereavement support groups, online forums and community services (e.g., churches, town halls, libraries).
China: Participants were recruited to participate in an online survey (Qualtrics) using social media (WeChat) and online bereavement forums.
Israel: Participants were recruited as part of a large national online survey using stratified and random sampling methods.
Ireland: A nationally representative sample was recruited using the company Qualtrics. Stratified sampling methods were used to select participants based on sex, age and geographical location.
Portugal: The 'general' group was recruited through an anonymous online survey (LimeSurvey) using the snowball method. The 'clinical' group is based on participants from a hospital setting (Centro Hospitalar Tâmega e Sousa) where participants received outpatient support for grief difficulties. Participants in this group were referred to the Grief Consultation Service, part of the Clinical Psychology Unit, and had completed informed consent procedures. This service is focused on supporting parental and perinatal losses, and data were collected in face-to-face interviews with self-evaluation questionnaires.

Measures
To assess prolonged grief disorder, the International Prolonged Grief Disorder Scale (IPGDS) with 15 items and the Inventory of Complicated Grief-Revised with 8 items (ICG-R; Prigerson et al., 2009; Prigerson & Jacobs, 2001) were used. Both instruments include two core PGD symptoms (i.e., yearning for the deceased and preoccupation), emotional distress symptoms, a measure of functional impairment, and time since loss. For the items of the IPGDS please see Killikelly et al. (2020). The following 8 items of the ICG-R were assessed: core items, 1) 'I think about him/her so much that it can be hard for me to do the things I normally do', 2) 'I feel myself longing and yearning for him/her'; accessory symptoms or examples of emotional distress, 3) 'I feel as if a part of me died', 4) 'I feel disbelief over his/her death', 5) 'Ever since he/she died, I find it difficult to move on with my life', 6) 'I am bitter over his/her death', 7) 'I feel that it is unfair that I should live when he/she died'; and the functional impairment criterion, 8) 'I believe that my grief has resulted in impairment in my social, occupational or other areas of functioning'. Unlike the ICG-R, the IPGDS includes one cultural item (i.e., 'My grief would be considered worse, e.g., more intense, severe and/or of longer duration, than for others from my community or culture'). Participants were asked to rate their grief symptoms on a five-point scale: "not at all" on the IPGDS or "almost never" on the ICG-R (1), "rarely" (2), "sometimes" (3), "often" (4), "always" (5). When filling out the IPGDS, participants were asked to mark the answer that best describes their feelings, thoughts and behaviour during the last week. For the ICG-R, they were asked to select the answer that best describes how they felt during the last month. PGD was assessed using the IPGDS in Switzerland, China, and Portugal, and the ICG-R in all five countries.
Recently the IPGDS was confirmed to be psychometrically reliable and valid with strong internal consistency (Cronbach's α = .92), high concurrent and criterion validity (see Killikelly et al., 2020). Previously the 8-item ICG-R was shown to have good reliability (Cronbach's α = .94) (Killikelly et al., 2019).

Predictors
Life Events Checklist (LEC) (Gray, Litz, Hsu, & Lombardo, 2004) and International Trauma Exposure Measure (ITEM) (Hyland et al., 2020) items were measured on a binary scale (0 = no; 1 = yes). For the LEC, response options 1-2 (happened to me, witnessed it) were merged into 'yes' while all other response options were merged into 'no'. Information about traumatic events was not collected for the Portuguese sample. Furthermore, in the Portuguese sample, the duration since loss was not assessed and the data set revealed a high quantity of missing values (100 out of 218 participants) on the ICG-R scale. Therefore, the Portuguese sample was excluded from the data analysis when the association between predictors and PGD was investigated. The cultural item was collected only in Switzerland, China, and Portugal. The following variables were included in the data analysis as predictors of PGD:
1. Gender (measured in all 5 samples)
2. Age (measured in all 5 samples)
3. Cultural criteria (measured in Swiss, Chinese, Portuguese samples)
4. Severe human suffering (measured in Swiss, Chinese, Israeli samples with LEC, and in Irish sample with ITEM)
5. Sudden, violent or accidental death (measured in Swiss, Chinese, Israeli samples with LEC, and in Irish sample with ITEM)
6. Serious injury, harm or death you caused to someone (measured in Swiss, Chinese, Israeli samples with LEC, and in Irish sample with ITEM)
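The LEC recoding described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS syntax. It assumes the response options are stored as integers, with 1 = happened to me and 2 = witnessed it as stated in the text. The function name `recode_lec` is hypothetical.

```python
def recode_lec(response: int) -> int:
    """Collapse an LEC response option into the binary scale used for predictors.

    Assumption: options are stored as integers; 1 (happened to me) and
    2 (witnessed it) become 1 = yes, every other option becomes 0 = no.
    """
    return 1 if response in (1, 2) else 0


# Illustrative responses only, not study data.
raw_responses = [1, 2, 3, 4, 5]
binary = [recode_lec(r) for r in raw_responses]
```

Items from the ITEM are described as already binary, so under this reading they would need no recoding.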

Statistical Analysis
To estimate possible PGD rates, three different diagnostic algorithms were applied: the PGD strict criteria set, the PGD moderate criteria set, and the criteria set according to Maciejewski et al. (2016). The PGD strict criteria set requires the endorsement of at least one core item, at least one emotional distress item, and functional impairment, each rated 4 (often) or higher. The PGD moderate criteria set has the same requirements except that items are rated 3 (sometimes) or higher. The criteria according to Maciejewski et al. include at least one of two core items, three or more emotional distress items (all rated 4 (often) or above), and no functional impairment. In all three diagnostic algorithms the same time criterion was applied (i.e., loss occurred 6 months ago or longer). The estimated rates of possible PGD were calculated across the five samples with 95% confidence intervals (CI). However, it is important to note that some key items were missing in the datasets. In the Portuguese and the Israeli samples the time criterion was not applied because data on time since loss were absent, and in the Portuguese dataset the functional impairment criterion was not evaluated. Therefore, we can only examine estimates of possible PGD caseness, not prevalence. Logistic regression was used to examine the associations between PGD (strict criteria) and some items representing traumatic life events, gender (male/female), age, and the cultural caveat item, using odds ratios (OR) and 95% CI. The outcome was the endorsement of PGD strict criteria, coded as a binary variable: "yes, possible PGD caseness" (1) or "no" (2). Of note, due to the use of heterogeneous questionnaires across the samples, we could only include a few traumatic life event items. In terms of missing values, the default settings of SPSS were used whereby cases were deleted in a listwise manner. Third, receiver operating characteristic (ROC) analysis was used to examine cut-off scores for the IPGDS and ICG-R, i.e. the threshold for the best fit in terms of sensitivity (high > .80) and specificity (.80). This analysis is presented as an initial exploration and may be highly dependent upon the samples used. ROC curves and logistic regression were calculated only for PGD strict criteria (i.e. 12 symptom items plus functional impairment). Statistical analyses were performed using SPSS version 23.
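The three diagnostic algorithms can be expressed as simple rules over item ratings. The sketch below is one illustrative reading of the description above, not the authors' scoring code. The class and field names are hypothetical. It also assumes that "no functional impairment" in the Maciejewski et al. set means the impairment item is not required.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GriefResponse:
    """One respondent's ratings on the 1-5 scale; field names are hypothetical."""
    core: Sequence[int]        # two core items: yearning, preoccupation
    distress: Sequence[int]    # emotional distress items
    impairment: int            # functional impairment item
    months_since_loss: float


def n_endorsed(ratings: Sequence[int], threshold: int) -> int:
    """Count items rated at or above the threshold."""
    return sum(1 for r in ratings if r >= threshold)


def time_criterion(resp: GriefResponse) -> bool:
    """Loss occurred 6 months ago or longer."""
    return resp.months_since_loss >= 6


def meets_threshold_set(resp: GriefResponse, threshold: int) -> bool:
    """At least one core item, one distress item, and impairment at the threshold."""
    return (
        time_criterion(resp)
        and n_endorsed(resp.core, threshold) >= 1
        and n_endorsed(resp.distress, threshold) >= 1
        and resp.impairment >= threshold
    )


def meets_strict(resp: GriefResponse) -> bool:
    return meets_threshold_set(resp, 4)   # rated 4 (often) or higher


def meets_moderate(resp: GriefResponse) -> bool:
    return meets_threshold_set(resp, 3)   # rated 3 (sometimes) or higher


def meets_maciejewski(resp: GriefResponse) -> bool:
    # Assumption: the impairment item is not required in this criteria set.
    return (
        time_criterion(resp)
        and n_endorsed(resp.core, 4) >= 1
        and n_endorsed(resp.distress, 4) >= 3
    )
```

Where a dataset lacks time since loss or the impairment item, as reported for some samples, the corresponding check would have to be dropped, which is why the resulting figures are estimates of possible caseness only.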

Rates of PGD
The proportion of people in each sample who met the criteria for possible PGD caseness differed within each country depending on (1) whether the strict, moderate or Maciejewski et al. (2016) diagnostic criteria were applied and (2) whether the IPGDS or the ICG-R was used to assess it. Furthermore, rates differed between countries even when assessed with the same diagnostic algorithm and the same measurement instrument. For example, using the strict criteria, the rates ranged from 6.9% to 12.6% for the IPGDS and from 2.0% to 21.1% for the ICG-R. For detailed rates and confidence intervals (CI), see Table 2.1; see also Table 2.2.
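A rate with a 95% CI can be computed from a count of possible cases and a sample size. The text does not state which interval method was used, so the Wilson score interval below is an assumption chosen for illustration. The counts in the usage lines are hypothetical and are not taken from Table 2.1.

```python
import math


def wilson_interval(cases: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need n > 0 and 0 <= cases <= n")
    p = cases / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: the same 10% rate in a small and a large sample.
small_lo, small_hi = wilson_interval(20, 200)
large_lo, large_hi = wilson_interval(200, 2000)
```

With the same observed rate, the larger sample gives the narrower interval, which matters when comparing samples of quite different sizes.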

Examination of Provisional Cut-Off Scores
The ROC analysis was used to determine a cut-off score for those participants meeting the strict criteria for the IPGDS and ICG-R. The results can be found in Table 3. The Chinese sample required a slightly higher cut-off score (42.5) for the IPGDS when compared to the Swiss (37.5) and Portuguese (36.5) samples. Additionally, for the ICG-R the Portuguese sample had a lower cut-off (16.5) when compared with the Swiss (24.5), Chinese (25.5), Israeli (24.5) and Irish (22.5) samples.
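To make the cut-off search concrete, the sketch below computes sensitivity and specificity at each candidate cut-off and picks one. It is a simplified stand-in for the SPSS ROC procedure, not a reproduction of it. Two assumptions are made: candidate cut-offs are midpoints between adjacent observed scores, and the selection rule is Youden's J (sensitivity + specificity - 1), which the text does not specify. The scores and labels are invented.

```python
from typing import Sequence


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for each midpoint cut-off.

    labels: 1 = meets strict criteria, 0 = does not. A score above the
    cut-off counts as a positive screen.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    unique = sorted(set(scores))
    cutoffs = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    points = []
    for c in cutoffs:
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > c)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= c)
        points.append((c, true_pos / positives, true_neg / negatives))
    return points


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> tuple[float, float, float]:
    """Pick the cut-off maximizing Youden's J (an assumed selection rule)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Invented total scores and caseness labels, for illustration only.
example_scores = [20, 25, 30, 35, 40, 45]
example_labels = [0, 0, 0, 1, 1, 1]
cutoff, sensitivity, specificity = best_cutoff(example_scores, example_labels)
```

Because candidate cut-offs lie halfway between observed totals, they end in .5 when totals are integers, matching the form of the values reported in Table 3.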

Discussion
This paper provides the first systematic exploration of core items of the new ICD-11 PGD criteria across five international samples. The results confirm three main findings. First, there are large differences in rates between and within samples depending on the diagnostic algorithm used. Second, predictors of PGD severity may vary across samples due to the type of loss (violent or nonviolent), and the cultural caveat item of the IPGDS may be an important risk screening item. Third, the threshold for a clinically relevant diagnosis may differ depending on cultural group. Core items of the new ICD-11 PGD criteria, as tested by the IPGDS (in Swiss, Chinese and Portuguese samples) and the ICG-R (in Irish and Israeli samples), revealed substantially different rates depending on the diagnostic algorithm used. Overall, the strict criteria for both the IPGDS and the ICG-R seem to capture the expected rates across the five samples, which ranged from 2.0% to 21.1%. However, substantially higher rates were found in the Chinese and Portuguese samples. There could be several explanations for these higher rates, including sample differences and lack of cultural sensitivity of assessment measures (Stelzer, Zhou, Maercker, et al., 2020). When the strict criteria of the IPGDS were applied, the Swiss (7.0%) and Portuguese (6.9%) samples had similar rates, whereas the Chinese sample had a higher rate (12.6%). A higher rate in the Chinese sample is consistently found across all iterations of the IPGDS and also for most of the ICG-R comparisons. Conversely, when assessing the ICG-R, the Swiss, Chinese, Israeli and Irish samples had similar rates, whereas the rate in the Portuguese sample was much higher (21.1%). The Portuguese sample also had high rates on the ICG-R for the strict and moderate criteria, perhaps due to the exclusion of the impairment criterion in this particular sample.
Therefore, the results for the Portuguese sample must be interpreted with caution. They point to the importance of including the functional impairment item and ensuring consistent use of the time criterion in the assessment measure. Additionally, the Portuguese sample included pooled data from the general and clinical groups. The inclusion of the clinical group could increase the rates in the Portuguese data compared to the non-clinical samples obtained from the other countries.
The Portuguese sample included a large proportion of bereaved people who experienced an unexpected loss (10%). Although not explicitly recorded, this would mostly include the unexpected loss of a child, as participants were from the outpatient perinatal loss clinic. Loss of a child is known to predict high levels of PGD (Zetumer et al., 2015).
Lack of culturally sensitive assessment measures or items could explain differences in the symptom ratings and severity levels across the samples. For example, our previous study confirmed that Chinese bereaved may present with slightly different symptoms than those assessed by the ICD-11 (Killikelly & Maercker, 2017; Stelzer, Zhou, Merzhvynska, et al., 2020). The IPGDS standard scale does not explore somatic symptoms or culturally specific symptoms such as 'a loss of a part of oneself' (Stelzer, Zhou, Merzhvynska, et al., 2020). Additionally, there could be a cultural bias in responding to these questionnaires, which may lead to overreporting and overestimation of symptoms. Chentsova-Dutton et al. (2007) found that Chinese participants may overreport certain symptoms in order to ensure that they receive health care and support.
In terms of predictors of PGD severity, we assessed a limited selection of predictors available across the datasets. Interestingly, when the cultural caveat item was included (i.e., endorsement of Item 14 of the IPGDS), violating the cultural norms for grief was found to significantly predict more severe grief scores on the IPGDS and the ICG-R. Although we only had the data for the Swiss and Chinese participants, further examination of this item might indicate its importance as a screening item for grief severity. In both the Israeli and Irish samples, grief severity was predicted by sudden, violent or accidental death, whereas this was not found for the Swiss and Chinese samples. This may be due to differences in sampling. The Israeli and Irish data are from large nationally representative samples that may include more instances of sudden, violent or accidental death. The Chinese and Swiss samples are mostly student populations who experienced the loss of older relatives. The larger Israeli and Irish datasets contain participants who experienced a high level of violent loss (more than 25%) and this could explain the differences in predictors. Previous research has confirmed that violent loss is a strong predictor of PGD severity and chronicity (Lobb et al., 2010; Schaal, Jacob, Dusingizemungu, & Elbert, 2010). Additionally, Israel and Ireland have recently experienced acts of terrorism that may contribute to an added cultural vulnerability to trauma and loss (Duffy, Gillespie, & Clark, 2007; Silverman, Johnson, & Prigerson, 2001).
The final research question was to determine a possible threshold for establishing a clinically significant severity score on the IPGDS. Not all five datasets could be compared using the IPGDS; however, across the Swiss, Chinese and Portuguese data, a score above 36.5 will most likely represent clinically significant PGD symptoms. As a control, the ICG-R was also examined and a score above 22 was consistently found for all datasets except the Portuguese sample (16.5). This attests to the variation that can occur across different samples, even with gold standard clinical assessments (Boelen & Lenferink, 2020).

Limitations
Due to inconsistencies in data collection across the five international samples it was not possible to directly compare the IPGDS or the ICG-R across all data sets. The full ICD-11 PGD criteria could therefore not be assessed. In particular, the time criterion was not assessed consistently across the datasets; for example, it was not assessed in the Portuguese or Israeli datasets. Therefore, a diagnosis of PGD is not possible. However, the core items of the PGD (yearning and preoccupation) as well as some supplementary items of emotional distress could be evaluated and indications of possible caseness inferred. It is important to include the time criterion for the disorder, as individuals may experience severe distress in the first weeks and months after a loss and this should not be pathologized. Importantly, the estimated rates for the Portuguese data must be interpreted with caution as there was a high amount of missing data. Furthermore, the Portuguese sample included a clinical subgroup. This may explain why the estimated rates are substantially higher. Across the German, Portuguese and Chinese samples there is a high proportion of female respondents. In the future it would be important to provide an analysis of a more representative sample. Additionally, there were only a limited number of similar predictors across all datasets. The data in each country were collected separately at different times, so only a cross-sectional comparison is possible on some questionnaire items. Of note, the confidence intervals are very wide for some of the items in the logistic regression, particularly for the cultural criteria. This is perhaps due to a small number of values in some of the cells (response options). In the future, a larger sample size should yield more precise confidence intervals. Finally, in the future and with a more complete dataset, the ROC analysis should also be conducted on the moderate and Maciejewski et al. (2016) criteria to provide a full estimate of possible thresholds for sensitivity and specificity.
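The point about wide confidence intervals with sparse cells can be illustrated with a single binary predictor. In that case the odds ratio from a logistic regression equals the cross-product ratio of the 2x2 table, and a Wald interval follows from the standard error of the log odds ratio. The sketch below uses hypothetical counts. It is not a re-analysis of the study data and ignores the covariates in the reported models.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Wald CI from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical tables with the same odds ratio but different cell sizes.
sparse = odds_ratio_ci(3, 2, 10, 40)
dense = odds_ratio_ci(30, 20, 100, 400)
```

Both tables give the same odds ratio, but the sparse one has a much wider interval because the small cells dominate the standard error.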

Conclusion
This paper confirms the importance of establishing international guidance on the consistent use of a diagnostic algorithm for PGD in order to ensure reliability across heterogeneous samples. Currently, we recommend the use of the strict criteria as an indicator of PGD caseness; however, this must be confirmed in a clinical sample. Future studies should examine the different PGD algorithms (moderate vs strict) in clinical and cultural samples and include important items that are missing in some of the current data (i.e., the impairment and time criteria as well as the cultural caveat). Additionally, clinicians should be aware of specific risk factors such as violent, sudden loss or screening 'yes' on the cultural caveat IPGDS item, as these may predict clinically severe grief. In the future it may be important for clinicians to note that different cultural groups may need different cut-off thresholds for a clinical diagnosis on the IPGDS or other scales.

Funding:
The authors have no funding to report.

Competing Interests:
The authors have declared that no competing interests exist.