To Determine the Risk-Based Screening Interval for Diabetic Retinopathy: Development and Validation of Risk Algorithm from a Retrospective Cohort Study
Abstract
Background
The optimal screening interval for diabetic retinopathy (DR) remains controversial. This study aimed to develop a risk algorithm to predict the individual risk of referable sight-threatening diabetic retinopathy (STDR) in a mainly Chinese population and to provide evidence for risk-based screening intervals.
Methods
Retrospective cohort data from 117,418 subjects who received systematic DR screening in Hong Kong between 2010 and 2016 were used to develop and validate the risk algorithm using a parametric survival model. The risk algorithm can be used to predict the individual risk of STDR within a specific time interval, or the time to reach a specific risk margin, and thus to allocate a screening interval. Calibration was assessed by comparing cumulative STDR events with predicted risk over 2 years, and discrimination by using the receiver operating characteristic (ROC) curve.
Results
Duration of diabetes, glycosylated hemoglobin, systolic blood pressure, presence of chronic kidney disease, diabetes medication, and age were included in the risk algorithm. The validation of prediction performance showed that there was no significant difference between predicted and observed STDR risks in males (5.6% vs. 5.1%, P=0.724) or females (4.8% vs. 4.6%, P=0.099). The area under the receiver operating characteristic curve was 0.80 (95% confidence interval [CI], 0.78 to 0.81) for males and 0.81 (95% CI, 0.79 to 0.83) for females.
Conclusion
The risk algorithm has good prediction performance for referable STDR. Using a risk-based screening interval allows us to allocate screening visits disproportionately to those at higher risk, while reducing the frequency of screening of lower-risk people.
Highlights
• We developed the first risk algorithm for STDR in Asians.
• It successfully distinguishes high-risk individuals from lower-risk ones.
• It allows for disproportionately allocating screening visits to those at higher risk.
• The acceptable risk margin must consider equitable resource allocation.
INTRODUCTION
Diabetic retinopathy (DR) is a common microvascular complication of diabetes mellitus (DM). DR is normally asymptomatic until significant vision is lost, so prevention of blindness relies on early detection and timely treatment. There is no debate that regular DR screening is worthwhile and cost-effective [1-3], and many current guidelines recommend annual screening of those with diabetes [4-6]. However, progression rates to sight-threatening diabetic retinopathy (STDR) among some groups might be sufficiently low to permit lengthened screening intervals [7,8], allowing shortened intervals for those at highest risk. The first algorithm aiming to estimate risk-appropriate screening intervals was developed in Iceland and allocated intervals from 6 to 60 months depending on risk [9]. A later algorithm, developed in Liverpool, used primary care data to indicate risk [10]. Both algorithms were internally validated but used slightly different risk predictors. Glycosylated hemoglobin (HbA1c), DM duration, systolic blood pressure (SBP), and presence of DR at baseline appeared in both; the Iceland algorithm added sex and type of DM, whereas the Liverpool algorithm added age at diagnosis of DM and total cholesterol. Selection of potential risk predictors could be limited by data availability but may also vary across populations, as could history of diabetes management and DR screening, so an algorithm derived from another population might require recalibration. Existing algorithms have facilitated moving from a one-size-fits-all screening approach to a tailored, risk-based approach, but we do not know the implications of adopting them in a new environment such as an Asian population.
Hong Kong (HK) started systematic screening for DR in 2010 as part of the multidisciplinary Risk Assessment and Management Programme–Diabetes Mellitus (RAMP-DM) [11,12]. People participating in RAMP-DM receive a standardized procedure that includes regular assessment of clinical parameters for risk level stratification and tailored diabetes management options based on their risk. The quality-assured DR screening component uses digital fundus photography and follows the English National Screening Programme guidelines [11-14]. Screening intervals have mostly been based on the Iceland algorithm, giving 6-, 12-, or 24-month follow-up intervals according to risk [9,12]. Between 2010 and early 2014, around 170,000 people were screened, a majority of those managed in public primary care [15]. The population DR risk in HK might be higher than in other developed countries because of the recency of comprehensive diabetes management. For example, the prevalence of any DR and STDR was 39.0% and 9.8% respectively, higher than at the establishment of systematic screening in Liverpool (25.3% and 6.0%) [11,16]. This raises the question of whether existing algorithms are applicable to this population. Given increasing demand for screening due to increasing numbers with diabetes, improving screening efficiency is urgent. The aims of this study were to develop and validate a risk algorithm for referable retinopathy/maculopathy based on a mainly Chinese population and to apply it to determine risk-based screening intervals.
METHODS
Data extraction and processing
The retrospective cohort included all those enrolled in RAMP-DM on or before December 31, 2016. Anonymized data were extracted from the clinical management system and included socio-demographic data: year of birth, sex, smoking/drinking habit, residential district and whether receiving comprehensive social security assistance; clinical data: body mass index (BMI), SBP and diastolic blood pressure, HbA1c%, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, estimated glomerular filtration rate (eGFR), and urine albumin-to-creatinine ratio (UACR); clinical characteristics: duration and type of DM, use of oral DM drugs, insulin, anti-hypertensive and lipid-lowering drugs; complications of diabetes: hospital admission and out-patient clinic attendance due to diabetes complications; retinal summary: date of screen, level of DR/maculopathy and history of treatment, i.e., photocoagulation/vitrectomy from when the subject joined RAMP-DM to the end of 2016.
The first screening appointment after the systematic program commencement was defined as the baseline date for each subject, and a screening record within two months (i.e., –62≤ date ≤62 days) of this baseline date as the baseline screening record. If more than one record was available, that closest to the baseline date was used. For clinical data, the value closest to and within 180 days before or 30 days after the baseline date was used. Duration of DM was estimated as the year of baseline screening minus the year of diagnosis. Type of DM was extracted from clinical records but not used in the development of the algorithm since almost all (99.9%) subjects had type 2 diabetes mellitus (T2DM), though the algorithm was aimed at both types. Presence of chronic kidney disease (CKD) was defined as eGFR <60 mL/min/1.73 m2 or UACR ≥3 mg/mmol [17]. Use of DM drugs was categorized into “none,” “oral drugs only,” and “insulin,” while antihypertensives or lipid-lowering drugs were categorized as “yes” or “no,” based on dispensing records just prior to the baseline screen. Any history of DM complications was based on International Classification of Primary Care, Second Edition (ICPC-2) or International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for relevant complications being present in the clinical record (Supplementary Table 1) [18]. The DR/maculopathy grades from both eyes were consolidated into a subject-based status based on the worst eye. The classification of DR followed the guideline of no DR (R0), background DR (R1), pre-proliferative DR (R2), proliferative DR (R3), and maculopathy (M1) [11,13], with R2, R3, or M1 designated as possible STDR and referred to specialist care [11,12].
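The two derived baseline variables described above, presence of CKD and the subject-level referable status based on the worst eye, can be sketched as follows. This is a minimal illustration rather than the study's code: the function names are hypothetical, and treating a missing eGFR or UACR value as not meeting its threshold is an assumption.

```python
from typing import Optional, Set

# Grades designated as possible STDR in the text (R2, R3, or M1).
REFERABLE_GRADES = {"R2", "R3", "M1"}


def has_ckd(egfr: Optional[float], uacr: Optional[float]) -> bool:
    """CKD as defined in the text: eGFR < 60 mL/min/1.73 m2 or UACR >= 3 mg/mmol.

    Assumption: a missing measurement (None) does not meet its threshold.
    """
    low_egfr = egfr is not None and egfr < 60
    high_uacr = uacr is not None and uacr >= 3
    return low_egfr or high_uacr


def subject_is_referable(left_eye: Set[str], right_eye: Set[str]) -> bool:
    """Worst-eye rule: the subject is referable if either eye carries R2, R3, or M1."""
    return bool((left_eye | right_eye) & REFERABLE_GRADES)


print(has_ckd(egfr=72.0, uacr=3.5))               # raised UACR alone is sufficient
print(subject_is_referable({"R1"}, {"R1", "M1"}))  # maculopathy in one eye
```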
Ethics approval was obtained from the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (HKU/HA HKW IRB; Ref: UW 16-089), Hong Kong East Cluster Research Ethics Committee (HKEC REC; Ref: HKEC-2016-076), and Research Ethics Committee (Kowloon Central/Kowloon East) (REC [KC/KE]; Ref: KC/KE-16-0178/ER-2). Written informed consent by the patients was waived because of the retrospective nature of the study.
Subjects
Subjects fulfilling the following criteria were eligible for the cohort: received at least one complete DR screen after the program start; R0 or R1, with no referable STDR at this screen; and had at least one further screen. Eligible subjects were randomly split 2:1 into derivation and validation datasets. A random number between 0 and 1 was generated for each subject and sorted. The top two-thirds were used to derive the algorithm and prediction performance was internally validated on the remaining third.
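The split procedure can be sketched as below. The seed, the function name, and rounding the cut point to the nearest subject are assumptions made for illustration; the text only states that a random number between 0 and 1 was generated per subject, sorted, and the top two-thirds used for derivation.

```python
import random


def split_two_to_one(subject_ids, seed=0):
    """Assign each subject a uniform random number, sort by it, and cut at two-thirds."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    draws = {sid: rng.random() for sid in subject_ids}
    ordered = sorted(subject_ids, key=draws.get)
    cut = round(len(ordered) * 2 / 3)  # assumption: cut point rounded to nearest subject
    return ordered[:cut], ordered[cut:]


derivation, validation = split_two_to_one(range(117_418))
print(len(derivation), len(validation))
```

With 117,418 eligible subjects this cut point gives group sizes of 78,279 and 39,139, the same sizes as the derivation and validation datasets reported in the Results.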
Statistical analysis
A descriptive analysis of baseline characteristics was done. The derivation and validation cohorts were compared using chi-square tests for categorical variables and t-tests for continuous variables. Parametric survival analysis was used to develop the risk algorithm. Assumptions related to proportional hazards and the distributions in the parametric survival analysis were checked by Schoenfeld residuals and log-log plots. A Weibull distribution gave the best fit to the algorithm when stratified by sex and presence of R1 at baseline, giving four groups: R0/R1 and male/female.
The time scale (t) was the time from baseline screening, and the first occurrence of referable STDR (R2, R3, and M1) after baseline was the outcome event. Subjects who did not develop STDR were right-censored at the time of their last follow-up screen. For subjects who did develop STDR, the event time was interval-censored between the first STDR record and the previous non-STDR screening.
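The censoring scheme can be made concrete with a small helper that converts a subject's follow-up screens into the interval in which the event is known to lie. This is an illustrative sketch: the input layout and function name are hypothetical, and time is assumed to be measured in years from the baseline screen.

```python
import math


def censoring_interval(follow_up_screens):
    """Return (lower, upper) bounds on the time of first referable STDR.

    follow_up_screens: list of (years_since_baseline, is_referable) tuples,
    sorted by time and excluding the baseline screen itself.
    An upper bound of infinity denotes right-censoring at the last screen.
    """
    last_clear = 0.0  # the baseline screen, at which the subject had no referable STDR
    for years, is_referable in follow_up_screens:
        if is_referable:
            # Event occurred between the previous non-STDR screen and this one.
            return (last_clear, years)
        last_clear = years
    return (last_clear, math.inf)


print(censoring_interval([(1.0, False), (2.1, True)]))   # interval-censored
print(censoring_interval([(1.0, False), (2.0, False)]))  # right-censored
```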
Potential predictors were identified from the literature and all were at least 80% complete in the RAMP-DM dataset; they included (1) those considered in the two published risk-prediction models (Iceland and Liverpool) [9,10] and the Diabetes Control and Complications Trial (type 1 only) [19]; (2) those shown to be associated with DR/STDR in published reviews or meta-analyses; and (3) those recommended as risk factors in clinical guidelines.
Potential “core predictors,” identified as those which showed relatively consistent evidence from the literature, included duration of DM, HbA1c, blood pressure, lipid level, presence of CKD, DR status at baseline, and sex (Supplementary Table 2). Other predictors with less consistent evidence included age at diagnosis of DM (during development, we substituted age at screening to avoid overlap of information with duration of DM), smoking status, BMI, medication use, and history of cardiovascular diseases. The fitting process began by including all “core predictors” in a multivariate survival model, followed by a Wald test to reduce the number of covariates and the Akaike information criterion (AIC) to confirm the selection. Each of the other predictors was tested in univariate survival models for any association with STDR. Significant predictors were added to the model from the previous step, and the Wald test and AIC calculation were used to determine whether the model improved. Core predictors were retained unless they were not significant in any group or better goodness-of-fit was achieved after removal. A working group was set up to comment and advise on the predictor selection; it included medical specialists in family medicine and primary care, an ophthalmologist, experts in health economics and health services research, and a statistician. Finally, covariates excluded in previous steps were added back to ensure they did not improve model prediction. The algorithm was based only on subjects with complete data in the final set of clinical predictors. Analyses were conducted using Stata software version 15 (StataCorp., College Station, TX, USA).
Validation of prediction model
The coefficients from the parametric survival model with Weibull distribution were transformed into a mathematical algorithm:
where S(t) is the estimate of survival at time t; p is the shape parameter of the baseline survival function and βi are the coefficients for each covariate Xi. Probability of developing STDR in the time interval ∆t can be calculated as:
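As a sketch of the two expressions, assuming the usual proportional-hazards form of the Weibull model with time measured from the baseline screen and any intercept absorbed into the sum (X_0 = 1); the fitted formulae for each of the four groups are those given in Supplementary Table 3:

```latex
% Survival function under a Weibull proportional-hazards model (assumed form)
S(t) = \exp\!\left( -\exp\!\Big( \sum_i \beta_i X_i \Big)\, t^{p} \right)

% Probability of developing STDR within \Delta t of baseline (assumed form)
P(\Delta t) = 1 - S(\Delta t)
            = 1 - \exp\!\left( -\exp\!\Big( \sum_i \beta_i X_i \Big)\, (\Delta t)^{p} \right)
```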
To assess algorithm performance, we applied the above formulae to the validation dataset and estimated the 2-year risk of STDR from baseline for each subject. We compared the cumulative referable STDR in 2 years with the 2-year predicted risk and examined calibration using the Hosmer-Lemeshow chi-square test and discrimination using the area under the receiver operating characteristic (ROC) curve (AUC).
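The validation step can be illustrated with a short sketch. It assumes a Weibull proportional-hazards form for the risk, 1 - exp(-exp(lp) * t^p), where lp is the linear predictor (the sum of coefficient times covariate) and p the shape parameter; the numeric inputs are made up and are not the published coefficients. The AUC is computed directly as a concordance probability, which is quadratic in sample size and meant only to show what the statistic measures.

```python
import math


def weibull_risk(linear_predictor, shape, years):
    """Risk of the event within `years` of baseline under the assumed Weibull PH form."""
    return 1.0 - math.exp(-math.exp(linear_predictor) * years ** shape)


def auc(risks, events):
    """Probability that a subject with the event has a higher predicted risk
    than a subject without it; ties count as one half."""
    with_event = [r for r, e in zip(risks, events) if e]
    without_event = [r for r, e in zip(risks, events) if not e]
    wins = sum((a > b) + 0.5 * (a == b) for a in with_event for b in without_event)
    return wins / (len(with_event) * len(without_event))


# Hypothetical linear predictors and observed 2-year outcomes for five subjects.
linear_predictors = [-2.0, -2.5, -3.5, -4.0, -4.5]
observed = [1, 0, 1, 0, 0]
predicted = [weibull_risk(lp, shape=1.2, years=2.0) for lp in linear_predictors]

print(sum(predicted) / len(predicted), sum(observed) / len(observed))  # calibration-in-the-large
print(auc(predicted, observed))                                        # discrimination
```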
Applying the algorithm to determine risk-based screening interval
The algorithm was used to estimate the time for an individual (Δt) without STDR at baseline to reach a pre-set risk margin.
As in the Iceland algorithm, we determined the risk margin according to the annual incidence, i.e., the average risk of screen positive STDR, to ensure that subjects assigned to the tailored screening intervals should not be exposed to a risk level higher than they would experience with annual screening [9]. Two sets of risk margins were considered: (1) 2.5% for both R0 and R1 at baseline, reflecting the average annual incidence of all screen positive cases in HK and (2) 2.5% risk margin for subjects with R0 at baseline and 5.0% for R1 because the average annual incidence of screen positive STDR with R1 at baseline is 5.4%. The assigned interval was converted to 6 months (if predicted time to STDR is 9 months or less), 12 months (between 10 and 21 months) or 24 months (22 months and above) as was done in HK when the Iceland risk algorithm was applied.
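The conversion from a risk margin to a screening interval can be sketched as follows. The inversion assumes a Weibull proportional-hazards form with time in years, so that the time to reach margin m is (-ln(1 - m) / exp(lp))^(1/p); rounding the predicted time to whole months before applying the 9/10 and 21/22 month cut-offs is also an assumption, since the text states the bands in whole months only. Function names and inputs are hypothetical.

```python
import math


def months_to_margin(linear_predictor, shape, margin):
    """Months from baseline until cumulative risk reaches `margin` (assumed Weibull PH form)."""
    years = (-math.log(1.0 - margin) / math.exp(linear_predictor)) ** (1.0 / shape)
    return 12.0 * years


def assigned_interval(months):
    """Map predicted time to the 6-, 12-, or 24-month interval used in HK."""
    whole_months = round(months)  # assumption: bands apply to whole months
    if whole_months <= 9:
        return 6
    if whole_months <= 21:
        return 12
    return 24


# Hypothetical subject: linear predictor -4.2, shape 1.1, R0 margin of 2.5%.
t = months_to_margin(linear_predictor=-4.2, shape=1.1, margin=0.025)
print(round(t, 1), assigned_interval(t))
```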
Comparison with current practice using the Iceland algorithm
We compared the actual time when referable STDR was detected, based on current practice in HK using the Iceland risk algorithm, to the assigned screening interval from the new HK algorithm. The screening interval was considered appropriate if the assigned screen was before, or no more than 6 months later than the actual date of detection of a new case at screening. For those assigned to a delayed screening interval (i.e., observed time of STDR detection was ≥12 months earlier than the assigned interval), the severity of that case was explored and later screening records checked for the possibility of it being a false positive, e.g., the individual had returned to the screening program and subsequent records showed no STDR.
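The comparison rule can be written as a small classifier. This is a sketch with hypothetical names; the text defines only the "appropriate" (no more than 6 months later) and "delayed" (12 months or more later) cases, so a gap strictly between 6 and 12 months is returned here as "unclassified".

```python
def classify_assignment(assigned_months, detected_months):
    """Compare the assigned screening time with the observed time of STDR detection.

    Both arguments are months from the baseline screen.
    """
    lag = assigned_months - detected_months  # positive if the assigned screen comes later
    if lag <= 6:
        return "appropriate"
    if lag >= 12:
        return "delayed"
    return "unclassified"  # not defined in the text


print(classify_assignment(assigned_months=12, detected_months=11))
print(classify_assignment(assigned_months=24, detected_months=12))
```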
Comparison with annual screening
The total number of screens required for the risk-based screening over 2 years was calculated by assuming four screening visits in 2 years for those with a 6-month screening interval, two visits for a 12-month interval, and one visit for a 24-month interval. This estimated number was compared with the number of visits in an annual screening strategy.
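This calculation is simple enough to reproduce directly. The sketch below applies the stated visit counts to the interval shares reported in the Results for the two sets of risk margins; the function name is hypothetical, and the shares are normalized by their sum because the published percentages are rounded.

```python
# Visits over 2 years for each assigned interval, as stated in the text.
VISITS_IN_TWO_YEARS = {6: 4, 12: 2, 24: 1}


def change_vs_annual(shares):
    """Relative change in screens over 2 years versus annual screening (2 visits each).

    shares: {interval_months: fraction of subjects assigned to that interval}
    """
    risk_based = sum(VISITS_IN_TWO_YEARS[months] * share for months, share in shares.items())
    annual = 2 * sum(shares.values())
    return risk_based / annual - 1.0


same_margin = {6: 0.366, 12: 0.085, 24: 0.548}    # 2.5% margin for R0 and R1
split_margin = {6: 0.267, 12: 0.145, 24: 0.588}   # 2.5% for R0, 5.0% for R1
print(f"{change_vs_annual(same_margin):+.1%}")
print(f"{change_vs_annual(split_margin):+.1%}")
```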
RESULTS
Descriptive analysis
From the clinical management system (CMS), 284,837 records were extracted, from which 117,418 subjects fulfilled all inclusion criteria (Fig. 1) and were split into derivation (n=78,279) and validation (n=39,139) datasets. The only statistically significant group differences in baseline characteristics were in DR status at baseline and SBP, but neither was clinically significant (Table 1). In the derivation cohort, over a mean follow-up of 3.0 years, 4.2% (3,323/78,279) of subjects died and 5.8% (4,568/78,279) developed STDR, an STDR incidence rate of 19.4 per 1,000 person-years (4,568/235,728 person-years).
HK risk algorithm for referable STDR
Six predictors remained in the final best-fit model, including duration of diabetes, HbA1c, SBP, and presence of CKD from the “core” list of predictors, plus DM medication and age at screening. None of the dropped variables improved prediction performance when added back. The hazard ratios of the final model for each group stratified by sex and presence of R1 at baseline are shown in Table 2. The formulae to predict the risk of referable STDR in a specific time period, based on the values derived from Table 2 for each of the four groups, are given in Supplementary Table 3 with an example demonstrating the calculated risk based on values of these six predictors.
Validation results of HK risk algorithm
Prediction performance was assessed on the 23,680 (out of 39,139) subjects with complete data on all six predictors in the validation dataset (Table 3). There were 1,146 new referable STDR cases detected from follow-up screening (582 males, 564 females). The 2-year predicted and observed risks of referable STDR among males were 5.6% and 5.1% respectively (P=0.724) for the R0 and R1 groups combined, while among females they were 4.8% and 4.6% (P=0.099). Mean predicted 2-year risk matched well with the observed 2-year risk (Fig. 2). The AUC was 0.80 (95% confidence interval [CI], 0.78 to 0.81) in males and 0.81 (95% CI, 0.79 to 0.83) in females, implying good discrimination.
Assigned screening interval using 2.5% risk margin for both R0 and R1
The formulae to calculate the time to reach the risk margin are in Supplementary Table 4, with an example demonstrating the calculated time interval based on an individual’s risk factor profile.
Among the 23,680 subjects used to validate the algorithm, 37% were R1 and 63% R0 at baseline. Using a 2.5% risk margin for both R0 and R1 assigns 36.6% of subjects to 6-month, 8.5% to 12-month, and 54.8% to 24-month screening intervals, an increase of 9.2% in screens over 2 years compared to annual screening. Most subjects (86.6%) who were R0 at baseline would be assigned to a 24-month screening interval, resulting in a 40.9% reduction in screens compared to annual screening in the R0 group, while 95.0% of the R1 subjects would be assigned a 6-month screening interval, resulting in an increase in screens compared to annual screening in the R1 group (Table 4).
Among the 1,146 subjects with referable STDR detected at follow-up screening, 97.0% (1,112/1,146) would have been assigned an appropriate screening interval, i.e., a screen before, or no more than 6 months later than, the date on which STDR was detected. Among these cases, the new HK risk algorithm would assign 755 (out of 1,112) an interval at least 6 months earlier than the detection time using the Iceland algorithm. The remaining 3.0% (34/1,146) would have been assigned a screening date around 12 months beyond the date that STDR was detected. All of these potentially delayed cases had R0 at baseline, none were R3 cases requiring urgent referral, and 70% were graded as M1. Further checks of subsequent screening records for these 34 cases showed that 20 had later returned to screening, with 13 designated as having no STDR and seven remaining referable STDR. Thus, at least 13 cases were likely to be false positives, and the proportion of new referable STDR cases whose detection might be delayed reduces from 3.0% to 1.8% (21/1,146).
Assigned screening interval using 2.5% risk margin for R0 and 5.0% for R1
If the risk margin for those with R1 at baseline is increased to 5%, around 26.7% of all subjects in the validation group would be assigned to a 6-month, 14.5% to a 12-month and 58.8% to a 24-month screening interval, resulting in 2.7% fewer screens compared to annual screening. Among the 1,146 new STDR cases, 95.9% (1,099/1,146) would receive an appropriate screening interval compared to actual detection time but 4.1% (47/1,146) could have been delayed by 12 months. However, again, none of these were R3 cases and 19 were likely to be false positives giving a proportion of 2.4% (28/1,146) with potential delay in detection.
DISCUSSION
Our risk algorithm included four subgroups, i.e., R0 males, R1 males, R0 females, and R1 females, with the same six predictors in each group but with different risk coefficients. The predictors were HbA1c, SBP, duration of diabetes, presence of CKD, DM medication, and age. These data are routinely collected in the RAMP-DM monitoring visits with over 80% completion. Validation of the algorithm showed good discrimination with an AUC of around 0.80 among both males and females, above the generally accepted value of 0.7 [20]. This indicates a probability of about 80% that a randomly selected patient who develops STDR will be given a higher predicted risk than a randomly selected patient who does not. The algorithm has good calibration, with no statistically significant difference between the predicted and observed risk.
Four risk predictors [9,10] used in algorithms from other populations are consistent with ours reflecting some homogeneity in the risks contributing to STDR; these are HbA1c, SBP, duration of diabetes, and DR status at baseline which we used for stratification. Our algorithm selected another three predictors which strongly predicted the risk of referable STDR: presence of CKD at baseline, DM medication, and age at screening.
Using our algorithm, a risk-based screening interval can be assigned by setting the level of acceptable risk, thus allowing screening visits to be allocated disproportionately to those at higher risk. We found that a large proportion of the risk of referable STDR was predicted by having R1 at baseline. When the algorithm was used with a 2.5% risk margin, most R0 cases (86.6%) would be allocated a 24-month interval, resulting in a 40.9% reduction in their screens, but almost all R1 cases (95.0%) would be allocated a 6-month screening interval, resulting in a 9.2% increase in total screens. This is in contrast with the findings from Liverpool, where individualized screening intervals reduced the number of screens compared to annual screening [21,22]. However, HK has around 37% of cases with R1, higher than other places with a longer history of DR screening, e.g., Liverpool has around 18.7% R1 [22]. The total number of screens is likely to reduce over time as risk factors reduce due to better management. For example, if the proportion of R1 among non-STDR cases was reduced by one-third from 37% to 25%, the total screens for the subjects in the validation cohort would be reduced by 7.5%, compared to annual screening.
Applying a higher risk margin of 5% rather than 2.5% to the R1 cases would decrease the number of screens by about 11% and might be justified because the alternative of a fixed annual interval could expose many of these cases to an even higher risk. However, there would potentially be more STDR cases assigned a screening date around 12 months after they actually develop STDR, although the 2.4% of cases in this category in our study did not include any proliferative DR cases. While the higher threshold would increase the risk of delaying the detection of referable STDR, an increase of about 0.6 percentage points in potentially delayed cases (from 1.8% to 2.4%) in exchange for a decrease of more than 10% in screening appointments may be considered worthwhile by decision makers. There will always be a trade-off between the safety of the interval and the number of screens. Local decisions on acceptable risks, taking into account the equity of allocation of resources, are essential.
The strengths of this study are as follows. First, the dataset has good completeness and comes from a centralized government system that links routine clinical and DR screening data for over 100,000 subjects, which supports the applicability of the derived HK algorithm to clinical practice. Second, this is the first DR algorithm for an Asian population, in this case a mainly Chinese population, and it could have applications in mainland China.
There are a few limitations. Firstly, the endpoint of being classed as having referable STDR was based only on the screening result, rather than confirmation after referral, due to lack of follow-up data. This may include false positive STDR cases counted as incident referable STDR. However, this represents how screening operates with detected cases referred onwards for further assessment. Further research might examine the screening process itself to reduce the risk of false positives. Secondly, since RAMP-DM covers the great majority of people with T2DM in HK, there is currently no suitable external dataset to further validate the performance of the algorithm; however, it is possible that a population, e.g., from mainland China, could be used for validation in the future.
The potential impact of this risk algorithm is beyond the determination of the risk-based screening interval. It helps primary care clinicians identify individuals at higher risk of DR and hence target them for better management of risk factors, such as blood pressure and blood glucose level, which might achieve further reductions in the risk of DR-related blindness.
HK is one of the few places in the world adopting risk-based screening, while Iceland and Sweden use extended screening intervals for people at low risk [23]. The implementation of risk-based screening requires the support of the primary care system to collect data on the risk factors and to ensure adherence to the extended screening interval, so that people at risk do not fall out of the system.
This algorithm was developed with data from the first 6 years after establishment of systematic DR screening, when the population started at a relatively high risk of STDR on the first pass because many people were being screened for the first time. After several years of the program, many cases will have been detected and, as time goes on, we would expect the average risk of STDR to settle at a lower level. This algorithm is potentially applicable to other Asian countries that are considering setting up systematic DR screening within their established diabetes management systems [24,25]. The risk algorithm is likely to have good discrimination to distinguish the high-risk from the low-risk group. However, recalibration would be advised to tailor the predicted risk to the local population risk of DR, because risk may vary across populations and the weights assigned even to common predictors may differ.
Future research might focus on how to further improve prediction of referable STDR by discriminating the very high from the high-risk group. This could allow a shift of some R1 cases from 6- to 12-month screening intervals, with minimal risk of missing STDR development and would reduce screening visits. Other potential predictors, beyond the clinical risk factors included here might also be considered, e.g., retinal biomarkers shown on the fundus photographs. The population risk may well change as the RAMP-DM program matures and diabetes is better and earlier managed. The actual risk in the population can be regularly monitored and compared against the risk predicted by the algorithm in the future, particularly after the first pass effect of the DR screening. Future research should also evaluate the cost-effectiveness of the risk-based approach versus annual screening to understand the long-term impact on health outcomes in terms of prevention of blindness and sight years saved across the life-time, and long-term resource allocation.
In conclusion, a risk algorithm for referable STDR was developed and validated based on a mainly Chinese population with T2DM in HK and showed good discrimination and calibration. Applying this algorithm to determine risk-based screening intervals allows us to allocate screening visits disproportionately to those at higher risk, while reducing the need for more frequent screening of lower-risk people.
SUPPLEMENTARY MATERIALS
Supplementary materials related to this article can be found online at https://doi.org/10.4093/dmj.2024.0142.
Notes
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
AUTHOR CONTRIBUTIONS
Conception or design: J.L., S.M.M., T.T., C.L.K.L., J.C.H.C.
Acquisition, analysis, or interpretation of data: all authors.
Drafting the work or revising: all authors.
Final approval of the manuscript: all authors.
FUNDING
This work was supported by the Health and Medical Research Fund (HMRF) of the Hong Kong SAR Government (grant number 14151971). The funding organization had no role in the design or conduct of this research.
Acknowledgements
The authors thank the Food and Health Bureau and the Health and Medical Research Fund (HMRF) of the Hong Kong SAR government (grant number 14151971) for supporting this study. The authors would like to thank Dr. David Chao and Dr. Michelle Wong for their help in the application of IRB ethics approval, and Dr. Chris Chau for his support in the grant application. Also, the authors would like to thank the Hospital Authority of Hong Kong for granting the access to data necessary for this study.