Big Data Research for Diabetes-Related Diseases Using the Korean National Health Information Database

Article information

Diabetes Metab J. 2025;49(1):13-21
Publication date (electronic) : 2025 January 1
doi : https://doi.org/10.4093/dmj.2024.0780
1Department of Internal Medicine, CHA Bundang Medical Center, CHA University School of Medicine, Seongnam, Korea
2Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, Korea
3Department of Statistics and Actuarial Science, Soongsil University, Seoul, Korea
Corresponding author: Kyungdo Han https://orcid.org/0000-0002-6096-1263 Department of Statistics and Actuarial Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Korea E-mail: hkd917@naver.com
Received 2024 December 3; Accepted 2024 December 24.

Abstract

The Korean National Health Information Database (NHID), which contains nationwide real-world claims data including sociodemographic data, health care utilization data, health screening data, and healthcare provider information, is a powerful resource to test various hypotheses. It is also longitudinal in nature due to the recommended health checkup every 2 years and is appropriate for long-term follow-up studies as well as for evaluating the relationships between health outcomes and changes in parameters such as lifestyle factors, anthropometric measurements, and laboratory results. However, because these data are not collected for research purposes, precise operational definitions of diseases are required to facilitate big data analysis using the Korean NHID. In this review, we describe the characteristics of the Korean NHID and the operational definitions of diseases used for research related to diabetes, and introduce representative research on diabetes-related diseases using the Korean NHID.

Highlights

• The Korean NHID is a powerful resource to test various hypotheses.

• The Korean NHID is longitudinal in nature and is appropriate for long-term follow-up study.

• The number of research papers addressing diabetes using the Korean NHID has steadily increased.

• Precise operational definitions are required to facilitate big data analysis.

INTRODUCTION

In clinical research, the importance of real-world evidence obtained by big data analysis, as well as by randomized controlled trials, is steadily increasing. Big data analyses can provide research insights that are hard to obtain using existing data sources and research methods. The Korean National Health Information Database (NHID) contains a wealth of information that can be analyzed using big data analysis methods, and numerous publications have used the information in this database [1-6]. In this review, we describe the characteristics of the Korean NHID, provide commonly used operational definitions of diabetes-related diseases, and introduce representative studies on diabetes-related diseases using the Korean NHID.

KOREAN NATIONAL HEALTH INFORMATION DATABASE

The Ministry of Health and Welfare supervises the National Health Insurance Service (NHIS) and Health Insurance Review & Assessment Service (HIRA) in Korea. The HIRA evaluates the adequacy of healthcare service costs by reviewing medical billing and claims and provides the review results to the NHIS and healthcare service providers [7,8]. The NHIS, a non-profit organization, is the single insurer in Korea (Supplementary Fig. 1). Approximately 97% of the Korean population subscribes to the Korean NHIS, while the remaining 3% of the population is covered by medical aid programs. The general health screening program involves examinations at least once every 2 years for the entire population of Korean adults aged 40 years or older. General health screening for regional household members and dependents was expanded to include people aged 20 years or older in 2019 [9].

The NHID, established in 2011, combines information obtained from the NHIS and health examinations [9]. It incorporates all data from the NHIS and is unique in that it includes health screening information, including detailed lifestyle questionnaires, laboratory results, and anthropometric measurements, which are not included in other claims databases. The NHID consists of five databases: an eligibility database, a national health screening database, a healthcare usage database, a long-term care insurance database, and a healthcare provider database [10].

A limitation of the NHID is that its data are not collected for research purposes. It is difficult to define causal relationships when performing outcome studies [9,10]. Furthermore, the NHID does not include data on medications or procedures not covered by the NHIS. There may be discrepancies between the diagnosis encoded for medical claims and the actual disease. To reduce inaccuracy, appropriate operational definitions and validation of these definitions are crucial. The biggest advantage of the Korean NHID is that it includes a very large number of individuals (approximately 50 million) and is representative of the entire Korean population [9,10]. In addition, the NHID includes health screening information as well as claims data and can be linked to mortality data from Statistics Korea. The NHID is longitudinal in nature and is also appropriate for long-term follow-up studies. Specifically, given that a health checkup is recommended once every 2 years for individuals aged 20 years or older, it can be used to identify health outcomes and their relationships to changes in various parameters including lifestyle factors, anthropometric measurements, and laboratory results.

RESEARCH TRENDS USING THE NATIONAL HEALTH INFORMATION DATABASE

Since the establishment of the NHID, research and publications based on this database have increased rapidly. We searched PubMed using the keyword ‘Korean National Health Insurance’ and found more than 6,000 research papers published as of 2024 (Fig. 1). In 2008, 7.5% of these papers were retrieved using the keywords ‘Korean National Health Insurance’ and ‘diabetes,’ and since then the share of research papers addressing diabetes has steadily increased to 19.5% (170/874) of all research papers using the NHID in 2023. As of the end of 2024, more than 1,000 articles were identified in PubMed using the keywords ‘Korean National Health Insurance’ and ‘diabetes.’ In the future, we hope that more studies addressing diabetes using the NHID will aim to improve the welfare of the public by promoting public health, reducing medical costs, and guiding healthcare policies.

Fig. 1.

Number of publications using the Korean National Health Information Database from 2008 to 2023.

OPERATIONAL DEFINITIONS IN BIG DATA RESEARCH RELATED TO DIABETES

To perform appropriate big data analyses using the NHID, a precise operational definition of each disease is mandatory. Because there may be a discrepancy between the actual disease and the diagnosis claimed by healthcare providers, International Classification of Diseases, 10th Revision (ICD-10) codes alone are not sufficient to define diseases appropriately [9,10]. To improve operational definitions, health screening results can be incorporated, diagnoses can be combined with prescription records, diagnoses can be limited to those registered during admission, repeated outpatient visits with the same diagnosis can be required, or a special registration code for payment reduction can be added.

For example, type 2 diabetes mellitus (T2DM) is usually defined by a fasting plasma glucose concentration ≥126 mg/dL or the presence of at least 1 prescription claim per year for antidiabetic drugs under ICD-10 codes E11–14 [11]. One study evaluated the validity and reliability of the NHIS data-based definition of T2DM by comparing it with data from another population-based database, the Korea National Health and Nutrition Examination Survey (KNHANES), as a standard reference [12]. In the study population (n=13,006), two algorithms were used to determine whether the diagnostic claim codes for T2DM in the NHIS dataset were accompanied by prescription codes for antidiabetic drugs (algorithm 1) or not (algorithm 2). Although both algorithms showed good reliability in defining T2DM, the accuracy (0.93 vs. 0.89) and specificity (0.96 vs. 0.90) tended to be higher for algorithm 1 than algorithm 2. This study showed that population-based NHIS claims data can be useful in identifying subjects with T2DM using diagnostic and prescription codes as diagnostic criteria. Commonly used operational definitions of diseases related to diabetes are summarized in Table 1.
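The T2DM definition above can be sketched in code. This is a minimal illustration, not the NHIS implementation: the `PersonYear` record and its field names are hypothetical, and real claims data would need to be aggregated to one record per person per year first.

```python
from dataclasses import dataclass
from typing import List, Optional

# ICD-10 code prefixes E11-E14, as used in the definition in the text.
T2DM_CODES = ("E11", "E12", "E13", "E14")


@dataclass
class PersonYear:
    """Hypothetical one-year summary of a person's screening and claims data."""
    fasting_glucose_mg_dl: Optional[float]  # None if no screening result
    diagnosis_codes: List[str]              # ICD-10 codes claimed in the year
    antidiabetic_rx_claims: int             # antidiabetic prescription claims


def has_t2dm_code(codes: List[str]) -> bool:
    """True if any claimed code falls under E11-E14."""
    return any(code.upper().startswith(T2DM_CODES) for code in codes)


def meets_t2dm_definition(p: PersonYear) -> bool:
    """Fasting glucose >=126 mg/dL, or >=1 antidiabetic claim under E11-E14."""
    high_glucose = (
        p.fasting_glucose_mg_dl is not None and p.fasting_glucose_mg_dl >= 126
    )
    treated = has_t2dm_code(p.diagnosis_codes) and p.antidiabetic_rx_claims >= 1
    return high_glucose or treated
```

Requiring a prescription claim alongside the diagnostic code mirrors algorithm 1 in the validation study, which had the higher accuracy and specificity.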

Operational definitions of diseases related to diabetes

REPRESENTATIVE RESEARCH RELATED TO DIABETES USING THE NATIONAL HEALTH INFORMATION DATABASE

Gestational diabetes mellitus

The prevalence of gestational diabetes mellitus (GDM) may be overestimated in Korea as many clinicians input related codes to avoid pushback by the NHIS when they prescribe an oral glucose tolerance test [13]. To overcome this bias, patients with GDM have been defined as those who visited the outpatient clinic more than twice with GDM codes (O24.4 or O24.9) (Table 1) [14]. As a result, the prevalence of GDM in Korean women was 12.70% overall in 2011 to 2015. Advanced maternal age, pre-pregnancy body mass index, waist circumference, fasting plasma glucose, high income, smoking, and drinking were associated with an increased risk for GDM [14].
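The repeated-visit rule for GDM could be expressed as follows. This is a sketch under stated assumptions: the `(setting, code)` tuple layout and the `preexisting_diabetes` flag are hypothetical, codes are assumed to be stored with a dot (for example `O24.4`), and "more than twice" is read as strictly more than 2 visits.

```python
from typing import Iterable, Tuple

# GDM codes named in the text.
GDM_CODES = ("O24.4", "O24.9")


def count_gdm_outpatient_visits(visits: Iterable[Tuple[str, str]]) -> int:
    """Count outpatient visits carrying a GDM code.

    Each visit is a (setting, icd10_code) tuple; setting is a hypothetical
    label such as "outpatient" or "inpatient".
    """
    return sum(
        1
        for setting, code in visits
        if setting == "outpatient" and code.upper().startswith(GDM_CODES)
    )


def meets_gdm_definition(
    visits: Iterable[Tuple[str, str]], preexisting_diabetes: bool = False
) -> bool:
    """More than 2 outpatient GDM visits and no diabetes before pregnancy."""
    if preexisting_diabetes:
        return False
    return count_gdm_outpatient_visits(visits) > 2
```

A single coded visit, such as one entered only to justify an oral glucose tolerance test, does not qualify under this rule.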

Fatty liver disease

Hepatic steatosis should be diagnosed histologically or by imaging, but that information is usually unavailable in nationwide large-scale databases. The fatty liver index (FLI), an alternative to ultrasound or liver biopsy, is a simple and accurate surrogate marker of hepatic steatosis and has been validated in many studies [15-17]. FLI was used to define hepatic steatosis and was calculated using the following equation [15]:

$$\mathrm{FLI} = \frac{e^{x}}{1 + e^{x}} \times 100, \qquad x = 0.953 \ln(\mathrm{TG}) + 0.139\,\mathrm{BMI} + 0.718 \ln(\mathrm{GGT}) + 0.053\,\mathrm{WC} - 15.745$$

where TG is triglycerides, BMI is body mass index, GGT is γ-glutamyl transferase, and WC is waist circumference. In Western populations, FLI ≥60 accurately identifies the presence of hepatic steatosis [15], but FLI ≥30 has been validated for hepatic steatosis in the general population of Korea [18]. Using the Korean NHID, the association of nonalcoholic fatty liver disease (NAFLD) with the risk of cardiovascular disease and all-cause death in patients with T2DM was evaluated [19]. NAFLD was defined as the presence of hepatic steatosis without viral hepatitis or excessive alcohol consumption (≥30 g/day). Patients were divided into the following three groups: no NAFLD: FLI <30; grade 1 (G1) NAFLD: 30≤ FLI <60; and grade 2 (G2) NAFLD: FLI ≥60. NAFLD in patients with T2DM was associated with a higher risk of cardiovascular disease (myocardial infarction or ischemic stroke) (Table 1) and all-cause death, even in patients with mild liver disease. Risk differences for cardiovascular disease and all-cause death between the no NAFLD group and the grade 1 or grade 2 NAFLD groups were higher in patients with T2DM than in those without T2DM. A study that defined fatty liver as FLI ≥30 showed that subjects with mixed-etiology metabolic dysfunction-associated fatty liver disease (MAFLD) had an approximately 1.3-fold increased risk of cancer incidence and a 1.5-fold higher risk of cancer mortality than those without MAFLD, whereas those with single-etiology MAFLD had only modestly increased risks [20]. In addition, NAFLD (defined as FLI ≥30) was associated with an increased risk of young-onset stomach, colorectal, liver, pancreatic, biliary tract, and gallbladder cancers among more than 5 million individuals aged 20 to 39 years [21].
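As a worked illustration, the FLI equation and the three-group FLI classification described in this section can be computed as below. This is a sketch: the function names are hypothetical, and the input units (triglycerides in mg/dL, GGT in U/L, waist circumference in cm, body mass index in kg/m²) are assumed to follow the original FLI publication [15]. The exclusions for viral hepatitis and excessive alcohol consumption are not modeled.

```python
import math


def fatty_liver_index(tg_mg_dl: float, bmi: float, ggt_u_l: float, waist_cm: float) -> float:
    """Fatty liver index on a 0-100 scale."""
    x = (
        0.953 * math.log(tg_mg_dl)
        + 0.139 * bmi
        + 0.718 * math.log(ggt_u_l)
        + 0.053 * waist_cm
        - 15.745
    )
    # 100 / (1 + e^-x) is algebraically the same as e^x / (1 + e^x) * 100.
    return 100.0 / (1.0 + math.exp(-x))


def nafld_grade(fli: float) -> str:
    """Grouping by FLI: <30, 30 to <60, and >=60."""
    if fli < 30:
        return "no NAFLD"
    if fli < 60:
        return "grade 1 NAFLD"
    return "grade 2 NAFLD"
```

For example, `fatty_liver_index(150, 25, 30, 85)` gives an FLI of roughly 36.6, which falls in the grade 1 range.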

Variability in body weight and glucose as risk factors for various diseases

Because all enrollees in the Korean NHIS are advised to receive medical checkups every 2 years, the intraindividual visit-to-visit variability in glucose levels and body weight can be calculated using values obtained from serial health examinations [9,10]. Variability independent of the mean, coefficient of variation, and average real variability are representative indices of variability [22]. Body weight variability was associated with increased risks of myocardial infarction, stroke, and all-cause mortality in patients with T2DM and was a predictor of cardiovascular outcomes [23]. In a study investigating the association of body weight or glucose variability or their combination with the risk of hip fracture in people with diabetes, the risk was approximately 30% higher in the groups with high variability in body weight or glucose than in the group without high variability [24]. In addition, combined high variability of body weight and glucose level had an additive effect with a greater than 60% higher risk of hip fracture. In patients with predialysis chronic kidney disease, higher body mass index variability was significantly associated with higher risks of all-cause mortality, myocardial infarction, stroke, and progression to need for kidney replacement [25].
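Two of the variability indices named above, coefficient of variation and average real variability, can be computed from one person's serial measurements as sketched below. The use of the sample standard deviation is an assumption; individual studies may differ. Variability independent of the mean is omitted because it requires a power-law exponent fitted across the whole study population.

```python
from statistics import mean, stdev
from typing import Sequence


def _require_two(values: Sequence[float]) -> None:
    """Variability is undefined for fewer than two measurements."""
    if len(values) < 2:
        raise ValueError("at least two measurements are required")


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean, in percent."""
    _require_two(values)
    return stdev(values) / mean(values) * 100


def average_real_variability(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive measurements."""
    _require_two(values)
    diffs = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return sum(diffs) / len(diffs)
```

With body weights of 70, 72, 68, and 71 kg at four consecutive checkups, the average real variability is 3.0 kg and the coefficient of variation is about 2.4%.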

Body weight change as risk factor for various diseases

Although obesity is a proven risk factor for cardiovascular disease and metabolic diseases, there are limited data on the associations between weight change and those health outcomes. Body weight change has been calculated as the difference in body weight between the first and second general health checkups in the NHID [9,10]. Patients were categorized into five groups according to body weight change: severe weight loss (weight change ≤–10%), moderate weight loss (weight change of –10% to ≤–5%), stable weight (weight change of –5% to ≤5%), moderate weight gain (weight change of 5% to ≤10%), and severe weight gain (weight change >10%) [26,27]. A study exploring the association of weight change with the risk of dementia in patients newly diagnosed with T2DM showed a significant U-shaped association between weight change and the risk of all-cause dementia (Table 1) [26]. Body weight change >10% was significantly associated with an increased risk of all-cause dementia. In addition, weight loss >10% was significantly associated with an increased risk of Alzheimer disease [26]. In a study of 1,522,241 patients with T2DM, a U-shaped association was found between body weight change and major cardiovascular event risks such as myocardial infarction, ischemic stroke, atrial fibrillation, heart failure, and all-cause death [27].
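The five weight-change groups map directly onto a small classifier. This is a minimal sketch; the function names are hypothetical, and the percent change is assumed to be taken relative to the weight at the first checkup.

```python
def percent_weight_change(first_kg: float, second_kg: float) -> float:
    """Percent change in body weight from the first to the second checkup."""
    return (second_kg - first_kg) * 100 / first_kg


def weight_change_group(pct: float) -> str:
    """Assign one of the five groups using the cutoffs given in the text."""
    if pct <= -10:
        return "severe weight loss"
    if pct <= -5:
        return "moderate weight loss"
    if pct <= 5:
        return "stable weight"
    if pct <= 10:
        return "moderate weight gain"
    return "severe weight gain"
```

A drop from 80 kg to 72 kg is a change of -10%, which falls in the severe weight loss group because that boundary is inclusive.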

Metabolic syndrome

Anthropometric measurements and laboratory data from the NHIS health checkup and claims data include the components of metabolic syndrome (waist circumference, triglycerides, high-density lipoprotein cholesterol, blood pressure, and fasting plasma glucose), so metabolic syndrome can be diagnosed using these data [9,10]. Among 8,320 earlier-onset colorectal cancer cases, metabolic syndrome and obesity were positively associated with earlier-onset colorectal cancer, particularly in the distal colon and rectum, but not the proximal colon [28]. Individuals who recovered from metabolic syndrome were shown to have a higher risk of pancreatic cancer than those free of metabolic syndrome but a lower risk than those with persistent metabolic syndrome [29]. Furthermore, in young men, development of metabolic syndrome was associated with increased risk of incident gout, and recovery from metabolic syndrome was associated with reduced risk of incident gout [30].

Lifestyle behavior

In the NHIS health screening, physical activity is measured using the International Physical Activity Questionnaire developed by the World Health Organization [31,32]. The questionnaire includes exercise intensity, duration, and frequency per week. One study defined regular physical activity as ≥30 minutes of moderate physical activity at least five times per week or ≥20 minutes of vigorous physical activity at least three times per week [33]. This study reported that regular physical activity was independently associated with lower risks of all-cause dementia, Alzheimer disease, and vascular dementia among participants with new-onset T2DM [33]. The interval change in regular physical activity can also be determined using consecutive health screenings [9,10]. When regular exercise was defined as moderate intensity exercise for more than 30 minutes or vigorous intensity exercise for more than 20 minutes at least once a week, starting exercise, maintaining exercise, and even cessation of exercise after thyroidectomy for treatment of thyroid cancer were associated with a lower risk of incident T2DM [34]. In addition, starting and maintaining regular physical activity were both associated with lower risk of incident atrial fibrillation in patients with T2DM (Table 1) [35].
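The definition of regular physical activity from reference [33] and the change between two consecutive screenings can be sketched as below. The inputs are assumed to be the questionnaire's weekly frequencies of sessions that already meet the duration thresholds, and the four change labels are hypothetical names, not terms taken from the cited studies.

```python
def is_regular_physical_activity(moderate_days: int, vigorous_days: int) -> bool:
    """Regular activity per the definition in [33].

    moderate_days: days per week with >=30 min of moderate activity.
    vigorous_days: days per week with >=20 min of vigorous activity.
    """
    return moderate_days >= 5 or vigorous_days >= 3


def activity_change(before: bool, after: bool) -> str:
    """Classify the change in regular activity between two screenings."""
    if before and after:
        return "maintained"
    if after:
        return "started"
    if before:
        return "stopped"
    return "never regular"
```

Other studies use different thresholds, such as the once-a-week definition in [34], so the cutoffs would need to be adjusted per study.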

Information on the frequency of alcohol intake per week and the amount of alcohol consumed per drinking episode is collected during the biennial health checkup using a self-administered questionnaire and is included in the Korean NHIS–Health Screening Cohort database [9,10]. In one study, subjects were classified into one of three groups based on average amount of alcohol intake per day: (1) no alcohol consumption (0 g/day); (2) mild alcohol consumption (<20 g/day); and (3) moderate to heavy consumption (≥20 g/day). Alcohol abstainers, constant drinkers, and nondrinkers were also defined to evaluate the impact of alcohol behavioral changes on various outcomes. This study found that 1,112,682 patients newly diagnosed with T2DM who abstained from alcohol had a low risk of atrial fibrillation (Table 1) [36].
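The three alcohol consumption groups can be derived from the two questionnaire items as sketched below. Averaging weekly intake over 7 days is an assumption about how the daily amount is obtained, and the conversion from drinks to grams of alcohol is left to the caller because the text does not specify it.

```python
def average_alcohol_g_per_day(days_per_week: float, grams_per_occasion: float) -> float:
    """Average daily alcohol intake from weekly frequency and amount per occasion."""
    return days_per_week * grams_per_occasion / 7


def alcohol_group(g_per_day: float) -> str:
    """Three groups by average daily intake: 0, <20, and >=20 g/day."""
    if g_per_day == 0:
        return "no alcohol consumption"
    if g_per_day < 20:
        return "mild alcohol consumption"
    return "moderate to heavy consumption"
```

For instance, drinking 40 g on 2 days per week averages about 11.4 g/day, which falls in the mild group.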

Income status and diabetes

Previous studies have suggested that low socioeconomic status may contribute to a poor T2DM prognosis and an increased risk of mortality [37,38]. However, it is not easy to evaluate the effects of socioeconomic status on various health problems since many databases do not contain data on income or only include baseline data. In the Korean NHIS, household income is evaluated using health insurance premiums because the NHIS does not provide actual household income data [9]. Monthly health insurance premiums that are determined by wages and property do not change throughout a 1-year period unless an extreme income change occurs and are divided into 20 groups. To investigate whether income status was associated with various health outcomes, individuals’ baseline income status was categorized into quartiles from 1 (lowest) to 4 (highest), and changes in income status were compared between the first assessment and the last. A study that included approximately 7.8 million Korean adults found that sustained low income and decreases in income were associated with elevated T2DM risk, whereas a sustained high income was associated with lower T2DM risk (Table 1) [39]. In addition, sustained low-income status and declines in income were associated with increased risk of mortality in a study of >1.9 million adults with T2DM [40]. Higher income variability and sustained low income over 5 years were associated with increased cardiovascular disease risk in 1,528,108 adults aged 30 to 64 years with T2DM and no history of cardiovascular disease [41].
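One way to turn the 20 premium groups into income quartiles and compare the first and last assessments is sketched below. The equal-width mapping (groups 1 to 5 as quartile 1, and so on) and the change labels are assumptions made for illustration; the cited studies may derive quartiles from the cohort distribution and may define sustained income status over more than two time points.

```python
def income_quartile(premium_group: int) -> int:
    """Map a premium group (1 = lowest, 20 = highest) to a quartile 1-4."""
    if not 1 <= premium_group <= 20:
        raise ValueError("premium group must be between 1 and 20")
    return (premium_group - 1) // 5 + 1


def income_change(first_quartile: int, last_quartile: int) -> str:
    """Compare income quartile at the first and last assessments."""
    if last_quartile < first_quartile:
        return "decreased"
    if last_quartile > first_quartile:
        return "increased"
    if first_quartile == 1:
        return "sustained low"
    if first_quartile == 4:
        return "sustained high"
    return "unchanged"
```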

All-cause and cause-specific mortality risks in diabetes

The NHID has information on death because it is linked to death certificates from Statistics Korea regarding cause of death and date. The cause of death can be identified based on ICD-10 codes, and specific causes of death can be classified as cardiovascular (code I), neoplasm (code C), respiratory (code J), infectious (codes A and B), and so on. In a study of nearly 2 million patients with T2DM, hepatic steatosis and advanced fibrosis were significantly associated with risks of all-cause and cause-specific mortality including cardiovascular, cancer, respiratory, and liver disease [42]. In addition, individuals with diabetes living alone (IDLA) were at a 20% higher risk of all-cause mortality compared to those not living alone in a study of nearly 2.5 million individuals with diabetes [43]. The risks of mortality from cardiovascular disease, cancer, respiratory disease, infectious disease, and other causes were all significantly higher in the IDLA group by 7% to 33% compared with the non-IDLA group.
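The cause-of-death grouping by ICD-10 code letter can be written as a simple lookup. This sketch covers only the letters named in the text and places everything else in a hypothetical "other" category; a real analysis would likely use finer code ranges.

```python
# Leading ICD-10 letter mapped to the cause-of-death groups named in the text.
CAUSE_BY_LETTER = {
    "I": "cardiovascular",
    "C": "neoplasm",
    "J": "respiratory",
    "A": "infectious",
    "B": "infectious",
}


def cause_of_death_category(icd10_code: str) -> str:
    """Classify a death certificate ICD-10 code by its first letter."""
    letter = icd10_code.strip().upper()[:1]
    return CAUSE_BY_LETTER.get(letter, "other")
```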

Cancer risk and diabetes

Using the NHID, a cancer case can be defined as the presence of an ICD-10 code of ‘C’ and an admission history with the cancer code as the principal diagnosis. In patients with diabetes, the risks of many cancers are increased through increased endogenous insulin levels resulting from insulin resistance, hyperglycemia, chronic inflammation, and increased oxidative stress [44]. A study that enrolled a total of 25,709,497 patients showed that the risk of stomach, colorectal, liver, pancreatic, and kidney cancer appeared to be higher in patients with diabetes than in those without diabetes regardless of sex or duration of diabetes [45]. In a study of over 9 million individuals, even light-to-moderate alcohol consumption was associated with an increased risk of biliary tract cancer in individuals with prediabetes and diabetes, but not in normoglycemic individuals [46].
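The cancer case definition at the start of this section can be sketched as a check over a person's claims. The claim layout (a dict with `setting` and `principal_dx` keys) is hypothetical and stands in for whatever structure the claims extract actually uses.

```python
from typing import Dict, Iterable


def is_cancer_case(claims: Iterable[Dict[str, str]]) -> bool:
    """True if any admission has an ICD-10 'C' code as the principal diagnosis.

    Each claim is a dict with hypothetical keys:
      "setting": "inpatient" or "outpatient"
      "principal_dx": ICD-10 code of the principal diagnosis
    """
    return any(
        claim["setting"] == "inpatient"
        and claim["principal_dx"].upper().startswith("C")
        for claim in claims
    )
```

Requiring an admission with the cancer code as principal diagnosis is one of the strategies described earlier for reducing mismatches between claimed and actual diagnoses.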

CONCLUSIONS

The Korean NHID contains nationwide claims data including sociodemographic data, health care utilization data, health screening data, and healthcare provider information, representing an attractive research resource for real-world data. The database has been used extensively in clinical and public health research related to diabetes. Advantages of the NHID include its longitudinal nature and the ability to evaluate associations between health outcomes and changes in lifestyle factors, anthropometric measurements, and laboratory results because a health checkup is recommended every 2 years in Korea. To improve the quality of research using the NHID, it is important to understand its characteristics, design research accordingly, and clarify operational definitions of diseases. We are optimistic that research using the NHID will help advance medicine and improve human health.

SUPPLEMENTARY MATERIALS

Supplementary materials related to this article can be found online at https://doi.org/10.4093/dmj.2024.0780.

Supplementary Fig. 1.

Operational structure of the National Health Insurance System (NHIS). Reproduced from Kim et al. [21]. HIRA, Health Insurance Review & Assessment Service.

Notes

CONFLICTS OF INTEREST

Kyung-Soo Kim has been associate editor of the Diabetes & Metabolism Journal since 2024. He was not involved in the review process of this article. Otherwise, there was no conflict of interest.

FUNDING

None

Acknowledgements

None

References

1. Kim JH, Lee J, Han K, Kim JT, Kwon HS; Diabetic Vascular Disease Research Group of the Korean Diabetes Association. Cardiovascular disease & diabetes statistics in Korea: nationwide data 2010 to 2019. Diabetes Metab J 2024;48:1084–92.
2. Han E, Han KD, Lee YH, Kim KS, Hong S, Park JH, et al. Fatty liver & diabetes statistics in Korea: nationwide data 2009 to 2017. Diabetes Metab J 2023;47:347–55.
3. Kim NH, Seo MH, Jung JH, Han KD, Kim MK, Kim NH. 2023 Diabetic kidney disease fact sheet in Korea. Diabetes Metab J 2024;48:463–72.
4. Bae JH, Han KD, Ko SH, Yang YS, Choi JH, Choi KM, et al. Diabetes fact sheet in Korea 2021. Diabetes Metab J 2022;46:417–26.
5. Kim HC, Lee H, Lee HH, Son D, Cho M, Shin S, et al. Korea hypertension fact sheet 2023: analysis of nationwide population-based data with a particular focus on hypertension in special populations. Clin Hypertens 2024;30:7.
6. Jin ES, Shim JS, Kim SE, Bae JH, Kang S, Won JC, et al. Dyslipidemia fact sheet in South Korea, 2022. Diabetes Metab J 2023;47:632–42.
7. Kim HK, Song SO, Noh J, Jeong IK, Lee BW. Data configuration and publication trends for the Korean National Health Insurance and Health Insurance Review & Assessment Database. Diabetes Metab J 2020;44:671–8.
8. Choi EK. Cardiovascular research using the Korean National Health Information Database. Korean Circ J 2020;50:754–72.
9. Kim MK, Han K, Lee SH. Current trends of big data research using the Korean National Health Information Database. Diabetes Metab J 2022;46:552–63.
10. Cho SW, Kim JH, Choi HS, Ahn HY, Kim MK, Rhee EJ. Big data research in the field of endocrine diseases using the Korean National Health Information Database. Endocrinol Metab (Seoul) 2023;38:10–24.
11. Lee YH, Han K, Ko SH, Ko KS, Lee KU; Taskforce Team of Diabetes Fact Sheet of the Korean Diabetes Association. Data analytic process of a nationwide population-based study using National Health Information Database established by National Health Insurance Service. Diabetes Metab J 2016;40:79–82.
12. Baek JH, Park YM, Han KD, Moon MK, Choi JH, Ko SH. Comparison of operational definition of type 2 diabetes mellitus based on data from Korean National Health Insurance Service and Korea National Health and Nutrition Examination Survey. Diabetes Metab J 2023;47:201–10.
13. Yoo HJ, Choi KM, Baik SH, Park JH, Shin SA, Hong SC, et al. Influences of body size phenotype on the incidence of gestational diabetes needing prescription; analysis by Korea National Health Insurance (KNHI) claims and the National Health Screening Examination (NHSE) database. Metabolism 2016;65:1259–66.
14. Kim KS, Hong S, Han K, Park CY. The clinical characteristics of gestational diabetes mellitus in Korea: a National Health Information Database Study. Endocrinol Metab (Seoul) 2021;36:628–36.
15. Bedogni G, Bellentani S, Miglioli L, Masutti F, Passalacqua M, Castiglione A, et al. The fatty liver index: a simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol 2006;6:33.
16. Cuthbertson DJ, Weickert MO, Lythgoe D, Sprung VS, Dobson R, Shoajee-Moradie F, et al. External validation of the fatty liver index and lipid accumulation product indices, using 1H-magnetic resonance spectroscopy, to identify hepatic steatosis in healthy controls and obese, insulin-resistant individuals. Eur J Endocrinol 2014;171:561–9.
17. Huang X, Xu M, Chen Y, Peng K, Huang Y, Wang P, et al. Validation of the fatty liver index for nonalcoholic fatty liver disease in middle-aged and elderly Chinese. Medicine (Baltimore) 2015;94:e1682.
18. Cho EJ, Jung GC, Kwak MS, Yang JI, Yim JY, Yu SJ, et al. Fatty liver index for predicting nonalcoholic fatty liver disease in an asymptomatic Korean population. Diagnostics (Basel) 2021;11:2233.
19. Kim KS, Hong S, Han K, Park CY. Association of non-alcoholic fatty liver disease with cardiovascular disease and all cause death in patients with type 2 diabetes mellitus: nationwide population based study. BMJ 2024;384:e076388.
20. Chung GE, Yu SJ, Yoo JJ, Cho Y, Lee KN, Shin DW, et al. Differential risk of 23 site-specific incident cancers and cancer-related mortality among patients with metabolic dysfunction-associated fatty liver disease: a population-based cohort study with 9.7 million Korean subjects. Cancer Commun (Lond) 2023;43:863–76.
21. Park JH, Hong JY, Shen JJ, Han K, Park JO, Park YS, et al. Increased risk of young-onset digestive tract cancers among young adults age 20-39 years with nonalcoholic fatty liver disease: a nationwide cohort study. J Clin Oncol 2023;41:3363–73.
22. Kim MK, Han K, Park YM, Kwon HS, Kang G, Yoon KH, et al. Associations of variability in blood pressure, glucose and cholesterol concentrations, and body mass index with mortality and cardiovascular outcomes in the general population. Circulation 2018;138:2627–37.
23. Nam GE, Kim W, Han K, Lee CW, Kwon Y, Han B, et al. Body weight variability and the risk of cardiovascular outcomes and mortality in patients with type 2 diabetes: a nationwide cohort study. Diabetes Care 2020;43:2234–41.
24. Lee J, Han K, Park SH, Kim MK, Lim DJ, Yoon KH, et al. Associations of variability in body weight and glucose levels with the risk of hip fracture in people with diabetes. Metabolism 2022;129:155135.
25. Park S, Cho S, Lee S, Kim Y, Park S, Kim YC, et al. The prognostic significance of body mass index and metabolic parameter variabilities in predialysis CKD: a nationwide observational cohort study. J Am Soc Nephrol 2021;32:2595–612.
26. Nam GE, Park YG, Han K, Kim MK, Koh ES, Kim ES, et al. BMI, weight change, and dementia risk in patients with new-onset type 2 diabetes: a nationwide cohort study. Diabetes Care 2019;42:1217–24.
27. Park CS, Choi YJ, Rhee TM, Lee HJ, Lee HS, Park JB, et al. U-shaped associations between body weight changes and major cardiovascular events in type 2 diabetes mellitus: a longitudinal follow-up study of a nationwide cohort of over 1.5 million. Diabetes Care 2022;45:1239–46.
28. Jin EH, Han K, Lee DH, Shin CM, Lim JH, Choi YJ, et al. Association between metabolic syndrome and the risk of colorectal cancer diagnosed before age 50 years according to tumor location. Gastroenterology 2022;163:637–48.
29. Park JH, Han K, Hong JY, Park YS, Hur KY, Kang G, et al. Changes in metabolic syndrome status are associated with altered risk of pancreatic cancer: a nationwide cohort study. Gastroenterology 2022;162:509–20.
30. Eun Y, Han K, Lee SW, Kim K, Kang S, Lee S, et al. Altered risk of incident gout according to changes in metabolic syndrome status: a nationwide, population-based cohort study of 1.29 million young men. Arthritis Rheumatol 2023;75:806–15.
31. Cleland C, Ferguson S, Ellis G, Hunter RF. Validity of the International Physical Activity Questionnaire (IPAQ) for assessing moderate-to-vigorous physical activity and sedentary behaviour of older adults in the United Kingdom. BMC Med Res Methodol 2018;18:176.
32. Ahn HJ, Lee SR, Choi EK, Han KD, Jung JH, Lim JH, et al. Association between exercise habits and stroke, heart failure, and mortality in Korean patients with incident atrial fibrillation: a nationwide population-based cohort study. PLoS Med 2021;18:e1003659.
33. Yoo JE, Han K, Kim B, Park SH, Kim SM, Park HS, et al. Changes in physical activity and the risk of dementia in patients with new-onset type 2 diabetes: a nationwide cohort study. Diabetes Care 2022;45:1091–8.
34. Park J, Jung JH, Park H, Song YS, Kim SK, Cho YW, et al. Association between exercise habits and incident type 2 diabetes mellitus in patients with thyroid cancer: nationwide population-based study. BMC Med 2024;22:251.
35. Park CS, Choi EK, Han KD, Yoo J, Ahn HJ, Kwon S, et al. Physical activity changes and the risk of incident atrial fibrillation in patients with type 2 diabetes mellitus: a nationwide longitudinal follow-up cohort study of 1.8 million subjects. Diabetes Care 2023;46:434–40.
36. Choi YJ, Han KD, Choi EK, Jung JH, Lee SR, Oh S, et al. Alcohol abstinence and the risk of atrial fibrillation in patients with newly diagnosed type 2 diabetes mellitus: a nationwide population-based study. Diabetes Care 2021;44:1393–401.
37. Lysy Z, Booth GL, Shah BR, Austin PC, Luo J, Lipscombe LL. The impact of income on the incidence of diabetes: a population-based study. Diabetes Res Clin Pract 2013;99:372–9.
38. Kim SR, Han K, Choi JY, Ersek J, Liu J, Jo SJ, et al. Age- and sex-specific relationships between household income, education, and diabetes mellitus in Korean adults: the Korea National Health and Nutrition Examination Survey, 2008-2010. PLoS One 2015;10:e0117034.
39. Park JC, Nam GE, Yu J, McWhorter KL, Liu J, Lee HS, et al. Association of sustained low or high income and income changes with risk of incident type 2 diabetes among individuals aged 30 to 64 years. JAMA Netw Open 2023;6:e2330024.
40. Lee HS, Park JC, Chung I, Liu J, Lee SS, Han K. Sustained low income, income changes, and risk of all-cause mortality in individuals with type 2 diabetes: a nationwide population-based cohort study. Diabetes Care 2023;46:92–100.
41. Park YM, Baek JH, Lee HS, Elfassy T, Brown CC, Schootman M, et al. Income variability and incident cardiovascular disease in diabetes: a population-based cohort study. Eur Heart J 2024;45:1920–33.
42. Chung GE, Jeong SM, Cho EJ, Yoon JW, Yoo JJ, Cho Y, et al. The association of fatty liver index and BARD score with all-cause and cause-specific mortality in patients with type 2 diabetes mellitus: a nationwide population-based study. Cardiovasc Diabetol 2022;21:273.
43. Yun JS, Han K, Kim B, Ko SH, Kwon HS, Ahn YB, et al. All-cause and cause-specific mortality risks in individuals with diabetes living alone: a large-scale population-based cohort study. Diabetes Res Clin Pract 2024;217:111876.
44. Vigneri P, Frasca F, Sciacca L, Pandini G, Vigneri R. Diabetes and cancer. Endocr Relat Cancer 2009;16:1103–23.
45. Kim SK, Jang JY, Kim DL, Rhyu YA, Lee SE, Ko SH, et al. Site-specific cancer risk in patients with type 2 diabetes: a nationwide population-based cohort study in Korea. Korean J Intern Med 2020;35:641–51.
46. Park JH, Hong JY, Han K, Park YS, Park JO. Light-to-moderate alcohol consumption increases the risk of biliary tract cancer in prediabetes and diabetes, but not in normoglycemic status: a nationwide cohort study. J Clin Oncol 2022;40:3623–32.

Fig. 1. Number of publications using the Korean National Health Information Database from 2008 to 2023.

Table 1. Operational definitions of diseases related to diabetes

| Disease | Operational definition: inclusion | Operational definition: exclusion |
|---|---|---|
| Type 1 diabetes mellitus | More than 1 claim under ICD-10 code E10 and more than 3 claims for the prescription of insulin and more than 1 claim for the prescription of insulin between 365 and 730 days after the first prescription of insulin | Individuals who had claims under ICD-10 codes E11–14 within 730 days after the first prescription of insulin or who underwent total or partial pancreatectomy |
| Type 2 diabetes mellitus | Fasting plasma glucose concentration ≥126 mg/dL or the presence of at least 1 prescription claim per year for antidiabetic drugs under ICD-10 codes E11–14 | |
| Impaired fasting glucose | Fasting plasma glucose concentration ≥100 and <126 mg/dL | Individuals who had a claim for diabetes mellitus based on ICD-10 codes (E10–E14) or oral antidiabetic drugs or insulin use |
| Gestational diabetes mellitus | Patients who visited the outpatient clinic more than 2 times with gestational diabetes mellitus codes (O24.4 or O24.9) | Individuals who had a claim for diabetes mellitus based on ICD-10 codes (E10–E14) or oral antidiabetic drugs or insulin use status before pregnancy or with a fasting plasma glucose level of 126 mg/dL or greater before pregnancy |
| Diabetic retinopathy, proliferative | ICD-10 code H360 during admission or outpatient department ≥1 with procedural code(s) for panretinal photocoagulation (S5160 or S5161) | |
| Diabetic retinopathy, non-proliferative | ICD-10 code H360 during admission or outpatient department ≥1 | Patients with a procedural code or codes for panretinal photocoagulation (S5160 or S5161) |
| Diabetic nephropathy | ICD-10 codes N18, N19, Z49, Z905, Z94, Z992 plus presence of diabetes mellitus with any of the 4 conditions: diagnostic code during admission or outpatient department ≥1; procedural code for renal transplantation (R3280); procedural code(s) for hemodialysis (O7011-7020); or procedural code(s) for peritoneal dialysis (O7071-7075) | |
| Diabetic neuropathy | ICD-10 codes E10.4, E11.4, E12.4, E13.4, E14.4, G59.0, G63.2, G99.0 during admission or outpatient department ≥1 | |
| Diabetic foot with amputation | ICD-10 codes E105, E107, E115, E117, E125, E127, E135, E137, E145, E147 during admission or outpatient department ≥1 with procedural code(s) for amputation (N0572-0575) | |
| Diabetic foot without amputation | ICD-10 codes E105, E107, E115, E117, E125, E127, E135, E137, E145, E147 during admission or outpatient department ≥1 | Patients with a procedural code for amputation (N0572-0575) |
| Hypertension | ICD-10 codes I10-13 or I15 and at least 1 prescription claim per year for antihypertensive drugs or blood pressure ≥140/90 mm Hg | |
| Dyslipidemia | ICD-10 code E78 and at least 1 claim per year for prescription of a lipid lowering agent or a total cholesterol level ≥240 mg/dL | |
| End-stage renal disease | ICD-10 codes (N18–N19, Z49, Z94.0, Z99.2) and initiation of renal replacement therapy for 30 days or more and/or kidney transplantation during hospitalization | |
| Ischemic heart disease | ICD-10 codes I20–25 with associated hospitalization | |
| Myocardial infarction | ICD-10 code I21 or I22 during hospitalization | |
| Ischemic stroke | ICD-10 code I63 or I64 during hospitalization with claims for brain magnetic resonance imaging or computed tomography | |
| Peripheral artery disease | ICD-10 codes I70 or I73 and procedural codes HA633, HA651, HA652, M6597, M6605, M6612, M6613, M6620, M6632, M6633, N0571, N0572, N0573, N0574, N0575, O0161, O0162, O0163, O0164, O0165, O0166, O0167, O0168, O0169, O0170, O1710, O1711, O1643, O1644, O1645, or O1646 during hospitalization | |
| Heart failure | ICD code I50 and hospitalization | |
| Atrial fibrillation | ICD codes I48.0–I48.4 or I48.9 with at least 1 admission or 2 outpatient visits | |
| Percutaneous coronary intervention | Procedural codes M6551, M6552, M6553, M6554, M6561, M6562, M6563, M6564, M6565, M6566, M6567, M6571, or M6572 | |
| Coronary artery bypass graft | Procedural codes O1642, OA642, O1640, O1641, O1647, O1648, O1649, OA640, OA641, OA647, OA648, or OA649 | |
| Dementia | ICD-10 codes (F00 or G30 for Alzheimer disease; F01 for vascular dementia; and F02, F03, G23.1, or G31 for other dementia) with the prescription of 2 or more medications (donepezil, memantine, rivastigmine, galantamine) for dementia | |

ICD-10, International Classification of Diseases, 10th Revision.
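
Operational definitions of this kind are applied to claims and screening records as rule-based filters. The following Python sketch shows how the Table 1 definitions of type 2 diabetes mellitus and impaired fasting glucose might be encoded. It is illustrative only: the `PersonYear` record and its field names are hypothetical and do not reflect the actual NHID schema, diagnoses are assumed to be stored as 3-character ICD-10 categories, and the "per year" requirement is simplified to a single observation year.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PersonYear:
    """Hypothetical one-year summary of a person's records (not the NHID schema)."""
    icd10_codes: Set[str] = field(default_factory=set)  # 3-character categories, e.g. {"E11"}
    antidiabetic_rx: bool = False  # at least 1 claim for an oral antidiabetic drug or insulin
    fasting_glucose: Optional[float] = None  # mg/dL from health screening, if available


DIABETES_CODES = {"E10", "E11", "E12", "E13", "E14"}  # E10-E14
T2DM_CODES = {"E11", "E12", "E13", "E14"}             # E11-14


def has_type2_diabetes(p: PersonYear) -> bool:
    # Fasting plasma glucose >= 126 mg/dL, OR an antidiabetic
    # prescription claim under ICD-10 codes E11-14.
    by_glucose = p.fasting_glucose is not None and p.fasting_glucose >= 126
    by_claims = p.antidiabetic_rx and bool(p.icd10_codes & T2DM_CODES)
    return by_glucose or by_claims


def has_impaired_fasting_glucose(p: PersonYear) -> bool:
    # Inclusion: fasting plasma glucose >= 100 and < 126 mg/dL.
    if p.fasting_glucose is None:
        return False
    in_range = 100 <= p.fasting_glucose < 126
    # Exclusion: any diabetes claim (E10-E14) or antidiabetic drug/insulin use.
    excluded = bool(p.icd10_codes & DIABETES_CODES) or p.antidiabetic_rx
    return in_range and not excluded
```

In this sketch, a person with a fasting glucose of 110 mg/dL and no diabetes claims is classified as having impaired fasting glucose, whereas the same glucose value combined with an E11 claim and an antidiabetic prescription is classified as type 2 diabetes mellitus. A real analysis would also need to handle claim dates, washout periods, and repeated screenings.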