Diabetes Metab J > Volume 47(2); 2023 > Article
Baek, Park, Han, Moon, Choi, and Ko: Comparison of Operational Definition of Type 2 Diabetes Mellitus Based on Data from Korean National Health Insurance Service and Korea National Health and Nutrition Examination Survey



We evaluated the validity and reliability of the operational definition of type 2 diabetes mellitus (T2DM) based on the Korean National Health Insurance Service (NHIS) database.


Adult subjects (≥40 years old) included in the Korea National Health and Nutrition Examination Survey (KNHANES) from 2008 to 2017 were merged with those from the NHIS health check-up database, producing a cross-sectional dataset. We evaluated the sensitivity, specificity, accuracy, and agreement of the NHIS criteria for defining T2DM by comparing them with the KNHANES criteria as a standard reference.


In the study population (n=13,006), two algorithms were devised to define T2DM from the NHIS dataset according to whether the diagnostic claim codes for T2DM were accompanied by prescription codes for anti-diabetic drugs (algorithm 1) or not (algorithm 2). Using these algorithms, the prevalence of T2DM was 14.9% (n=1,942; algorithm 1) and 20.8% (n=2,707; algorithm 2). Good reliability in defining T2DM was observed for both algorithms (Kappa index, 0.73 [algorithm 1], 0.63 [algorithm 2]). However, the accuracy (0.93 vs. 0.89) and specificity (0.96 vs. 0.90) tended to be higher for algorithm 1 than for algorithm 2. The validity (accuracy, ranging from 0.91 to 0.95) and reliability (Kappa index, ranging from 0.68 to 0.78) of defining T2DM by NHIS criteria were independent of age, sex, socioeconomic status, and accompanying hypertension or dyslipidemia.


The operational definition of T2DM based on population-based NHIS claims data, including diagnostic codes and prescription codes, could be a valid tool to identify individuals with T2DM in the Korean population.



The prevalence of type 2 diabetes mellitus (T2DM) has increased worldwide, and diabetes itself is closely related to an increased risk of atherosclerotic cardiovascular diseases such as myocardial infarction and ischemic stroke, as well as mortality. As a result, population-based data have been widely used in epidemiologic studies [1-3] to identify individuals with diabetes and evaluate diabetes-related comorbidities and risk factors. The population-level classification of T2DM can also provide informative data to guide and prioritize populations at the greatest risk and those most likely to benefit from interventions and treatment. However, despite the advantage of their vast size, population-based claims databases (DBs) contain limited clinical and laboratory information, which makes accurate diagnosis difficult.
In Korea, two representative population-based DBs have been used, the Korea National Health and Nutrition Examination Survey (KNHANES) DB, with a cross-sectional design, and the National Health Insurance Service (NHIS) DB, with a national claims DB cohort design [4]. The Korean NHIS, a single-payer system for all residents, covers 97.1% of Koreans (approximately 50 million individuals), and this DB could be an efficient resource for diabetes research based on the entire population [5]. These big DBs have different advantages and disadvantages, depending on their characteristics.
Clinical measures, including glycosylated hemoglobin (HbA1c) and the oral glucose tolerance test (OGTT), are the gold standards for diagnosing diabetes [6]. However, it is difficult to routinely conduct an HbA1c test or OGTT in a study involving an entire population, especially for subjects with mild hyperglycemia. Instead, an operational definition was adopted to define diabetes using claims-based data and national health examination data in the NHIS DB. Generally, T2DM can be defined as the assignment of an International Classification of Diseases, 10th Revision (ICD-10) code corresponding to T2DM (E11-14), with or without accompanying prescription codes for anti-diabetic drugs, or a high fasting glucose level (≥126 mg/dL) in the health check-up DB [7]. However, previous studies adopted different operational definitions of diabetes, depending on whether the diagnosis was based only on the corresponding ICD-10 codes [8,9], also required concomitant anti-diabetic drug prescriptions [10-15], or additionally included fasting glucose results [16,17].
It is unknown whether a definition of diabetes based on claims data, using diagnostic codes (ICD-10) with or without prescription codes (anti-diabetic drug use), agrees with actual diabetes status in the real world. The quality of data must first be evaluated for fitness for use. Previous validation studies were performed based on comparisons with self-reports, telephone-based surveys, or medical chart reviews [18]. These methods may include biases, such as recall bias and selection bias, that affect accuracy and concordance. Our study aimed to evaluate the validity and reliability of the NHIS data-based definition of T2DM by comparing it with population-based KNHANES data as a standard reference. The overall sensitivity, specificity, positive and negative predictive value, accuracy, and agreement were analyzed. We also compared the prevalence and concordance of T2DM when the two algorithms were applied, depending on whether prescription codes were required in addition to diagnostic codes. To the best of our knowledge, this was the first study to validate the operational definition of T2DM using two big, linked Korean national DBs.


The Institutional Review Board of The Catholic University of Korea (IRB No.: VC18FESI0240) approved this study. The study was conducted in compliance with the Declaration of Helsinki. Written informed consent was waived because of the retrospective nature of the study and because anonymous, de-identified information was used for analysis.

Data sources

The Korean NHIS program is a computerized DB containing all claims data, including patient demographics, drug prescriptions, diagnostic codes for the disease coding system (ICD), insurers’ payment coverage, patients’ deductions, and claimed treatment details [7]. Among the total datasets in the NHIS DB, the qualification, claims, health check-up, and death information data were used. We investigated whether there were fasting glucose levels in the health check-up DB and whether there were ICD-10 codes corresponding to T2DM and claimed prescription data for anti-diabetic drugs in the Korean Health Insurance Review and Assessment. All Korean citizens are encouraged to receive regular biennial or pre-employment health evaluations provided by the NHIS. This regular health examination includes assessments of anthropometric measures, blood pressure, social history, physical activity levels, and laboratory tests after overnight fasting, including serum glucose, total cholesterol, creatinine, liver function, and urinalysis.
KNHANES is a population-based cross-sectional survey designed to assess Koreans’ health-related behavior, health conditions, and nutritional status [19]. A retrospective sample of non-institutionalized civilians was obtained from all geographic regions in the country. In the KNHANES data, we analyzed the laboratory test results (fasting glucose and HbA1c levels) and collected responses to a questionnaire on whether the participants took anti-diabetic drugs or had been diagnosed with T2DM. Among the eight phases of the KNHANES, data from phases IV to VII (2008 to 2017) were analyzed, and adults aged 40 years or older were included in the study. The subjects surveyed by the KNHANES each year were matched to the first claims data in the NHIS health check-up DB.
We identified a cohort of 39,701 subjects in the KNHANES from 2008 to 2017. Subjects who had no data on glucose levels in the medical check-up DB or did not undergo blood tests in a fasting state (for more than 8 hours) were excluded (n=1,598). Among them, 14,294 subjects in the NHIS health check-up DB matched those in the KNHANES. Finally, 13,006 subjects were included in the study, excluding those missing values for age, sex, body mass index, household income, alcohol or smoking status, regular exercise, or the presence of dyslipidemia, hypertension, or chronic kidney disease (CKD) in the KNHANES data (Fig. 1).

Definition of T2DM

Under the KNHANES criteria, T2DM was defined as the presence of any of the following: (1) fasting glucose level of ≥126 mg/dL; (2) current use of any anti-diabetic medications; (3) a previous T2DM diagnosis; or (4) an HbA1c level of ≥6.5%. Information on the use of medications and on medical conditions was collected through the health interview questionnaire, using the face-to-face interview method [19]. Under the NHIS criteria, T2DM was identified by the presence of at least one of the following: (1) fasting glucose level of ≥126 mg/dL in the health check-up DB or (2) ICD-10 codes corresponding to T2DM (E11-14), with or without accompanying prescription codes for any anti-diabetic drugs, in the claims data. To define T2DM from the NHIS dataset, two algorithms based on claims data were applied: one that required the diagnostic codes to be accompanied by prescription codes (algorithm 1) and one that required only the diagnostic codes (algorithm 2).
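The two sets of criteria can be sketched in code as follows. This is a minimal illustration rather than the authors' SAS implementation: the record classes, their field names, and the `is_t2dm_icd10` helper are hypothetical, and only the cutoffs (fasting glucose ≥126 mg/dL, HbA1c ≥6.5%, ICD-10 E11-14) are taken from the definitions above.

```python
from dataclasses import dataclass

FASTING_GLUCOSE_CUTOFF = 126.0  # mg/dL, used by both KNHANES and NHIS criteria
HBA1C_CUTOFF = 6.5              # %, KNHANES criterion (4)
T2DM_ICD10_PREFIXES = ("E11", "E12", "E13", "E14")


@dataclass
class KnhanesRecord:  # hypothetical field names
    fasting_glucose: float
    hba1c: float
    on_antidiabetic_drug: bool
    previous_t2dm_diagnosis: bool


@dataclass
class NhisRecord:  # hypothetical field names
    fasting_glucose: float
    icd10_codes: tuple
    has_antidiabetic_prescription: bool


def is_t2dm_icd10(code: str) -> bool:
    """True if a claim code falls in the E11-E14 range."""
    return code.strip().upper().startswith(T2DM_ICD10_PREFIXES)


def knhanes_t2dm(r: KnhanesRecord) -> bool:
    """Reference standard: any one of the four KNHANES criteria."""
    return (
        r.fasting_glucose >= FASTING_GLUCOSE_CUTOFF
        or r.on_antidiabetic_drug
        or r.previous_t2dm_diagnosis
        or r.hba1c >= HBA1C_CUTOFF
    )


def nhis_t2dm(r: NhisRecord, algorithm: int) -> bool:
    """NHIS definition: high fasting glucose, or a T2DM diagnostic code
    that must (algorithm 1) or need not (algorithm 2) be accompanied by
    an anti-diabetic prescription code."""
    if algorithm not in (1, 2):
        raise ValueError("algorithm must be 1 or 2")
    if r.fasting_glucose >= FASTING_GLUCOSE_CUTOFF:
        return True
    has_code = any(is_t2dm_icd10(code) for code in r.icd10_codes)
    if algorithm == 1:
        return has_code and r.has_antidiabetic_prescription
    return has_code
```

For example, a subject with a fasting glucose of 110 mg/dL, an E11.9 claim, and no anti-diabetic prescription would be classified as T2DM by algorithm 2 but not by algorithm 1.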

Definition of hypertension, dyslipidemia, and socioeconomic variables

Variables were defined based on the KNHANES data. Hypertension was defined as a systolic blood pressure of ≥140 mm Hg or diastolic blood pressure of ≥90 mm Hg or taking anti-hypertensive drugs [20]. Dyslipidemia was defined as a total cholesterol level of ≥240 mg/dL or taking lipid-lowering drugs [21]. CKD was defined as an estimated glomerular filtration rate of <60 mL/min/1.73 m² [22]. Information on household income was obtained through a questionnaire and dichotomized at the higher 25th percentile or divided into quartiles. Household income was calculated as an equivalent income by dividing monthly income by the square root of the family size. Alcohol intake was classified into three categories: never drinker, mild drinker (0 to 30 g/day), and heavy drinker (>30 g/day) [23]. The final education level was classified as elementary school graduation (education duration ≤6 years), middle school graduation (≤9 years), high school graduation (≤12 years), and university or higher (>12 years). When the education level was classified into two groups, they were classified as those who graduated from middle school or lower (education duration ≤9 years) and those who graduated from high school or higher (>9 years). Regular walking was defined as walking for at least 30 minutes per day at least five times a week [24].
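The covariate definitions above translate directly into small helper functions. The sketch below is illustrative only: the function and parameter names are hypothetical, and treating "never drinker" as a separate yes/no flag (rather than an intake of 0 g/day) is an assumption, since the text does not say how that category was coded.

```python
import math


def equivalized_income(monthly_household_income: float, family_size: int) -> float:
    """Monthly household income divided by the square root of family size."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    return monthly_household_income / math.sqrt(family_size)


def has_hypertension(sbp: float, dbp: float, on_antihypertensive: bool) -> bool:
    """SBP >= 140 mm Hg, DBP >= 90 mm Hg, or on anti-hypertensive drugs."""
    return sbp >= 140 or dbp >= 90 or on_antihypertensive


def has_dyslipidemia(total_cholesterol: float, on_lipid_lowering: bool) -> bool:
    """Total cholesterol >= 240 mg/dL or on lipid-lowering drugs."""
    return total_cholesterol >= 240 or on_lipid_lowering


def has_ckd(egfr: float) -> bool:
    """Estimated GFR < 60 mL/min/1.73 m^2."""
    return egfr < 60


def alcohol_category(is_drinker: bool, grams_per_day: float) -> str:
    """Never, mild (up to 30 g/day), or heavy (> 30 g/day) drinker.
    The is_drinker flag is an assumption about how 'never' was coded."""
    if not is_drinker:
        return "never"
    return "heavy" if grams_per_day > 30 else "mild"


def education_two_groups(years: float) -> str:
    """Two-group split at 9 years of education."""
    return "middle school or lower" if years <= 9 else "high school or higher"


def regular_walking(minutes_per_day: float, days_per_week: int) -> bool:
    """At least 30 minutes per day on at least five days a week."""
    return minutes_per_day >= 30 and days_per_week >= 5
```

With these definitions, a four-person household with a monthly income of 4,000 (in any currency unit) has an equivalized income of 2,000.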

Statistical methods

T2DM was classified based on whether it satisfied the diagnostic criteria of the NHIS and KNHANES, respectively. Accordingly, the subjects were divided into four subgroups (NHIS-/KNHANES-, NHIS+/KNHANES-, NHIS-/KNHANES+, and NHIS+/KNHANES+, where positivity indicated a case corresponding to T2DM according to the criteria used). We summarized the characteristics of the participants by the presence or absence of T2DM across the four groups. An independent t-test was conducted on the continuous variables, and a chi-squared test was conducted on the categorical variables. The validity of the NHIS definition of T2DM was measured by estimating the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy using the KNHANES criteria as the standard. Accuracy was expressed as the proportion of correctly classified subjects (true positive and true negative) among all subjects [25]. The Kappa coefficient with corresponding 95% confidence intervals (CI) was also calculated to assess the reliability of the two diagnostic criteria for T2DM. In general, a Kappa coefficient larger than 0.8 indicates excellent consistency, and a Kappa value between 0.6 and 0.8 indicates good consistency [26]. Additionally, we evaluated whether there were differences in the agreement between the two T2DM criteria according to age, sex, household income, educational level, and the presence of hypertension or dyslipidemia. Data analysis was performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).
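For reference, the validity and agreement measures named above have the following standard forms for a 2×2 classification table with the KNHANES criteria as the reference. The symbols TP, FP, TN, FN (true/false positives and negatives), N, p_o, and p_e are notation introduced here for illustration. The text does not state which Kappa variant was computed, so the use of Cohen's Kappa below is an assumption.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}, \\
\text{Accuracy} &= \frac{TP + TN}{N}, &
N &= TP + FP + TN + FN, \\
p_o &= \frac{TP + TN}{N}, &
p_e &= \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^2}, \\
\kappa &= \frac{p_o - p_e}{1 - p_e}.
\end{align*}
```

Here p_o is the observed agreement between the two criteria and p_e is the agreement expected by chance from the marginal totals.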


The prevalence of T2DM according to operational definitions by the NHIS and KNHANES

The overall prevalence of T2DM satisfying KNHANES criteria was 14.2% (n=1,843). The prevalence of T2DM in the NHIS using algorithm 1 was 14.9% (n=1,942), and using algorithm 2, it was 20.8% (n=2,707) (Table 1). When classifying T2DM using the diagnostic criteria of the NHIS (algorithm 1) and the KNHANES data, 10,683 subjects (82.1%) met neither the NHIS nor the KNHANES diagnostic criteria (true negative), 381 subjects (2.9%) met only the KNHANES diagnostic criteria (false negative), 480 subjects (3.7%) met only the NHIS criteria (false positive), and 1,462 subjects (11.2%) met both (true positive) (Table 2). When the condition of using an anti-diabetic drug was excluded from the NHIS criteria (algorithm 2), 10,025 (77.1%) subjects did not meet either set of criteria, 274 subjects (2.1%) met only the KNHANES diagnostic criteria, 1,138 (8.7%) met only the NHIS criteria, and 1,569 (12.1%) met both criteria (Supplementary Table 1). According to algorithm 1, the subgroup that satisfied both criteria (NHIS+/KNHANES+) was older; had a higher proportion of men, hypertension, and CKD; and had higher HbA1c levels and lower income and education levels than the subgroups that satisfied only one set of criteria (NHIS+/KNHANES-, NHIS-/KNHANES+) or the non-diabetic group (NHIS-/KNHANES-) (Table 2).

Concordance measures

The overall sensitivity, specificity, PPV, NPV, accuracy, and Kappa coefficient of the NHIS diagnostic criteria (algorithm 1) compared to the KNHANES criteria were 79% (95% CI, 77 to 81), 96% (95% CI, 95 to 96), 75% (95% CI, 73 to 77), 97% (95% CI, 96 to 97), 93% (95% CI, 93 to 94), and 0.73 (95% CI, 0.72 to 0.75), respectively. When algorithm 2 was adopted in the NHIS criteria, sensitivity, specificity, PPV, NPV, accuracy, and the Kappa coefficient were 85% (95% CI, 84 to 87), 90% (95% CI, 89 to 90), 58% (95% CI, 56 to 60), 97% (95% CI, 97 to 98), 89% (95% CI, 89 to 90), and 0.63 (95% CI, 0.61 to 0.64), respectively (Fig. 2). The sensitivity (ranging from 73% to 83%), specificity (ranging from 93% to 97%), PPV (ranging from 67% to 82%), NPV (ranging from 94% to 98%), accuracy (ranging from 91% to 95%), and agreement (Kappa index, ranging from 0.68 to 0.78) of the NHIS definition criteria (algorithm 1) did not differ by age, sex, income level, education status, or accompanying hypertension or dyslipidemia (Table 3).
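The point estimates above can be re-derived from the cell counts reported in Table 1. The sketch below is a plain-Python check, not the authors' SAS analysis: the function name and dictionary keys are hypothetical, Cohen's Kappa is assumed as the agreement statistic, and confidence intervals are left out because the text does not state how they were computed.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Validity and agreement measures for a 2x2 table, with the
    KNHANES criteria as the reference standard."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n  # observed agreement (same value as accuracy)
    # Chance agreement from the marginal totals (Cohen's Kappa, assumed)
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Cell counts as reported in Table 1 (NHIS criteria vs. KNHANES criteria)
COUNTS = {
    "algorithm 1": dict(tp=1462, fp=480, tn=10683, fn=381),
    "algorithm 2": dict(tp=1569, fp=1138, tn=10025, fn=274),
}

RESULTS = {
    name: {k: round(v, 2) for k, v in diagnostic_metrics(**cells).items()}
    for name, cells in COUNTS.items()
}

if __name__ == "__main__":
    for name, metrics in RESULTS.items():
        print(name, metrics)
```

Rounded to two decimals, the computed values agree with the reported point estimates for both algorithms (for example, Kappa of 0.73 for algorithm 1 and 0.63 for algorithm 2).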


Overall good validity and consistency of the diagnostic criteria using NHIS data were observed, which did not differ by age, sex, socioeconomic factors, or accompanying hypertension or dyslipidemia. When two diagnostic algorithms were applied to NHIS data according to whether the diagnostic codes were accompanied by prescription codes (algorithm 1) or not (algorithm 2), the prevalence of T2DM by algorithm 1 was lower than that by algorithm 2 and was similar to the prevalence using the KNHANES data. In addition, although good reliability was observed for both algorithms, specificity and accuracy tended to be higher for the algorithm that included both diagnostic and prescription codes (algorithm 1).
The prevalence of T2DM in the NHIS data using algorithm 1 (adopting both diagnostic and prescription codes) was approximately 5.9 percentage points lower than when algorithm 2 (adopting only diagnostic codes) was applied (14.9% vs. 20.8%). False positives (cases identified in NHIS claims data as having T2DM that did not meet the KNHANES criteria for T2DM) increased when T2DM was defined only by diagnostic codes (8.7% in algorithm 2, 3.7% in algorithm 1). The overall prevalence of T2DM identified using algorithm 1 in this study was similar to the overall prevalence published in the 2021 Korea Diabetes Fact Sheet using KNHANES data (16.7%, approximately 6.05 million people) [15]. The mean HbA1c level in the NHIS+/KNHANES- group (false positives) was 5.9% using algorithm 1 and 5.7% using algorithm 2 in the study. There may be cases in which claims were issued for a T2DM diagnosis in subjects with prediabetes or early T2DM who did not require medications. Also, even though both algorithms (whether or not prescription claims data were included) provided good agreement based on the Kappa index, higher specificity and accuracy for defining T2DM based on the NHIS were observed when claims for diagnostic codes were present along with prescription codes. Including both diagnostic codes and prescription codes in the criteria for defining T2DM in the NHIS dataset helped to distinguish patients in a prediabetic or early diabetic state from those with overt diabetes requiring treatment.
Concordance and the consistency of the diagnostic value based on NHIS criteria (algorithm 1) did not differ according to age, sex, socioeconomic factors, or accompanying hypertension or dyslipidemia. The accuracy and specificity were over 90%, and the Kappa index showed good reliability (ranging from 0.68 to 0.78). These trends were consistent when algorithm 2 was applied (data not shown). A previous validation study compared accuracy and consistency using self-reports or telephone surveys as a reference standard [18]. Self-reports and telephone surveys are prone to recall bias, social desirability bias, poor understanding of the survey questions, and respondents' incomplete knowledge of their own diagnosis. The literature review demonstrated that participants’ sociodemographic characteristics, such as age, gender, race, setting, and socioeconomic and health status, were associated with incomplete data linkage and the potential for systematic bias in reported outcomes [27]. In contrast, our study used KNHANES data, a population-based surveillance system, as a reference standard. The KNHANES data have the advantage of minimizing selection bias compared to a diagnosis based on an electronic medical chart review or interviews because the target population of the KNHANES comprises nationally representative non-institutionalized civilians in Korea. In addition, including clinical measures (HbA1c) as one of the diagnostic criteria in the reference standard for validation can help overcome the potential limitation of systematic bias. Also, data linkage between the KNHANES and NHIS compensated for a shortcoming of the claims data, namely the lack of clinical information such as disease duration or glycemic control status, by adding information from self-reported surveys and urine or blood sample measurements in the KNHANES.
The validity of national administrative claims data has also been evaluated in other countries such as Japan [28], Canada [29], and the USA [18]. Based on the Japanese national claims DB, the algorithm that contained both diagnosis-related codes for diabetes and medication codes had higher specificity (mean, 99.4% vs. 91.6%) and agreement (mean Kappa index, 0.80 vs. 0.49) than the algorithm that contained only diagnosis-related codes [28]. According to healthcare administrative data from Canada, compared with electronic medical records, the algorithm with the best specificity and PPV while maintaining sensitivity above 80% was either one hospitalization or physician claim and either one drug prescription or one diabetes-specific fee code at any time [29]. A validation of physician claims data based on ICD-9 codes in the USA demonstrated that the sensitivity ranged from 26.9% to 97.0%, specificity ranged from 94.5% to 99.4%, and the Kappa index ranged from 0.8 to 0.9 [18]. Compared with the sensitivity, specificity, and Kappa agreement reported in other countries, the algorithm based on Korean NHIS data also demonstrated good validity and reliability.
Several limitations to this study should be considered. First, selection bias may have occurred because two-thirds of the subjects were excluded due to missing data on fasting glucose levels in the medical check-up DB or covariates in the KNHANES data, as well as cases where the person refused to provide personal information. In addition, only subjects aged 40 years or older were included in this study because the national health check-up is conducted for those aged 40 years or older. Second, among the diagnostic criteria in the KNHANES data used as a standard reference, questionnaires were also used to classify patients with T2DM through a self-reported survey. Other laboratory tests and data, such as the OGTT or hyperglycemia-accompanied symptoms, were not present in the data used to diagnose T2DM. As a result, the KNHANES data also did not fully reflect all patients with T2DM in real-world settings. Third, defining T2DM according to claims-based data can overlook patients with untreated diabetes or those who did not require treatment. Clinical factors such as disease duration, diabetes management status, or accompanying hypertension or dyslipidemia were not assessed through the NHIS data. Despite these limitations, validating the operational diagnosis of T2DM by linking these two big national DBs, including clinical measures (HbA1c), represents an important and timely approach for future diabetes research in Korea.
In conclusion, population-based NHIS claims data can be useful in identifying subjects with T2DM by using diagnostic and prescription codes as diagnostic criteria in epidemiologic studies. The validity and accuracy of the population-based claims data for identifying T2DM were well documented and independent of sociodemographic and metabolic risk factors.


Supplementary materials related to this article can be found online at https://doi.org/10.4093/dmj.2022.0375.
Supplementary Table 1.
Characteristics of the study population classified as having type 2 diabetes mellitus according to KNHANES and NHIS criteria (algorithm 2)



Seung-Hyun Ko has been executive editor of the Diabetes & Metabolism Journal since 2022. Yong-Moon Park has been statistical advisor of the Diabetes & Metabolism Journal since 2021. They were not involved in the review process of this article. Otherwise, there was no conflict of interest.


Conception or design: J.H.B., K.D.H., S.H.K.

Acquisition, analysis, or interpretation of data: Y.M.P., K.D.H.

Drafting or revising the work: J.H.B., M.K.M., J.H.C.

Final approval of the manuscript: K.D.H., S.H.K.


This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI18-C0275).



Fig. 1.
Study diagram. KNHANES, Korea National Health and Nutrition Examination Survey; NHIS, National Health Insurance System; HbA1c, glycosylated hemoglobin; BMI, body mass index; CKD, chronic kidney disease. a, Systolic blood pressure ≥140 mm Hg and/or diastolic pressure ≥90 mm Hg or on medication; b, Total cholesterol ≥240 mg/dL and/or on medication; c, Estimated glomerular filtration rate <60 mL/min/1.73 m².
Fig. 2.
Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and agreement according to the different algorithms of defining type 2 diabetes mellitus in Korean National Health Insurance System data. Algorithm 1: at least one of the following criteria was met: (1) fasting glucose ≥126 mg/dL or (2) International Classification of Diseases, 10th Revision (ICD-10) codes corresponding to type 2 diabetes mellitus (E11-14) with accompanying prescription codes for any anti-diabetic drugs. Algorithm 2: at least one of the following criteria was met: (1) fasting glucose ≥126 mg/dL or (2) ICD-10 codes corresponding to type 2 diabetes mellitus (E11-14).
Table 1.
The prevalence of type 2 diabetes mellitus according to the NHIS and KNHANES diagnostic criteria stratified by the NHIS algorithm used
| NHIS criteria | KNHANES criteria: No | KNHANES criteria: Yes | Total |
|---|---|---|---|
| Algorithm 1: No | 10,683 (82.1) | 381 (2.9) | 11,064 (85.1) |
| Algorithm 1: Yes | 480 (3.7) | 1,462 (11.2) | 1,942 (14.9) |
| Algorithm 2: No | 10,025 (77.1) | 274 (2.1) | 10,299 (79.2) |
| Algorithm 2: Yes | 1,138 (8.7) | 1,569 (12.1) | 2,707 (20.8) |
| Total | 11,163 (85.8) | 1,843 (14.2) | 13,006 (100) |

Values are presented as number (%). Algorithm 1: at least one of the following criteria was met: (1) fasting glucose level ≥126 mg/dL or (2) International Classification of Diseases, 10th Revision (ICD-10) codes corresponding to type 2 diabetes mellitus (E11-14) with accompanying prescription codes for any anti-diabetic drugs; Algorithm 2: at least one of the following criteria was met: (1) fasting glucose level ≥126 mg/dL or (2) ICD-10 codes corresponding to type 2 diabetes mellitus (E11-14).

NHIS, National Health Insurance System; KNHANES, Korea National Health and Nutrition Examination Survey.

Table 2.
Characteristics of the study population classified as type 2 diabetes mellitus according to the KNHANES and NHIS criteria (algorithm 1)
| Characteristic | NHIS+/KNHANES+ | NHIS-/KNHANES+ | NHIS+/KNHANES- | NHIS-/KNHANES- | P value |
|---|---|---|---|---|---|
| Number | 1,462 (11.2) | 381 (2.9) | 480 (3.7) | 10,683 (82.1) | |
| Age, yr | 63.5±9.5 | 61.1±10.9 | 60.8±10.5 | 56.3±10.9 | <0.001 |
| Age ≥65 years | 746 (51.0) | 152 (39.9) | 190 (39.6) | 2,718 (25.4) | <0.001 |
| Male sex | 799 (54.7) | 190 (49.9) | 174 (36.3) | 4,702 (44.0) | <0.001 |
| Height, cm | 156.6±9.0 | 156.4±9.4 | 156.4±9.4 | 156.8±9.2 | <0.001 |
| Weight, kg | 60.3±10.7 | 60.8±11.3 | 60.8±11.3 | 57.5±10.7 | <0.001 |
| BMI, kg/m² | 25.0±3.0 | 25.3±3.2 | 25.3±3.2 | 23.8±2.9 | <0.001 |
| Household income | | | | | <0.001 |
| Quartile 1 (lowest) | 425 (29.1) | 103 (27.0) | 116 (24.2) | 1,681 (15.7) | |
| Quartile 2 | 416 (28.5) | 96 (25.2) | 130 (27.1) | 2,580 (24.2) | |
| Quartile 3 | 307 (21.0) | 106 (27.8) | 118 (24.6) | 2,994 (28.0) | |
| Quartile 4 (highest) | 314 (21.5) | 76 (20.0) | 116 (24.2) | 3,428 (32.1) | |
| Education duration, yr | | | | | <0.001 |
| <6 | 602 (41.2) | 157 (41.2) | 188 (39.2) | 2,757 (25.8) | |
| 6-9 | 268 (18.3) | 62 (16.3) | 72 (15.0) | 1,479 (13.8) | |
| 10-12 | 376 (25.7) | 97 (25.5) | 151 (31.5) | 3,514 (32.9) | |
| ≥13 | 216 (14.8) | 65 (17.1) | 69 (14.4) | 2,933 (27.5) | |
| Occupation (yes) | 836 (57.2) | 229 (60.1) | 281 (58.5) | 7,415 (69.4) | <0.001 |
| Smoking | | | | | <0.001 |
| Current | 290 (19.8) | 78 (20.5) | 75 (15.6) | 1,761 (16.5) | |
| Ex-smoker | 425 (29.1) | 87 (22.8) | 80 (16.7) | 2,339 (21.9) | |
| Non-smoker | 747 (51.1) | 216 (56.7) | 325 (67.7) | 6,583 (61.6) | |
| Alcohol consumption | | | | | <0.001 |
| Heavy | 138 (9.4) | 31 (8.1) | 37 (7.7) | 755 (7.1) | |
| Mild | 797 (54.5) | 213 (55.9) | 288 (60.0) | 6,986 (65.4) | |
| None | 527 (36.1) | 137 (36.0) | 155 (32.3) | 2,942 (27.5) | |
| Hypertension (yes) | 909 (62.2) | 217 (57.0) | 203 (42.3) | 3,603 (33.7) | <0.001 |
| Dyslipidemia (yes) | 203 (13.9) | 75 (19.7) | 72 (15.0) | 1,212 (11.4) | <0.001 |
| CKD (yes) | 120 (8.2) | 25 (6.6) | 14 (2.9) | 238 (2.2) | <0.001 |
| Laboratory findings | | | | | |
| Fasting glucose, mg/dL | 141±40 | 123±24 | 100±11 | 95±9 | <0.001 |
| HbA1c, % | 8.3±2.1 | 7.2±1.3 | 5.9±0.7 | 5.6±0.6 | <0.001 |
| Total cholesterol, mg/dL | 181±41 | 203±40 | 199±42 | 195±34 | <0.001 |
| HDL-cholesterol, mg/dL | 45±11 | 47±12 | 49±12 | 50±12 | <0.001 |
| Creatinine, mg/dL | 0.88±0.26 | 0.86±0.22 | 0.83±0.36 | 0.83±0.23 | <0.001 |
| eGFR, mL/min/1.73 m² | 86.8±19.8 | 87.9±18.5 | 89.0±16.9 | 90.4±16.2 | <0.001 |

Values are presented as number (%) or mean±standard deviation.

KNHANES, Korea National Health and Nutrition Examination Survey; NHIS, National Health Insurance System; BMI, body mass index; CKD, chronic kidney disease; HbA1c, glycosylated hemoglobin; HDL, high-density lipoprotein; eGFR, estimated glomerular filtration rate.

Table 3.
Sensitivity, specificity, predictive value, accuracy, and agreement of the operational definition of type 2 diabetes mellitus based on NHIS criteria (algorithm 1) compared to KNHANES criteria as a standard reference
| Subgroup | Total | TP | FP | TN | FN | Sensitivity | Specificity | PPV | NPV | Accuracy | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 13,006 | 1,462 | 480 | 10,683 | 381 | 0.79 (0.77-0.81) | 0.96 (0.95-0.96) | 0.75 (0.73-0.77) | 0.97 (0.96-0.97) | 0.93 (0.93-0.94) | 0.73 (0.72-0.75) |
| Age, yr | | | | | | | | | | | |
| 40-65 | 9,200 | 716 | 290 | 7,965 | 229 | 0.76 (0.73-0.78) | 0.96 (0.96-0.97) | 0.71 (0.68-0.74) | 0.97 (0.97-0.98) | 0.94 (0.94-0.95) | 0.70 (0.68-0.73) |
| ≥65 | 3,806 | 746 | 190 | 2,718 | 152 | 0.83 (0.81-0.86) | 0.93 (0.93-0.94) | 0.80 (0.77-0.82) | 0.95 (0.94-0.96) | 0.91 (0.90-0.92) | 0.75 (0.73-0.78) |
| Sex | | | | | | | | | | | |
| Male | 5,865 | 799 | 174 | 4,702 | 190 | 0.81 (0.78-0.83) | 0.96 (0.96-0.97) | 0.82 (0.80-0.85) | 0.96 (0.96-0.97) | 0.94 (0.93-0.94) | 0.78 (0.76-0.80) |
| Female | 7,141 | 663 | 306 | 5,981 | 191 | 0.78 (0.75-0.80) | 0.95 (0.95-0.96) | 0.68 (0.65-0.71) | 0.97 (0.96-0.97) | 0.93 (0.92-0.94) | 0.69 (0.66-0.71) |
| Household income | | | | | | | | | | | |
| Q1 | 2,325 | 425 | 116 | 1,681 | 103 | 0.80 (0.77-0.84) | 0.94 (0.92-0.95) | 0.79 (0.75-0.82) | 0.94 (0.93-0.95) | 0.91 (0.89-0.92) | 0.73 (0.70-0.77) |
| Q2-4 | 10,681 | 1,037 | 364 | 9,002 | 278 | 0.79 (0.77-0.81) | 0.96 (0.96-0.97) | 0.74 (0.72-0.76) | 0.97 (0.97-0.97) | 0.94 (0.94-0.94) | 0.73 (0.71-0.75) |
| Education, yr | | | | | | | | | | | |
| <9 | 5,585 | 870 | 260 | 4,236 | 219 | 0.80 (0.78-0.82) | 0.94 (0.94-0.95) | 0.77 (0.75-0.79) | 0.95 (0.94-0.96) | 0.91 (0.91-0.92) | 0.73 (0.71-0.75) |
| ≥9 | 7,421 | 592 | 220 | 6,447 | 162 | 0.79 (0.76-0.81) | 0.97 (0.96-0.97) | 0.73 (0.70-0.76) | 0.98 (0.97-0.98) | 0.95 (0.94-0.95) | 0.73 (0.70-0.75) |
| Hypertension | | | | | | | | | | | |
| Yes | 4,932 | 909 | 203 | 3,603 | 217 | 0.81 (0.78-0.83) | 0.95 (0.94-0.95) | 0.82 (0.79-0.84) | 0.94 (0.94-0.95) | 0.91 (0.91-0.92) | 0.76 (0.74-0.78) |
| No | 8,074 | 553 | 277 | 7,080 | 164 | 0.77 (0.74-0.80) | 0.96 (0.96-0.97) | 0.67 (0.63-0.70) | 0.98 (0.97-0.98) | 0.95 (0.94-0.95) | 0.69 (0.66-0.71) |
| Dyslipidemia | | | | | | | | | | | |
| Yes | 1,562 | 203 | 72 | 1,212 | 75 | 0.73 (0.68-0.78) | 0.94 (0.93-0.96) | 0.74 (0.69-0.79) | 0.94 (0.93-0.95) | 0.91 (0.89-0.92) | 0.68 (0.63-0.73) |
| No | 11,444 | 1,259 | 408 | 9,471 | 306 | 0.80 (0.78-0.82) | 0.96 (0.95-0.96) | 0.75 (0.73-0.78) | 0.97 (0.97-0.97) | 0.94 (0.93-0.94) | 0.74 (0.73-0.76) |

Values are presented as point estimate (95% confidence interval).

NHIS, National Health Insurance System; KNHANES, Korea National Health and Nutrition Examination Survey; TP, true positive; FP, false positive; TN, true negative; FN, false negative; PPV, positive predictive value; NPV, negative predictive value; Q1, lowest quartile; Q2-4, second to the fourth quartile.


1. Cascini S, Agabiti N, Davoli M, Uccioli L, Meloni M, Giurato L, et al. Survival and factors predicting mortality after major and minor lower-extremity amputations among patients with diabetes: a population-based study using health information systems. BMJ Open Diabetes Res Care 2020;8:e001355.
2. Choi Y, Choi JW. Association of sleep disturbance with risk of cardiovascular disease and all-cause mortality in patients with new-onset type 2 diabetes: data from the Korean NHIS-HEALS. Cardiovasc Diabetol 2020;19:61.
3. Jung I, Kwon H, Park SE, Han KD, Park YG, Kim YH, et al. Increased risk of cardiovascular disease and mortality in patients with diabetes and coexisting depression: a nationwide population-based cohort study. Diabetes Metab J 2021;45:379-89.
4. Kim MK, Han K, Lee SH. Current trends of big data research using the Korean National Health Information Database. Diabetes Metab J 2022;46:552-63.
5. Ko SH, Han K, Lee YH, Noh J, Park CY, Kim DJ, et al. Past and current status of adult type 2 diabetes mellitus management in Korea: a National Health Insurance Service Database Analysis. Diabetes Metab J 2018;42:93-100.
6. Jagannathan R, Neves JS, Dorcely B, Chung ST, Tamura K, Rhee M, et al. The oral glucose tolerance test: 100 years later. Diabetes Metab Syndr Obes 2020;13:3787-805.
7. Lee YH, Han K, Ko SH, Ko KS, Lee KU; Taskforce Team of Diabetes Fact Sheet of the Korean Diabetes Association. Data analytic process of a nationwide population-based study using National Health Information Database established by National Health Insurance Service. Diabetes Metab J 2016;40:79-82.
8. Jo SH, Nam H, Lee J, Park S, Lee J, Kyoung DS. Fenofibrate use is associated with lower mortality and fewer cardiovascular events in patients with diabetes: results of 10,114 patients from the Korean National Health Insurance Service Cohort. Diabetes Care 2021;44:1868-76.
9. Jeong JS, Kim JS, Yeom SW, Lee MG, You YS, Lee YC. Prevalence and comorbidities of bronchiolitis in adults: a population-based study in South Korea. Medicine (Baltimore) 2022;101:e29551.
10. Kim J, Yang PS, Park BE, Kang TS, Lim SH, Cho S, et al. Association of proteinuria and incident atrial fibrillation in patients with diabetes mellitus: a population-based senior cohort study. Sci Rep 2021;11:17013.
11. Hong JS, Kang HC. Body mass index and all-cause mortality in patients with newly diagnosed type 2 diabetes mellitus in South Korea: a retrospective cohort study. BMJ Open 2022;12:e048784.
12. Kim JE, Choi J, Park J, Shin A, Choi NK, Choi JY. Effects of menopausal hormone therapy on cardiovascular diseases and type 2 diabetes in middle-aged postmenopausal women: analysis of the Korea National Health Insurance Service Database. Menopause 2021;28:1225-32.
13. Lee CJ, Hwang J, Lee YH, Oh J, Lee SH, Kang SM, et al. Blood pressure level associated with lowest cardiovascular event in hypertensive diabetic patients. J Hypertens 2018;36:2434-43.
14. Lee SE, Kim KA, Son KJ, Song SO, Park KH, Park SH, et al. Trends and risk factors in severe hypoglycemia among individuals with type 2 diabetes in Korea. Diabetes Res Clin Pract 2021;178:108946.
15. Bae JH, Han KD, Ko SH, Yang YS, Choi JH, Choi KM, et al. Diabetes fact sheet in Korea 2021. Diabetes Metab J 2022;46:417-26.
16. Kim J, Bae YJ, Lee JW, Kim YS, Kim Y, You HS, et al. Metformin use in cancer survivors with diabetes reduces all-cause mortality, based on the Korean National Health Insurance Service between 2002 and 2015. Medicine (Baltimore) 2021;100:e25045.
17. Park JH, Ha KH, Kim BY, Lee JH, Kim DJ. Trends in cardiovascular complications and mortality among patients with diabetes in South Korea. Diabetes Metab J 2021;45:120-4.
18. Khokhar B, Jette N, Metcalfe A, Cunningham CT, Quan H, Kaplan GG, et al. Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations. BMJ Open 2016;6:e009952.
19. Kweon S, Kim Y, Jang MJ, Kim Y, Kim K, Choi S, et al. Data resource profile: the Korea National Health and Nutrition Examination Survey (KNHANES). Int J Epidemiol 2014;43:69-77.
20. Lee HY, Shin J, Kim GH, Park S, Ihm SH, Kim HC, et al. 2018 Korean Society of Hypertension guidelines for the management of hypertension: part II-diagnosis and treatment of hypertension. Clin Hypertens 2019;25:20.
21. Rhee EJ, Kim HC, Kim JH, Lee EY, Kim BJ, Kim EM, et al. 2018 Guidelines for the management of dyslipidemia. Korean J Intern Med 2019;34:723-71.
22. Levin A, Stevens PE. Summary of KDIGO 2012 CKD guideline: behind the scenes, need for guidance, and a framework for moving forward. Kidney Int 2014;85:49-61.
23. Dufour MC. What is moderate drinking?: defining “drinks” and drinking levels. Alcohol Res Health 1999;23:5-14.
24. Rosenberg DE, Bull FC, Marshall AL, Sallis JF, Bauman AE. Assessment of sedentary behavior with the International Physical Activity Questionnaire. J Phys Act Health 2008;5 Suppl 1:S30-44.
25. Simundic AM. Measures of diagnostic accuracy: basic definitions. EJIFCC 2009;19:203-11.
26. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
27. Chiu CJ, Huang HM, Lu TH, Wang YW. National health data linkage and the agreement between self-reports and medical records for middle-aged and older adults in Taiwan. BMC Health Serv Res 2018;18:917.
28. Nishioka Y, Takeshita S, Kubo S, Myojin T, Noda T, Okada S, et al. Appropriate definition of diabetes using an administrative database: a cross-sectional cohort validation study. J Diabetes Investig 2022;13:249-55.
29. Lipscombe LL, Hwee J, Webster L, Shah BR, Booth GL, Tu K. Identifying diabetes cases from administrative data: a population-based validation study. BMC Health Serv Res 2018;18:316.

Copyright © 2023 by Korean Diabetes Association.
