Data Analytic Process of a Nationwide Population-Based Study Using National Health Information Database Established by National Health Insurance Service
Abstract
In 2014, the National Health Insurance Service (NHIS) signed a memorandum of understanding with the Korean Diabetes Association to provide limited open access to its databases for investigating the past and current status of diabetes and its management. The NHIS databases cover the entire Korean population and can therefore serve as the basis for population-based nationwide studies of various diseases, including diabetes and its complications. This report describes how we established an analytic system for nationwide population-based studies using the NHIS database, covering the selection of the study population and its distribution, the operational definition of diabetes, and the analytic methods of currently ongoing collaboration projects.
OVERVIEW: NATIONAL HEALTH INFORMATION DATABASE ESTABLISHED BY NATIONAL HEALTH INSURANCE SERVICE
In 2014, the National Health Insurance Service (NHIS) signed a memorandum of understanding with the Korean Diabetes Association (KDA) to provide limited open access to its databases for investigating the past and current status of diabetes and its management. A previous review by Song et al. [1] described in detail the history, structure, and contents of the Korean National Health Insurance (NHI) system and how to procure its data. Briefly, the NHIS in Korea is a single-payer system in which enrollment is mandatory for all residents of Korea. Because it pays health care providers who treat or examine Korean patients on a fee-for-service basis, the NHIS obtains information on patient demographics, medical use and transactions, insurer payment coverage, patient cost-sharing (deductions), and claims (diagnoses, prescriptions, and consultation statements). The NHIS database represents the entire Korean population and can therefore serve as the basis for population-based nationwide studies of various diseases. Several large-scale epidemiologic studies using the NHIS database have recently been reported [2].
DATABASE POPULATION
As a single-insurer system, the NHI system consists of two major health care programs that provide universal coverage for all residents of Korea: NHI and Medical Aid (MA). Approximately 97% of the population is covered by NHI, and the remaining 3% is covered by MA. Information on MA beneficiaries has been incorporated into the single NHIS database since 2006. Therefore, the NHIS database for 2002 to 2005 includes only NHI beneficiaries and not MA beneficiaries, which should be taken into account when interpreting findings from this period. Retrospective data on individuals aged 30 years or older were extracted from the Korean NHIS database for January 2002 through December 2013.
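The population selection above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's extraction code: the record layout (`EnrollmentRecord` with `person_id`, `year`, `age`, `program`) is hypothetical, and the age cut-off is taken as 30 years or older, matching the ≥30 years used for Table 2. The helper `ma_captured` flags years before 2006, when MA beneficiaries are absent from the database.

```python
from dataclasses import dataclass

STUDY_START_YEAR = 2002
STUDY_END_YEAR = 2013
MA_MERGE_YEAR = 2006  # MA beneficiaries are included in the single database from this year
MIN_AGE = 30          # "aged 30 years or older", as for Table 2


@dataclass(frozen=True)
class EnrollmentRecord:
    person_id: str  # hypothetical pseudonymized identifier
    year: int       # calendar year of the eligibility record
    age: int        # age in that year
    program: str    # hypothetical program code: "NHI" or "MA"


def select_study_population(records):
    """Keep records from 2002-2013 for individuals aged 30 years or older."""
    return [
        r for r in records
        if STUDY_START_YEAR <= r.year <= STUDY_END_YEAR and r.age >= MIN_AGE
    ]


def ma_captured(year):
    """MA beneficiaries are missing from the database before 2006."""
    return year >= MA_MERGE_YEAR


# Illustrative records (not real data).
example = [
    EnrollmentRecord("A", 2003, 45, "NHI"),
    EnrollmentRecord("B", 2003, 29, "NHI"),  # excluded: younger than 30
    EnrollmentRecord("C", 2014, 50, "NHI"),  # excluded: outside 2002-2013
    EnrollmentRecord("D", 2007, 62, "MA"),
]
selected = select_study_population(example)
# Years without MA coverage, to flag when interpreting results.
years_without_ma = sorted({r.year for r in selected if not ma_captured(r.year)})
```

Keeping the MA flag separate from the selection step lets an analysis either restrict to 2006 onward or retain 2002 to 2005 while reporting the missing MA population explicitly.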
DATABASE CONTENTS
Among the sub-datasets of the NHIS database, we used the Qualification DB, Claim DB, Health Check-up DB, and death information.
The Qualification DB includes sex, age, income, region, and type of qualification. Using this database, we describe the distribution of study participants aged 30 years or older in the NHIS database from 2002 to 2013 by sex and age (Table 1).
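A tabulation like Table 1 amounts to counting participants by sex and age band. The sketch below assumes 10-year age bands starting at 30 with an open-ended 80+ band; the bands actually used in Table 1 may differ, and the `(sex, age)` input format is hypothetical.

```python
from collections import Counter


def age_band(age):
    """Hypothetical 10-year bands from 30 years, with an open-ended 80+ band."""
    if age < 30:
        return None  # outside the study population
    if age >= 80:
        return "80+"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def distribution_by_sex_and_age(people):
    """Count participants by (sex, age band); people is an iterable of (sex, age)."""
    counts = Counter()
    for sex, age in people:
        band = age_band(age)
        if band is not None:
            counts[(sex, band)] += 1
    return counts


def percentages(counts):
    """Convert counts to percentages of the total study population."""
    total = sum(counts.values())
    return {key: 100 * n / total for key, n in counts.items()}


# Illustrative input (not real data); the 25-year-old is excluded.
table = distribution_by_sex_and_age([("M", 34), ("F", 34), ("M", 39), ("F", 81), ("M", 25)])
```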
The Claim DB includes general information on each claim specification (20T), consultation statements (30T), diagnosis statements coded according to the International Classification of Diseases, 10th revision (ICD-10) (40T), and detailed prescription statements (60T). Further details are provided in a previous review [1].
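Because the 20T, 40T, and 60T tables describe the same claim at different levels of detail, analyses typically link them on a shared claim key. In this sketch the key name `claim_id` and the column names `icd10` and `drug_code` are hypothetical placeholders rather than the actual NHIS field names, and the 30T consultation table is omitted for brevity.

```python
from collections import defaultdict


def group_by_claim(rows, key="claim_id"):
    """Index detail rows (e.g., 40T or 60T) by their claim key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def link_claims(t20, t40, t60):
    """Attach diagnosis (40T) and prescription (60T) rows to each 20T claim."""
    diagnoses = group_by_claim(t40)
    prescriptions = group_by_claim(t60)
    linked = []
    for claim in t20:
        cid = claim["claim_id"]
        linked.append({
            **claim,
            "diagnoses": [r["icd10"] for r in diagnoses.get(cid, [])],
            "drugs": [r["drug_code"] for r in prescriptions.get(cid, [])],
        })
    return linked


# Illustrative rows with hypothetical column names and codes.
t20 = [{"claim_id": "C1", "person_id": "P1", "year": 2013},
       {"claim_id": "C2", "person_id": "P2", "year": 2013}]
t40 = [{"claim_id": "C1", "icd10": "E11"}, {"claim_id": "C1", "icd10": "I10"}]
t60 = [{"claim_id": "C1", "drug_code": "DRUG_X"}]
linked = link_claims(t20, t40, t60)
```

In practice the same linkage would usually be written as a database join; the in-memory version only illustrates how the tables relate.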
The Health Check-up DB generally consists of four areas: general health check-ups, lifetime transition period health check-ups, cancer screening, and infant and child health check-ups [3]. Among them, we used the general health check-up database, which covers (1) employee subscribers and regional subscribers who are heads of household, (2) dependents of employee subscribers and household members, aged 40 years or older, and (3) MA beneficiaries who are heads of household aged 19 to 64 years and household members aged 41 to 64 years. All examinees are requested to undergo a health check-up every 2 years (biennially), except non-office workers who are employee subscribers, who are examined annually. The proportion of eligible individuals who completed a health check-up was approximately 40% in 2002 and increased to 68% in 2013.
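Because examinees are scheduled at different intervals, the number of check-ups a person can contribute over a study period differs. The following sketch encodes the interval rule (every 2 years, or annually for non-office workers who are employee subscribers) and computes an upper bound on check-up opportunities in a period; the function names are hypothetical.

```python
def checkup_interval_years(employee_subscriber: bool, office_worker: bool) -> int:
    """Biennial general health check-ups, except annual ones for
    non-office workers who are employee subscribers."""
    if employee_subscriber and not office_worker:
        return 1
    return 2


def max_checkup_opportunities(start_year: int, end_year: int, interval: int) -> int:
    """Upper bound on check-up cycles within [start_year, end_year] (inclusive)."""
    n_years = end_year - start_year + 1
    return -(-n_years // interval)  # ceiling division


# Over 2002-2013, an annual examinee has up to 12 opportunities, a biennial one up to 6.
annual = max_checkup_opportunities(2002, 2013, checkup_interval_years(True, False))
biennial = max_checkup_opportunities(2002, 2013, checkup_interval_years(True, True))
```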
DEFINITION OF DIABETES
Considering the characteristics of the NHIS database, we applied an operational definition of diabetes in further analyses. In the Claim DB, individuals were defined as having diabetes if anti-diabetic drugs were prescribed together with ICD-10 codes E11, E12, E13, or E14 as the principal diagnosis or as the 1st to 4th additional diagnosis at least once per year. In the Health Check-up DB, individuals with fasting glucose levels ≥126 mg/dL were considered to have diabetes. This operational definition was established on the basis of the following analysis of study participants in the 2013 NHIS database.
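The operational definition can be expressed as two predicates combined with OR, matching the definition the Taskforce Team adopted. This is a minimal sketch under stated assumptions: the `Claim` record, the `antidiabetic_rx` flag, and the ordering of diagnoses (principal first, then additional diagnoses) are hypothetical, and we assume the diabetes code and the anti-diabetic prescription must appear on the same claim within the calendar year.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DIABETES_ICD10 = ("E11", "E12", "E13", "E14")
DX_POSITIONS = 5      # principal diagnosis + 1st to 4th additional diagnoses
FPG_CUTOFF_MG_DL = 126


@dataclass(frozen=True)
class Claim:
    year: int
    diagnoses: Sequence[str]  # index 0 = principal, 1-4 = additional (hypothetical layout)
    antidiabetic_rx: bool     # hypothetical flag: anti-diabetic drug prescribed on this claim


def has_diabetes_code(claim: Claim) -> bool:
    """True if E11-E14 appears as the principal or 1st-4th additional diagnosis."""
    return any(code.startswith(DIABETES_ICD10) for code in claim.diagnoses[:DX_POSITIONS])


def diabetes_by_claims(claims, year) -> bool:
    """At least one claim in the year with both a diabetes code and an anti-diabetic drug."""
    return any(c.year == year and c.antidiabetic_rx and has_diabetes_code(c) for c in claims)


def diabetes_by_checkup(fasting_glucose_mg_dl: Optional[float]) -> bool:
    """Fasting glucose >= 126 mg/dL; None means no check-up result for the year."""
    return fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= FPG_CUTOFF_MG_DL


def has_diabetes(claims, year, fasting_glucose_mg_dl=None) -> bool:
    return diabetes_by_claims(claims, year) or diabetes_by_checkup(fasting_glucose_mg_dl)


# E11 as the 5th additional diagnosis is outside the positions considered.
late_code = Claim(2013, ("I10", "J00", "K21", "M54", "N18", "E11"), True)
```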
Table 2 shows the numbers of NHI beneficiaries aged ≥30 years and General Health Check-up examinees in 2013 by age. When the prevalence of diabetes among NHI beneficiaries aged ≥30 years is calculated on the basis of prescriptions of anti-diabetic drugs (insulins, sulfonylureas, metformin, meglitinides, thiazolidinediones, dipeptidyl peptidase-4 inhibitors, and α-glucosidase inhibitors), ICD-10 codes, or both, the estimates differ substantially depending on the criterion. The proportion of individuals with diabetes was 8.50% (3.95%+4.55%) when defined by the combination of diagnosis and prescription data, whereas it was 13.15% (4.14%+0.51%+3.95%+4.55%) based on ICD-10 codes alone and 8.69% (0.12%+0.07%+3.95%+4.55%) based on prescription data alone (Table 3). The Taskforce Team adopted an operational definition of diabetes as either (1) individuals with both a diagnosis of diabetes and a prescription for anti-diabetic drugs or (2) individuals whose fasting glucose level in the Health Check-up DB was ≥126 mg/dL. According to this definition, the prevalence of diabetes was 11.40% (2.32%+0.07%+0.51%+3.95%+4.55%).
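The percentages in the text can be reproduced by summing the cells of Table 3 that satisfy each criterion. The cell labels below are our reading of those sums, not a reproduction of Table 3 itself: each group is keyed by the criteria it meets (diagnosis code, anti-diabetic prescription, fasting glucose ≥126 mg/dL). The text does not say which of the 3.95% and 4.55% groups has elevated fasting glucose, so that split is an assumption; it does not affect any of the totals. Values are stored in hundredths of a percent so the sums are exact.

```python
# Hundredths of a percent of NHI beneficiaries aged >=30 years in 2013.
# Keys: criteria met ("dx" = ICD-10 E11-E14, "rx" = anti-diabetic drug,
# "fpg" = fasting glucose >=126 mg/dL in the Health Check-up DB).
CELLS = {
    frozenset({"dx"}): 414,
    frozenset({"dx", "fpg"}): 51,
    frozenset({"rx"}): 12,
    frozenset({"rx", "fpg"}): 7,
    frozenset({"dx", "rx"}): 395,         # assumed: group without elevated FPG
    frozenset({"dx", "rx", "fpg"}): 455,  # assumed: group with elevated FPG
    frozenset({"fpg"}): 232,
}


def prevalence(rule):
    """Sum the cells whose criteria satisfy `rule` (hundredths of a percent)."""
    return sum(value for criteria, value in CELLS.items() if rule(criteria))


by_dx_and_rx = prevalence(lambda c: {"dx", "rx"} <= c)
by_dx = prevalence(lambda c: "dx" in c)
by_rx = prevalence(lambda c: "rx" in c)
by_definition = prevalence(lambda c: {"dx", "rx"} <= c or "fpg" in c)

for name, value in [("diagnosis and prescription", by_dx_and_rx), ("ICD-10 codes", by_dx),
                    ("prescription", by_rx), ("adopted definition", by_definition)]:
    print(f"{name}: {value / 100:.2f}%")
```

Writing each criterion as a predicate over the same cells makes it explicit that the adopted definition adds the fasting-glucose-only groups (2.32%, 0.51%, 0.07%) to the diagnosis-and-prescription groups.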
ANALYTIC METHODS IN 16 COLLABORATION PROJECTS
Since 2014, when the KDA and NHIS signed an agreement for open access to the NHIS database, 16 collaborative projects have been initiated and are currently ongoing. Depending on the project objectives, different operational definitions have been applied to define diseases. The projects can be broadly divided into two categories: analyses of disease status and analyses of causal relationships among diseases, management, and drugs. Analyses of disease status estimated the annual prevalence and age-standardized prevalence of specific diseases, such as diabetic nephropathy. Analyses of causal relationships (e.g., effects of anti-diabetic drugs on cancer and the association between diabetes and percutaneous coronary intervention) used study designs with washout periods and Cox proportional hazards regression models.
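Age-standardized prevalence is commonly computed by direct standardization: each age-specific rate is weighted by that age group's share of a standard population, and the weighted rates are summed. The reference population used in the collaboration projects is not stated here, so the weights below are hypothetical, as are the illustrative counts.

```python
def crude_rate(cases, population):
    """Overall cases divided by overall population."""
    return sum(cases.values()) / sum(population.values())


def age_standardized_rate(cases, population, standard):
    """Direct standardization: sum over age groups of the age-specific rate
    weighted by that group's share of the standard population."""
    total_standard = sum(standard.values())
    return sum(
        (cases[g] / population[g]) * (standard[g] / total_standard)
        for g in standard
    )


# Illustrative numbers (not study data): the study population is older than the
# hypothetical standard, so the crude rate exceeds the age-standardized rate.
cases = {"30-49": 10, "50+": 40}
population = {"30-49": 1000, "50+": 1000}
standard = {"30-49": 3, "50+": 1}  # hypothetical standard-population weights
crude = crude_rate(cases, population)
adjusted = age_standardized_rate(cases, population, standard)
```

Standardizing every year to the same reference population is what allows annual prevalence estimates to be compared without the effect of population aging.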
VALUE AND CHALLENGES OF NHIS DATABASE
The NHIS database represents the entire Korean population and can therefore serve as the basis for population-based nationwide studies of various diseases. Because it contains detailed prescription statements and records of medical examinations and treatments, such as medical care and in-hospital administration of medicine, procedures, and surgery, investigating the trends or status of specific diseases is feasible. Furthermore, long-term follow-up of individuals allows longitudinal studies of causal relationships. By linking laboratory results and standardized questionnaire information from the Health Check-up DB, a limitation of the Claim DB, namely the absence of laboratory and personal history data, can be overcome.
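The washout-period designs mentioned for the causal analyses rely on this long-term follow-up. A minimal sketch, assuming yearly resolution and data beginning in 2002: a first event is treated as new onset only if it occurs after an event-free washout window, and each person's follow-up is summarized as a (duration, event) pair, the usual input for a Cox proportional hazards model. Fitting the model itself would require a survival-analysis library and is not shown; all names here are hypothetical.

```python
from typing import Optional, Sequence, Tuple

DB_START_YEAR = 2002  # first year of data extracted for this study


def is_new_onset(event_years: Sequence[int], washout_years: int,
                 db_start_year: int = DB_START_YEAR) -> bool:
    """True if the first recorded event falls after a full washout window.

    Events during the first `washout_years` of data cannot be separated from
    pre-existing (prevalent) cases, so those individuals are not counted as new.
    """
    if not event_years:
        return False
    return min(event_years) >= db_start_year + washout_years


def follow_up(index_year: int, outcome_year: Optional[int],
              censor_year: int) -> Tuple[int, int]:
    """Return (years of follow-up, event indicator) for a survival model.

    Assumes outcome_year, if present, is not earlier than index_year.
    """
    if outcome_year is not None and outcome_year <= censor_year:
        return outcome_year - index_year, 1
    return censor_year - index_year, 0


# With a 2-year washout, a first event in 2003 falls inside the window (2002-2003).
early = is_new_onset([2003, 2005], washout_years=2)  # False
later = is_new_onset([2004, 2008], washout_years=2)  # True
```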
Despite these strengths, one of the most critical drawbacks is the discrepancy between diagnoses made in actual clinical practice and those recorded in the Claim DB. Such discrepancies may generally be more prominent in claims from outpatient clinics, for less severe illnesses, and from primary care clinics than in claims from inpatient hospitalizations, for severe illnesses, and from tertiary or general hospitals, respectively. Appropriate operational definitions are therefore required to minimize inconsistent and inaccurate results. Moreover, because the NHIS covers only insured services, uninsured payments cannot be estimated from this database. As information on MA beneficiaries was incorporated into the single NHIS database only from 2006, the NHIS database for 2002 to 2005 includes only NHI beneficiaries and not MA beneficiaries, which should be taken into account when interpreting findings from this period. For the Health Check-up DB, the proportion of eligible individuals who completed a check-up was only 40% in 2002, and the different check-up intervals among beneficiaries should be considered in the study design.
Notes
CONFLICTS OF INTEREST: No potential conflict of interest relevant to this article was reported.