Biomarker Score in Risk Prediction: Beyond Scientific Evidence and Statistical Performance
Common objectives in research are to (1) explain and (2) predict [1,2]. In practice, these are generally inter-dependent; for example, sound prediction relies upon clear explanation (e.g., causal mechanism), and comprehensible explanation is informed by predictors such as correlation and association.
From clinical and public health perspectives, type 2 diabetes mellitus has emerged as a critical concern in all regions of the world, including Asia. Current thinking holds that diabetes in Asian populations presents somewhat differently from diabetes in Western populations, as reflected, e.g., in the concept of an “Asian body mass index (BMI)” [3,4].
Notably, multiple studies, including randomized controlled trials, have demonstrated that it may be possible to prevent or delay diabetes onset. While prevention may be considered the highest goal, it can be costly and time-consuming at the population level; the benefits lie in the future, whereas the costs are immediate [5]. Diabetes is often accompanied by other conditions or adverse events that can be used for early prediction and treatment. Since diabetes or prediabetes may be present but undiagnosed, research methods to identify or screen ‘current’ at-risk individuals are as necessary as those aimed at predicting and preventing ‘future’ cases. In fact, how to identify those at high risk for diabetes has been studied for several decades, resulting in the publication of numerous prediction models and risk or screening scores. These tools may be tailored to specific populations or subgroups; examples include the American Diabetes Association diabetes risk test, the Centers for Disease Control and Prevention prediabetes screening test, and the Korean diabetes risk score, among others [6-8]. However, given the volume of models/scores and the wide range of predictors, clinicians and patients may find it difficult to select the best tool, or may ignore such tools altogether, citing “research for research's sake,” a perceived lack of cooperation among investigators and with users, or frequently changing and often inconsistent results [9-12]. For example, more than 350 models for chronic obstructive pulmonary disease and for cardiovascular disease have been reported [11,12].
In this issue of the journal, Wang et al. [13] report on a proposed biomarker risk score to predict diabetes, developed in the Singapore Chinese Health Study, a population-based cohort, using a nested matched case-control design with standard risk factors and (scientifically supported) biomarkers as predictors. The study used conditional logistic regression for data analysis; the area under the curve (AUC), P values, and other information and reclassification statistics as evaluation measures; and internal validation and sensitivity analyses. These approaches are well-accepted standards in research practice, despite controversy surrounding some of the measures [14]. The lack of external validation, data limitations (e.g., no waist circumference), uncertain cost implications, and future research directions (e.g., determination of cut-off values) were appropriately discussed as potential limitations, reflecting the authors' attempt at a balanced report. Here, I share several observations and viewpoints for readers pursuing similar lines of research.
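For readers less familiar with these evaluation measures, the following is a minimal Python sketch, not the authors' code, of two of them: the AUC computed as a concordance probability (the chance that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half), and one common reclassification statistic, the categorical net reclassification improvement (NRI), which compares risk categories assigned by an old and a new model. The function names and toy data are hypothetical, and the sketch ignores the matched design, so it yields an unconditional AUC rather than one tailored to conditional logistic regression.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as a concordance probability: the chance that a randomly chosen
    case (label 1) has a higher score than a randomly chosen control
    (label 0); tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def categorical_nri(old_cat: Sequence[int], new_cat: Sequence[int],
                    labels: Sequence[int]) -> float:
    """Categorical net reclassification improvement.

    Risk categories are ordered integers (e.g., 0 = low, 1 = high).
    NRI = [P(up | case) - P(down | case)] + [P(down | control) - P(up | control)]
    """
    up_case = down_case = n_case = 0
    up_ctrl = down_ctrl = n_ctrl = 0
    for old, new, y in zip(old_cat, new_cat, labels):
        if y == 1:
            n_case += 1
            up_case += int(new > old)
            down_case += int(new < old)
        elif y == 0:
            n_ctrl += 1
            up_ctrl += int(new > old)
            down_ctrl += int(new < old)
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    return (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl


# Hypothetical toy data (not from the study): 2 controls, then 2 cases.
labels = [0, 0, 1, 1]
base_scores = [0.20, 0.40, 0.35, 0.80]  # standard risk factors only
new_scores = [0.10, 0.30, 0.50, 0.90]   # plus a hypothetical biomarker
print(auc(base_scores, labels))         # 0.75
print(auc(new_scores, labels))          # 1.0
print(categorical_nri([0, 1, 0, 1], [0, 0, 1, 1], labels))  # 1.0
```

With four subjects, the apparent jump from 0.75 to perfect discrimination says nothing about real performance; the toy numbers are chosen only so the arithmetic can be checked by hand, and they illustrate how easily such summary statistics can look impressive.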
First, the authors rightly mentioned cost and cost-effectiveness issues. Decisions involve trade-offs. While development of a biomarker test may create jobs, identify patients at risk, and improve public health, it will also incur costs (and possibly other side effects). Despite the potential value of a statistically significant observation, cost might outweigh benefit, particularly in resource-limited settings. Even with expensive, cutting-edge, high-tech, or high-dimensional methods or best-fitting models, clinical or practical improvements may be disappointing [15-17], or may even prove no better than well-known information that is easily obtained at little or no cost, such as family history, BMI, or age. In that case, we may become “cost boosters” despite good intentions and the scientific knowledge gained.
Second, beyond prediction, what is the next step? Once we have a set of predictors, odds or risk ratios (beta coefficients), a risk estimate, or possibly a risk status (say, high or low), the question remains: “so what?” A clinician (or patient) might use the information to recommend medication or additional tests, a modified diet, or increased exercise, or for emotional or financial preparation. This point may best explain why so many prediction models are not used in the real world: there is no clear next step. Moreover, people tend either to worry little about future events or to over-worry, and probabilities are perceived differently among individuals; for example, whether a 15% risk over 5 to 10 years is low or high is largely a matter of perception [15,18].
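Part of the ambiguity in a “15% risk over 5 to 10 years” is the time horizon itself. As a rough illustration only, assuming (for simplicity) a constant hazard over the whole period, a cumulative risk p over t years corresponds to an annual risk of 1 - (1 - p)^(1/t). The helper below is a hypothetical sketch of that conversion, not a method used in the study.

```python
def annual_risk(cumulative_risk: float, years: float) -> float:
    """Constant annual risk equivalent to a cumulative risk over `years`.

    Assumes a constant hazard over the whole period, so that
    1 - cumulative_risk == (1 - annual_risk) ** years.
    """
    if not 0.0 <= cumulative_risk < 1.0:
        raise ValueError("cumulative_risk must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - cumulative_risk) ** (1.0 / years)


for horizon in (5, 10):
    print(f"15% over {horizon} years ~ {annual_risk(0.15, horizon):.1%} per year")
```

Under this assumption, the same 15% corresponds to roughly 3.2% per year over 5 years but roughly 1.6% per year over 10 years, one concrete reason that a single risk figure can be read as “low” by one person and “high” by another.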
Third, there are many biomarkers; they are generally correlated, and levels of many of them may rise before glucose intolerance becomes manifest. It is entirely possible that different patient populations will be characterized by dissimilar sets of biomarkers, say, a different “final four.” Understanding and using these biomarkers, or anything novel, requires expert clinical interpretation (as opposed to patients' self-assessment or self-education). In implementation, the usual issues of measurement and variable definition come into play: different units (e.g., mg/dL vs. mg/L vs. mmol/L); continuous vs. categorical variables (e.g., yes/no or high/middle/low); and data that are not applicable, missing, censored, or mismeasured (randomly or informatively). In addition, some biomarkers (such as ferritin) may be unknown to most non-experts, so that even a single variable may be a barrier or deal breaker for real-world adoption of a model. Model developers and clinicians should ask whether biomarker A is really better than biomarker B (e.g., glycosylated hemoglobin), considering the pros and cons of each. Models can compete, evolve, or be upgraded naturally as scientific knowledge advances, but there may be unintended consequences in the form of confusion and mistrust. Nevertheless, these could also be opportunities for patient-clinician communication toward shared decision making and patient empowerment. Some clinicians view diabetes as a metabolic disease rather than simply a glucose disease, and may treat biomarkers accordingly without the need for a score. Clinicians may be called upon to explain biomarkers to patients: what they are, what their levels signify, why a given medication is needed, or even why disease definitions can differ by sex, race, age, or location. A patient's health education, including disease awareness and behavioral change, may actually be stymied by highly technical and complex information.
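The unit problem can be made concrete with fasting glucose, whose diagnostic threshold of 126 mg/dL is usually quoted as 7.0 mmol/L. The sketch below is a hypothetical helper, not part of any established library or of the study, that converts glucose values reported in mg/dL, mg/L, or mmol/L into mmol/L using the molar mass of glucose (about 180.16 g/mol), returns None for missing values, and refuses unrecognized unit labels instead of guessing. It does not address informatively missing or mismeasured data, which call for modeling decisions rather than unit conversion.

```python
import math
from typing import Optional

# Approximate molar mass of glucose: 180.16 g/mol, i.e., 180.16 mg per mmol.
GLUCOSE_MG_PER_MMOL = 180.16

# Multiplicative factors converting each (hypothetical) unit label to mmol/L.
TO_MMOL_PER_L = {
    "mmol/L": 1.0,
    "mg/L": 1.0 / GLUCOSE_MG_PER_MMOL,
    "mg/dL": 10.0 / GLUCOSE_MG_PER_MMOL,  # 1 dL = 0.1 L, so 1 mg/dL = 10 mg/L
}


def glucose_to_mmol_per_l(value: Optional[float], unit: str) -> Optional[float]:
    """Harmonize a glucose value to mmol/L; None or NaN is returned as None."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    if unit not in TO_MMOL_PER_L:
        # Refuse to guess: an unknown unit is a data problem, not a default.
        raise ValueError(f"unrecognized glucose unit: {unit!r}")
    return value * TO_MMOL_PER_L[unit]


print(round(glucose_to_mmol_per_l(126, "mg/dL"), 2))  # 6.99
print(glucose_to_mmol_per_l(None, "mg/dL"))           # None
```

Raising an error on an unknown label is a deliberate choice in this sketch: silently assuming a default unit is exactly the kind of quiet mismeasurement that can make a published score perform poorly in a new setting.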
Also, which variables to use in a risk model or score, and how to use them, is an art that goes beyond science. While traditional regression may be well suited to education and to simple, explainable models, black-box or push-the-button approaches (e.g., machine learning, artificial intelligence) can certainly have advantages in prediction or diagnostics. The two should not compete [2,18-20].
Fourth, the terms “predictable,” “preventable,” “modifiable,” and “actionable” are not synonymous. While most biomarkers are modifiable or may serve as mediators or surrogate markers, medical history and socioeconomic status are essentially unmodifiable despite their high predictive value. As Hume (1748) stated, “The only immediate utility of all the sciences is to teach us how to control and regulate future events through their causes.” Although journal editors and reviewers of prediction models tend to focus heavily on AUC, an excessively high AUC may signal “too late”: the predictor and outcome may be virtually the same thing, or the predictor may be just another (correlated) outcome or an early marker of disease onset. Context and purpose always matter, beyond statistical performance (e.g., higher AUC, larger odds ratio, lower P value, largest sample size) in the prediction race [2,18,19].
Finally, the number one reason for any invention is need: why is it needed, for whom, and when? My personal motto for prediction modeling or method development, as a statistician with health economics/services and epidemiology training, is: “As long as some use or are willing to pay, a model can be a success.” Researchers developing future models should emphasize this ‘translational’ aspect and practical use; a reasonable short-term goal might be the intent to implement the model at the authors' own institution(s). The study of risk prediction is widespread in medicine and public health, with the ever-increasing availability of data, powerful computers, and easy-to-use statistical modeling [11], but it should also encompass fundamental principles of business/marketing and engineering. Without a consumer or user, a risk score is just another regression model, a supplier of a bag of odds/risk/hazard ratios, or one more paper on a curriculum vitae (including my own!).
Regarding prediction modeling, Breiman stated: “My attitude toward new and/or complicated methods is pragmatic. Prove that you've got a better mousetrap and I'll buy it. But the proof had better be concrete and convincing” [2,20]. My hope is that we can maintain a disciplined, focused, but pragmatic approach to conducting studies that will impart real knowledge, impact, use, or value to diabetes research worldwide.
ACKNOWLEDGMENTS
The author thanks Ms. Caron Modeas for English editing services, and Drs. Thamer, Spence, Kaufman, Franks, Jaffe, and Kashner for useful advice and comments. The author is partly supported by the National Institutes of Health through grants UL1 TR001860 and R01 AR076088.
Notes
CONFLICTS OF INTEREST: No potential conflict of interest relevant to this article was reported.