Large-Scale Population Studies as a Path to Personalized Medicine: Easier Said than Done!

Posted by Muin J. Khoury and Marta Gwinn, Office of Genomics and Precision Public Health, Centers for Disease Control and Prevention

For more than two decades, advances in genomics have promised a new era of personalized or precision medicine (i.e., the right intervention to the right person at the right time). Scientific evaluation of new gene discoveries has been aided by the launch of large-scale epidemiologic and clinical collaborative global studies. In a recent commentary, McCarthy and Birney argue that these studies can provide an evidentiary path to personalized medicine, but only if they are done in diverse populations and integrate rare and common genetic risk factors with measurements of changing environments and health indices over time. Below, we summarize their insightful recommendations and offer our public health perspective.

1-Integrating rare and common genetic risk factors into a genetic risk score

The NIH Genetic Testing Registry (GTR), which collects information on tests in clinical use, currently includes tests for more than 18,000 disorders; these include relatively rare heritable diseases like cystic fibrosis and sickle cell disease, along with many others. Genetic risk factors for common diseases include thousands of variants identified by genome-wide association studies; however, their individual effects on risk are too small to have much predictive value, leading some researchers to propose polygenic risk scores. Most data used to construct polygenic risk scores come from people of European descent. These scores may perform poorly when used in other populations, especially in racial and ethnic minorities, where their use could produce erroneous risk estimates and exacerbate health disparities. Studies that incorporate sequencing are also revealing the role of rare variants with substantial effects in carriers (e.g., BRCA1 and BRCA2 in breast and ovarian cancer). Risk scores that integrate information on common and rare variants will produce more precise individual estimates of genetic risk than those based on either alone.
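To make the idea of an integrated score concrete, here is a minimal sketch in Python. It assumes a simple additive model on the log-odds scale, which is one common way to build such scores and not necessarily the approach McCarthy and Birney have in mind. The variant names, weights, and baseline value are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-allele log odds ratios for common variants (illustrative only).
COMMON_WEIGHTS = {"snp_a": 0.05, "snp_b": 0.08, "snp_c": -0.03}

# Hypothetical log odds ratios for carrying a rare, high-impact variant.
RARE_WEIGHTS = {"rare_x": 1.5}


def polygenic_score(dosages, weights=COMMON_WEIGHTS):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


def combined_log_odds(dosages, rare_variants_carried, baseline_log_odds=-3.0):
    """Add common-variant and rare-variant contributions to an assumed baseline."""
    rare_part = sum(RARE_WEIGHTS.get(variant, 0.0) for variant in rare_variants_carried)
    return baseline_log_odds + polygenic_score(dosages) + rare_part


def to_probability(log_odds):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two people with the same common-variant profile; only one carries the rare variant.
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
risk_non_carrier = to_probability(combined_log_odds(dosages, []))
risk_carrier = to_probability(combined_log_odds(dosages, ["rare_x"]))
```

In this toy setup, the two people differ only in rare-variant carrier status, so a score built from common variants alone would assign them the same risk. Real scores would also need weights estimated in populations that resemble the people being scored.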

2-Integrating genetic with environmental risk factors

When McCarthy and Birney state, “there is more to disease risk than genetics,” we suspect they are speaking to geneticists. They could just as well suggest that epidemiologists look beyond environmental and socioeconomic factors or remind clinicians to look beyond routine laboratory values. Genetic and non-genetic risk factors interact in ways that can often be hard to measure, much less integrate in a way that informs risk estimates. Even for well-characterized factors, such as smoking and exercise, the lifetime impact on disease is difficult to reconstruct from cross-sectional measurements (e.g., number of steps walked last week). In addition, social determinants of health, such as access to health care, education, and housing, have major impacts on disease risk and outcomes. Just as with genetic risk, epidemiologic models based on narrowly selected populations (such as well-off, well-educated volunteers) may not translate accurately to other communities.

3-Integrating clinical information over time

Gathering clinical measurements over time can improve disease risk estimation. Snapshots of clinical and laboratory measurements may have limited predictive ability, whereas the availability of longitudinal data (e.g., electronic health records, wearable devices) could improve prediction. For example, a recent study modeled 2.1 billion measurements from 92 different laboratory tests in 2.8 million adults over a span of 18 years. For 131 chronic conditions and 5,223 drug–test pairs, the authors assessed test distributions in healthy individuals. Age and sex alone explained less than 10% of the test variance in 89 out of 92 tests. However, personalized models based on patients’ history explained 60% of the variance for 17 tests and over 36% of the variance for half of the tests. This proof-of-concept study points to the potential utility of systematic stratification of risk for subsequent disease development.
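The "variance explained" comparison can be illustrated with a small Python sketch. This is not the model used in the study; it assumes a deliberately simple setup in which a population-level prediction (one value for everyone) is compared with a personalized prediction (each patient's own historical mean). The patient records are hypothetical values invented for the example.

```python
def r_squared(observed, predicted):
    """Fraction of variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical lab results: earlier measurements plus one new result per patient.
patients = [
    {"history": [4.0, 4.2, 4.1], "new": 4.1},
    {"history": [5.0, 5.1, 4.9], "new": 5.2},
    {"history": [6.1, 5.9, 6.0], "new": 5.9},
    {"history": [4.5, 4.6, 4.4], "new": 4.6},
]

observed = [p["new"] for p in patients]

# Population-level prediction: the mean of all historical results, for everyone.
all_history = [value for p in patients for value in p["history"]]
population_mean = sum(all_history) / len(all_history)
population_pred = [population_mean] * len(patients)

# Personalized prediction: each patient's own historical mean.
personal_pred = [sum(p["history"]) / len(p["history"]) for p in patients]

r2_population = r_squared(observed, population_pred)
r2_personal = r_squared(observed, personal_pred)
```

With these invented numbers, the personalized prediction explains most of the variance in the new results, while the single population value explains essentially none. The point is only to show how the two quantities are computed, not to reproduce the study's figures.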

4-Embracing complexity and avoiding artificial categorization of risk factors

In medicine and public health, we often categorize diseases and risk factors into arbitrary groups. Yet most diseases represent a confluence of pathological processes, even in the same person. For example, heart disease may be due to a combination of processes such as glucose metabolism, lipid disorders, high blood pressure and inflammation. The precise combination will differ from one person to another, and across the lifetime of each person. McCarthy and Birney recommend that when several processes contribute to disease, it makes sense to track each process over time, rather than collapse information into arbitrary disease or risk categories.

Going Forward

The authors argue that “efforts to base personalized medicine on risk-factor prediction alone will fall short.” They recommend a more “holistic” approach that embraces greater population diversity in the design and implementation of epidemiologic studies, focusing not only on gender and ethnicity, but also on social, cultural, and economic factors that influence disease risk and access to health care. A prominent example of this approach is the All of Us Research Program, a major NIH-led effort to recruit a diverse cohort of one million persons and follow them longitudinally for health outcome data. The study will also collect data on genetic and environmental factors, enabling the creation of large, rich data sets that capture individual health trajectories.

It is perhaps not surprising that the path to personalized medicine should start with large-scale longitudinal population studies. Nevertheless, these studies will take years if not decades to complete, and they are very expensive and complex. Furthermore, they can have methodological pitfalls in selection of participants, measurement of risk factors and outcomes, data standardization and sharing, complex analyses, and causal inference. In the meantime, adding implementation science to these studies can inform how best to deploy interventions that we already know can save lives and reduce health disparities.


Page last reviewed: April 9, 2024
Page last updated: April 9, 2024