When a Country Cannot be a Cohort: Challenges of Implementing a Large Precision Medicine Cohort Study in the United States

March 23, 2015 by Muin J Khoury, Director, Office of Public Health Genomics, Centers for Disease Control and Prevention

The recently proposed US precision medicine initiative promises a new era of healthcare with targeted disease treatment and prevention. It prominently features a longitudinal study of a national cohort of a million or more people to customize interventions based on a person’s genetics and other factors. The long-term goal of this study is to “generate the knowledge base necessary to move precision medicine into virtually all areas of health and disease”. Other countries have already launched large national cohorts and biobanks to examine genetic and environmental factors in relation to human diseases. Conducting such a cohort study in the US is particularly challenging for several reasons. To explore these challenges, let us briefly consider what has been done in Denmark, where “the entire country is a cohort”.

Denmark has gathered more data on its citizens than any other country. The Danish Civil Registration System (CRS) contains individual-level information on all residents of Denmark (and Greenland as of 1972). By January 2014, the CRS had registered 9.5 million individuals and more than 400 million person-years of follow-up. A unique ten-digit Civil Personal Register number assigned to all persons in the CRS allows individual-level record linkage of all Danish registers. Daily updated information on migration and vital status allows for nationwide cohort studies with virtually complete long-term follow-up on emigration and death. The CRS facilitates sampling of general population comparison cohorts, controls in case–control studies, family studies, and targeted population surveys. The data in the CRS are virtually complete, have high accuracy, and can be retrieved for research purposes while protecting the anonymity of Danish residents.

Although other Scandinavian countries have their own databases, Denmark has a reputation for possessing the most complete collection of statistics and databases touching on almost every aspect of life. The Danish government has compiled nearly 200 databases, some begun in the 1930s, on everything from medical records to socioeconomic data on jobs and salaries. These databases “allow for instant, large cohort studies that are impossible in most countries.” Examples of genetic studies using this unique resource include studies of genes and lifestyle in aging using the Danish Twin Register, which includes 110,000 pairs of twins. Another example is a recent series of genome-wide association studies that identified genetic factors associated with febrile seizures in children after receiving the measles, mumps, and rubella (MMR) vaccine.
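To illustrate why a single person-level identifier makes record linkage straightforward, here is a minimal Python sketch. The register names, field names, identifiers, and values are all hypothetical and invented for illustration; they do not reflect the actual CRS schema or any real data.

```python
# Hypothetical registers keyed by a shared person identifier.
# All identifiers and values below are made up for illustration.
vital_status = {
    "ID-0001": {"status": "alive", "emigrated": False},
    "ID-0002": {"status": "deceased", "emigrated": False},
    "ID-0003": {"status": "alive", "emigrated": True},
}
hospital_register = {
    "ID-0001": ["febrile seizure"],
    "ID-0003": ["asthma"],
}


def link_registers(*registers):
    """Merge per-person records from several (name, register) pairs by identifier."""
    linked = {}
    for name, register in registers:
        for person_id, record in register.items():
            linked.setdefault(person_id, {})[name] = record
    return linked


linked = link_registers(
    ("vital_status", vital_status),
    ("hospital", hospital_register),
)

# People still under follow-up: known to be alive and not emigrated.
under_follow_up = sorted(
    person_id
    for person_id, record in linked.items()
    if "vital_status" in record
    and record["vital_status"]["status"] == "alive"
    and not record["vital_status"]["emigrated"]
)
```

Because every register shares the same key, linkage reduces to a lookup. Without such a key, records would have to be matched on names, birth dates, and addresses, which is slower and more error-prone.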

In the United States, a recent NIH workshop explored issues and challenges in creating a national cohort of at least one million Americans to advance our understanding of health and disease. One outcome of the NIH workshop was the identification of barriers to establishing a large cohort:

Resources – Implementing such a large-scale project requires substantial resources.
Time required to obtain meaningful results – Longitudinal studies of chronic disease outcomes span decades to allow for a robust number of endpoints to occur.
Contact – Existing cohorts differ in their permissions for data sharing and in whether researchers need to re-contact or re-consent participants.
Demographics – Existing US cohorts do not completely represent the American population or projected demographic changes.
Privacy – There are concerns about privacy, security, and access to individual data and health records.
Dynamic Technologies – Administrative claims, digital, and smartphone technologies for tracking participants over time and space are evolving rapidly.
Scope – A sufficiently large sample is required to capture the small proportion of people with a specific disease or genotype.
Coordination, transparency, and governance – Necessary information is not readily available because electronic health records, claims data, data platforms, and health care systems are fragmented.
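
The Scope barrier can be made concrete with back-of-the-envelope arithmetic. The Python sketch below computes the expected number of participants who meet several criteria at once, under the simplifying assumption that the criteria are independent. The 1% disease risk and 0.5% genotype frequency are illustrative assumptions, not figures from the workshop.

```python
def expected_count(cohort_size, *proportions):
    """Expected number of participants meeting every criterion.

    Assumes the criteria are statistically independent, which is a
    simplification made only for this illustration.
    """
    expected = float(cohort_size)
    for proportion in proportions:
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("each proportion must be between 0 and 1")
        expected *= proportion
    return expected


# Illustrative assumptions: 1% develop the disease, 0.5% carry the genotype.
DISEASE_RISK = 0.01
GENOTYPE_FREQUENCY = 0.005

for cohort_size in (10_000, 100_000, 1_000_000):
    cases = expected_count(cohort_size, DISEASE_RISK, GENOTYPE_FREQUENCY)
    print(f"{cohort_size:>9,} participants: about {cases:g} expected carriers with disease")
```

Under these assumed figures, a cohort of 10,000 would be expected to contain fewer than one such participant, while a cohort of one million would contain about 50. That difference is the basic argument for very large cohorts when studying uncommon combinations of genotype and disease.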

The proposed solution to create a national cohort is to build upon a platform of existing cohorts. By assembling existing cohorts into a large consortium of cohorts, with a central infrastructure, NIH could harmonize data types; enhance data collection; achieve economies of scale; and provide a resource for addressing new scientific questions.

One model of a US cohort consortium has been successfully implemented for years at the National Cancer Institute (NCI). The NCI Cohort Consortium seeks to address the need for large-scale collaborations to pool the large quantity of data and biospecimens necessary to conduct a wide range of studies. The Cohort Consortium includes investigators responsible for more than 40 high-quality cohorts involving more than 4 million people. The cohorts cover large, rich, and diverse populations. Extensive risk factor data are available, and biospecimens, including germline DNA collected at baseline, are available for more than 2 million individuals. Investigators team up to use common protocols and methods, and to conduct coordinated and pooled analyses.

The conversation among funding agencies, scientists, patients, and other stakeholders on the optimal design of a US national consortium of cohorts has only just started. While the US cannot be a “cohort”, the long-term benefit of assembling a nationally representative cohort of the population is definitely worth a try.
