100,000 Studies: A Milestone for Human Genome Epidemiology (HuGE) and the HuGE Navigator

Posted by Marta Gwinn, Consultant, McKing Consulting Corp, Office of Public Health Genomics, Centers for Disease Control and Prevention

[Image: a HuGE odometer reading 100,000]

The HuGE published literature database now contains more than 100,000 citations, a milestone reached at the end of 2014. The Office of Public Health Genomics has compiled this database since 2001 via weekly systematic sweeps of PubMed performed by a single curator. For the first five years, a complex PubMed query was used to identify studies of genotype prevalence, gene-disease association, gene-environment interaction, and the performance characteristics of genetic tests. In 2006, a data mining approach using support vector machines replaced the PubMed query, reducing the time needed for hand curation and improving both sensitivity and specificity. The database and a suite of online tools to explore it were re-launched as the HuGE Navigator.

Since the first draft of the human genome sequence was announced in 2001, PubMed has added more than one million articles on human genetics and genomics. Human genome epidemiology has grown, too, but studies of genetic variation and disease in populations (i.e., groups of people not defined by family relationships) still account for only a small fraction of the total (Figure 1).

Figure 1. Articles in HuGE published literature database, by year of publication – 2001-2014*

A boom in gene discovery followed the introduction of genome-wide association studies (GWAS) in 2005; following up on these discoveries to unravel genetic contributions to disease, however, remains extremely challenging. There are no “high-throughput” shortcuts to understanding. Now that it seems clear that common genetic variants have only small effects on disease risk, the field has shifted toward studies of rare variants with large effects. This may look like a return to the pre-Human Genome Project roots of genetic epidemiology; discoveries in this phase, however, are just the next steps toward building the knowledge base for population-level interpretation.

Meta-analysis has become popular as a first step in knowledge synthesis. Concern over the proliferation of poorly conducted meta-analyses, however, led the editors of PLOS ONE to establish explicit quality criteria for submitted manuscripts, and the American Journal of Epidemiology has endorsed this approach. Although rigorous meta-analysis can be useful for assessing and refining gene discoveries, it does not suggest next steps. Other methods are needed to integrate genetic data into ways of thinking that can help us understand, prevent, and treat disease. Human genome epidemiology must evolve to help meet this challenge.

Figure 2. Countries with authors of articles in HuGE published literature database – 2001-2014*

On Jan 5, 2015, the HuGE Navigator completed its transition to a fully automated curation process based on machine learning and data extraction. When tested against the previous, semi-automated process, this method achieved 90% sensitivity and specificity. The HuGE published literature database will continue to be updated weekly with automatic indexing of gene symbols, study type (meta-analysis, GWAS), and category (pharmacogenomics, genetic testing).
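For readers less familiar with these two measures, the short Python sketch below shows how sensitivity and specificity are computed when an automated classifier is compared with a reference standard (here, the labels from an earlier curation process). It is only an illustration: the function name and the 20 example labels are hypothetical and chosen to reproduce the 90% figure, and this is not the HuGE Navigator's actual evaluation code or data.

```python
def sensitivity_specificity(reference, predicted):
    """Compare predicted labels with reference labels.

    Each label is True if the article is judged relevant, False otherwise.
    Returns (sensitivity, specificity).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # relevant, selected
    fn = sum(1 for r, p in pairs if r and not p)      # relevant, missed
    tn = sum(1 for r, p in pairs if not r and not p)  # irrelevant, rejected
    fp = sum(1 for r, p in pairs if not r and p)      # irrelevant, selected

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relevant and one irrelevant article")

    sensitivity = tp / (tp + fn)  # share of relevant articles that were found
    specificity = tn / (tn + fp)  # share of irrelevant articles that were excluded
    return sensitivity, specificity


# Hypothetical example: 10 relevant and 10 irrelevant articles per the reference.
reference = [True] * 10 + [False] * 10
# The automated process misses one relevant article and wrongly selects one irrelevant one.
predicted = [True] * 9 + [False] + [False] * 9 + [True]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this toy example, both measures come out at 90%: the automated process recovers 9 of the 10 relevant articles and correctly excludes 9 of the 10 irrelevant ones.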

Human genome epidemiology is a global enterprise. The first 100,000 articles in the database included authors from 151 countries (Figure 2). The HuGE Navigator will remain online as a freely accessible resource for all who are interested in human genetic variation and population health.


Page last reviewed: April 8, 2024
Page last updated: April 8, 2024