
Linking up with the NCHS Research Data Center

A collective blog to foster communication among RDC


Choosing Your Mode of Access

Choosing your access mode may seem as simple as picking the most convenient option, but there are several factors you should consider before you decide.

First, some datasets from NCHS or hosted by the RDC are not available on every access mode. For instance, the genetic component of the National Health and Nutrition Examination Survey (NHANES) is not available in Census RDCs. Similarly, Linked CMS data products are not available on the remote access system. Until recently, we provided researchers with this information only after a proposal had been submitted. We now realize that your choice of data, or even your use of the RDC, relies heavily on your chosen access mode. To help you plan, we developed a Data Availability chart (see Access Modes), which highlights the NCHS surveys available on each access mode.

Next, you have to consider your analytic software. SAS is the only analytic software available for all four access modes. While both the NCHS and Census RDCs offer the most commonly used analytic software, such as Stata and R, these options are not available on the remote access system.

Once you have weighed these factors, consider your timeframe. Remote access is available 24 hours a day, 7 days a week. In most cases, both NCHS and Census RDCs will require some travel. Staff-assisted access is always limited by the availability of staff, and conducting analyses does not take priority over reviewing proposals.

Finally, you should consider cost. Depending on the amount of time and funds available to you, one access mode may be more cost-effective than another.

Despite the above considerations, it is important that you remain flexible about your access mode. There are times when your request will require the use of a specific access mode. As with most things in life, you have to weigh the advantages and disadvantages of each access mode in order to make an informed decision.

Helpful links:

Access Modes

Census RDCs

User Fees

RDC Roll Call

Did you know the RDC does not share unpublished research with other researchers? Similarly, we don’t share the names and expertise of our RDC users. However, we realize that this may make it difficult for RDC users to find and collaborate with each other.  We would like to use this month’s blog post to help you reach out to other researchers. If you are interested in collaborating with other RDC researchers, please post your name, area of expertise/research interests, and contact email. At the end of the month, we will compile a list and email it to anyone who has responded to this post. This list will also be shared with others who may be interested in future collaborations, but only if the collaborator is willing to join the list as well.

Please note that this list should not be used to try to identify research in progress, nor for anything other than establishing a research collaboration.

What was published using the RDC in 2012?

Every quarter, our staff posts any new publications from researchers who have utilized data through the National Center for Health Statistics Research Data Center (RDC) to our “What’s New?” and RDC User Publications webpages. These articles were either sent to us by the researchers or were found through searches on NCBI PubMed and Google Scholar. We thought that it may be interesting and instructive to construct a word cloud of the titles and abstracts from these articles to view major themes.

The size of each word in the word cloud corresponds to its frequency in the titles and abstracts of RDC user articles published in 2012. The word cloud was created using the wordcloud R package.

Perhaps not unexpectedly, the largest word in the word cloud, and therefore the most frequent word in the titles and abstracts, is “health”. Words describing the surveys (e.g., “nutrition” and “examination”), study populations (e.g., “children” and “adults”), and research topics (e.g., “cancer” and “insurance”) can also be quickly identified. Looking closer, we see words describing the restricted data that were accessed through the RDC. For example, the words “mortality” and “death” may be associated with restricted linked mortality files, while “urban” and “rural” may be associated with the restricted geocode data.
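For readers curious about the counting step behind a word cloud, here is a minimal sketch in Python. It is not the code we used (the word cloud was built with the wordcloud R package); the stop-word list and the example titles are made up for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "with", "among", "on"}

def word_frequencies(texts):
    """Count how often each non-stop word appears across the given texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up titles for illustration; these are not actual RDC publications.
titles = [
    "Health insurance and mortality among adults",
    "Rural health and nutrition in children",
    "Urban health outcomes and cancer screening",
]

freqs = word_frequencies(titles)
for word, count in freqs.most_common(3):
    print(word, count)
```

A word cloud tool then scales each word's size by its count, so the most frequent word appears largest.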

Looking ahead, it will be interesting to see what publications are on the horizon in 2013 and how the word cloud for this year’s publications might compare.

Congratulations to all our researchers who published on their RDC projects in 2012. To researchers who are actively working on projects, we look forward to continuing to work with you in 2013 and seeing what publications result from your time at the RDC. And, to researchers interested in using the RDC, we hope to hear from you soon.

RDC Best Practices: Original versus Derived Variables

As an RDC analyst, I would like to share with you some advice I give to all researchers whose proposals are assigned to me.

As you know, it is the researcher’s responsibility to extract and send public-use NCHS and non-NCHS data to the RDC, where the analyst merges them with restricted variables. We recommend that you familiarize yourself with NCHS data by doing preliminary analysis with the public-use data. Often you may need to rename, recode, or re-categorize variables for your analysis. It may seem like a good idea to send us a public-use dataset with derived variables instead of the original ones. I strongly urge you not to do this.

While it is not against RDC rules to send us recodes instead of original variables, doing so may lead to extra work for your RDC analyst as well as delays and extra charges. Two examples come to mind. On both proposals, researchers sent in derived variables instead of the original ones. Researchers working on the first proposal made a mistake while creating derived variables. On the second proposal, the student’s advisor changed her mind about the grouping of analytic variables, and the researchers needed to categorize them in a different way. Since the original variables were not included in the datasets the researchers sent to the RDC, the researchers had to resend the public-use datasets with original variables and I had to redo the merge. This resulted in delays for both projects and additional data setup fees.

The conclusion is: send your analyst the original variables, instead of the derived ones! Following this simple rule will save RDC analysts time as well as your time and money. If you want to create derived variables and keep them on your permanent analytic dataset, just send us your programming code to create such variables. Your RDC analyst can either run the code while creating your analytic dataset or put your code into your folder along with the data so that you can create the derived variables yourselves.
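To make the point concrete, here is a small sketch of keeping the original variable and deriving the grouped one in code. It is written in Python purely for illustration (your analyst may expect code for SAS or another supported package), and the variable names, cutpoints, and records are hypothetical.

```python
def age_group(age, cutpoints):
    """Return a 1-based category for an age, given ascending cutpoints."""
    group = 1
    for cut in cutpoints:
        if age >= cut:
            group += 1
    return group

def add_derived(rows, cutpoints):
    """Return copies of the rows with a derived 'age_grp' added.

    The original 'age' variable is kept, so the grouping can be redone later.
    """
    return [dict(row, age_grp=age_group(row["age"], cutpoints)) for row in rows]

# Hypothetical records containing the ORIGINAL variable (age in years).
records = [{"id": 1, "age": 17}, {"id": 2, "age": 30}, {"id": 3, "age": 70}]

# First grouping: <18, 18-44, 45-64, 65+.
first = add_derived(records, (18, 45, 65))

# If the grouping changes later, only the code is rerun; no new merge is needed.
second = add_derived(records, (25, 50))

print([r["age_grp"] for r in first])
print([r["age_grp"] for r in second])
```

Because the original ages are still on the file, a coding mistake or a change of mind about categories is fixed by editing a few lines of code rather than by resending data.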

Signed RDC Analyst

NCHS Survey Report: National Survey of Family Growth

Submitted by the NSFG team

The National Survey of Family Growth (NSFG) collects information on factors that help explain birth rates, such as contraception, infertility, sexual activity, cohabitation, and marriage. The NSFG was first conducted in 1973 and was a periodic survey until the transition to continuous interviewing in 2006. Originally, the NSFG only included a sample of women, but men were added beginning in 2002, and the survey is now a nationally representative sample of women and men ages 15-44. The latest data release includes interviews conducted from 2006 to 2010 with 12,279 women and 10,403 men. The public-use data files are available for download from our website, along with more information about the survey, including the history, codebooks, and User’s Guides for each survey, and key statistics on factors that influence family formation, growth, and dissolution based on the most recent survey.

It is not hard to think about ways that the community where someone lives could influence some of the topics covered in the NSFG.  For example, you can probably think about how the characteristics of the larger community might influence who or when a person marries.  When researchers want to study how the context or community where someone lives potentially influences their fertility and family life they can use the NSFG contextual data files. The RDC hosts the contextual files for the 1995, 2002, and 2006-2010 NSFGs. 

For the 1995, 2002, and 2006-2010 data files, there are two contextual files for respondents; these correspond to the respondent’s address at two points in time: the date of interview and the time of the most recent Census. The contextual files include variables such as the total population, the proportion of men and women over 15 who have never been married, or the proportion of families with incomes below the poverty level, measured at the state, county, tract, block group, and block level.

The codebook for the 2006-2010 contextual file is available online.  Browsing through the codebook shows the wide variety of variables available to users of the RDC.  There is a wealth of information available at the county level including Hispanic origin and racial composition, employment and earnings, birth rates, education and school enrollment, migration patterns, crime, voting behavior, housing units, rurality/urbanity, marital status, healthcare providers and need, STD rates, and family planning providers.

One NCHS publication that may help researchers get started with using NSFG contextual data is the report, “Community Environment and Women’s Health Outcomes:  Contextual Data” by W.D. Mosher, L.P. Deang, and M.D. Bramlett.  This report uses the contextual data for the 1995 NSFG to look at how community characteristics are associated with childbearing, marital status, contraceptive use, breast-feeding and other outcomes.   It also offers detailed information on understanding the levels of data and conducting a multi-level analysis using SAS.

Another example of the use of the NSFG contextual data, by researchers outside of NCHS, is Magnusson, BM, Sabik, L, Chapman, DA, Masho, SW, Lafata, JE, Bradley, CJ and KL Lapane. In their article “Contraceptive Insurance Mandates and Consistent Contraceptive Use among Privately Insured Women” (1), they took full advantage of the RDC by not only incorporating the contextual data in their analyses, but also requesting that the RDC link state-level data on insurance mandates from the Guttmacher Institute with the public-use data file. By using the RDC, they were able to answer a research question on how insurance mandates at the state level influence individual contraceptive use, and they were also able to account for other characteristics of the counties (such as education, income, and crime) in their models.
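The kind of linkage described here, attaching a state-level measure to each individual record, can be sketched as follows. In practice your RDC analyst performs the merge, because the geographic identifiers are restricted. Everything in this Python sketch is hypothetical: the state codes, variable names, and values are invented and do not describe any real state or dataset.

```python
def link_state_data(rows, state_table, new_var):
    """Attach a state-level value to each individual record.

    Returns the linked rows plus the case IDs whose state had no match,
    so that unmatched records can be reviewed rather than silently dropped.
    """
    linked, unmatched = [], []
    for row in rows:
        if row["state"] not in state_table:
            unmatched.append(row["caseid"])
        # Unmatched records get None (missing) for the new variable.
        linked.append(dict(row, **{new_var: state_table.get(row["state"])}))
    return linked, unmatched

# Hypothetical individual-level records with a placeholder state code.
respondents = [
    {"caseid": 101, "state": "AA", "consistent_use": 1},
    {"caseid": 102, "state": "BB", "consistent_use": 0},
    {"caseid": 103, "state": "CC", "consistent_use": 1},
]

# Hypothetical state-level indicator (1 = mandate in place, 0 = none).
state_mandates = {"AA": 1, "BB": 0}

linked, unmatched = link_state_data(respondents, state_mandates, "mandate")
print(linked)
print("Unmatched case IDs:", unmatched)
```

Checking the unmatched list, along with simple descriptive statistics on the linked file, is a quick way to confirm that a merge behaved as intended.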

To learn more about how using the RDC can enhance your analysis of the NSFG, please visit the NSFG  or RDC website.  You can contact the NSFG team by emailing or calling 301-458-4222.

1 Magnusson, BM, Sabik, L, Chapman, DA, Masho, SW, Lafata, JE, Bradley, CJ and KL Lapane. “Contraceptive Insurance Mandates and Consistent Contraceptive Use among Privately Insured Women.” Medical Care 50(7):562-568. 2012.

Weigh In: Have you used NSFG data for your research?

Research Report: Choosing your Mode of Access

Dr. Paul Reiter is an Assistant Professor in the Division of Cancer Prevention and Control at The Ohio State University.  His research addresses cancer prevention and control through two main thematic areas, cancer screening and vaccination.  He is particularly interested in examining the determinants of engaging in these behaviors and designing programs to increase their use.

A great deal of my research during recent years has focused on HPV vaccination behaviors.  With this in mind, our research team wanted to examine HPV vaccination within the Appalachian region using data from the National Immunization Survey-Teen (NIS-Teen) datasets.  Because this research question had a geographical component to it, we needed to access restricted data through the Research Data Center (RDC).  We explored our options for accessing these data and opted for remote access through the ANDRE system. 

In preparation for using ANDRE, we prepared the public use datasets for analyses and sent them to our RDC analyst. Our RDC analyst then merged in the needed restricted data and made the merged dataset available for analyses through ANDRE. Simple descriptive statistics assured us that the datasets had been merged correctly. To generate these descriptive statistics and run more sophisticated analyses, we simply had to submit analytic syntax to ANDRE via FTP. We would then receive the corresponding output in an email within minutes of submission. The speed and ease of this process helped alleviate any concerns we had about conducting analyses through remote access. ANDRE provided a highly efficient process that allowed us to complete our analyses in a brief period of time.

Overall, we had an extremely positive experience with the RDC and ANDRE.  There were a few minor issues with ANDRE that required attention initially, but our RDC analyst and other staff helped resolve these issues.  This helped ensure that we were able to successfully complete our project in a timely manner. 

RDC Thoughts:

The RDC remote access system, affectionately known as ANDRE, is a convenient access option for users who are not close to an RDC facility. Others may find it more appealing or more cost-effective to work on site at an NCHS or Census RDC. For example, researchers belonging to certain institutions may be charged reduced or no access fees when using a Census RDC. Sometimes an analytical plan dictates the mode of access, as certain procedures or statistical packages may only be available for a specific mode of access. Therefore, you should consider the mode of access while writing your analytical plan. It may not seem like much, but it could save you time and money.

Weigh in: Are you thinking about using the RDC? What modes of access are you considering and why? Are you currently using the RDC? Why did you choose your mode of access?

Welcome to Linking up with the NCHS RDC!

We are so excited about all the possibilities this blog has for the future. For those of you who are new to the National Center for Health Statistics (NCHS) Research Data Center (RDC), we are the gatekeepers to some of the most sought-after health data in the field today. Visit our websites to learn more about NCHS or the RDC.

Linking up with the NCHS RDC provides the research community with a variety of opportunities, including making connections, creating research teams, improving upon best practices, and discussing idiosyncrasies of the surveys, confidentiality, and disclosure. The primary writers of the blog will be RDC staff, but we will also feature RDC researchers or representatives from a variety of surveys. If you are interested in writing a blog post or have a topic idea, email us with the subject “Linking up with the NCHS RDC-Blog Idea”. It is our aim to have a new blog post each month. So that you never miss a post, subscribe to our blog by email or RSS feed.

Please take the next few weeks to review our blog policy and privacy statements. While we welcome your comments, we will not respond to every question or comment. We ask that you join the RDC Listserv, so we may build a research community that is able to assist us in answering questions or comments offline.

Share this blog with anyone who has ever expressed interest in NCHS surveys or the RDC.
