eMKF: A New Tool for Studying Small Populations
Statisticians at the National Center for Health Statistics recently developed a tool to help with the challenge of producing estimates of health conditions and other indicators of health for groups of people that may be less represented in the population.
The challenge of studying small populations
For researchers who study small populations, measuring health disparities among certain racial and ethnic groups, for example, can be a challenge. Small subpopulations result in small sample sizes, which can make it difficult to get statistically stable or reliable estimates. Depending on how small the group is, reporting on it may also make it possible to identify an individual within a data set, a threat known as disclosure risk.
The National Center for Health Statistics (NCHS) publishes statistics on a variety of health measures and outcomes by population subgroup, often from population surveys that sample a relatively small number of people who then represent their subpopulation’s characteristics in the total population. However, NCHS’s ability to provide direct estimates for some small subpopulations has been limited because fewer people from those groups can be included in a survey’s sample. In those cases, the data collected may not come from enough people to represent everyone in the group. This in turn limits NCHS’s ability to produce robust and reliable estimates for certain groups of people.
This has long been an issue for researchers at NCHS, and users of our data, as we investigate health outcomes for smaller population subgroups. It is important to understand differences in health outcomes across many different population subgroups to reliably measure and monitor health status, healthcare access and use, and health disparities in the United States.
To partly overcome this issue, multiple years of data are often combined so that more data from a given group can be pooled to increase the sample size. However, waiting for results from multiple years to calculate these estimates means that the data may be less timely and relevant for public health use. Alternatively, some groups may be combined, such as the Asian and the Native Hawaiian and Pacific Islander populations, to increase overall sample size, but this results in a loss of detail and may obscure important differences among groups. For years, NCHS experts in statistical methods have been working to address this challenge.
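As a rough illustration of why pooling years helps, the sketch below compares the standard error of a proportion from one year of data with the standard error from three pooled years. It assumes simple random sampling and made-up numbers; NCHS surveys use complex sample designs, so this is only an illustration of the general effect, not how NCHS computes its estimates.

```python
import math


def proportion_se(p, n):
    """Standard error of a proportion under simple random sampling (an assumption)."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical values for illustration only, not NCHS data.
p = 0.10          # assumed prevalence of a health condition
n_per_year = 50   # assumed number of respondents from a small group each year

se_one_year = proportion_se(p, n_per_year)
se_three_years = proportion_se(p, 3 * n_per_year)

print(f"one year:    SE = {se_one_year:.4f}")
print(f"three years: SE = {se_three_years:.4f}")
```

Under these assumptions, tripling the sample size shrinks the standard error by a factor of the square root of 3, which is the gain that pooling trades against timeliness.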
A new statistical tool
Two reports released earlier this month describe a new tool that can help fill the small subpopulation data gap:
- Evaluation of an Enhanced Modified Kalman Filter Approach for Estimating Health Outcomes in Small Subpopulations
- Technical Guidance for Using the Modified Kalman Filter in Small-domain Estimation at the National Center for Health Statistics
The new tool is based on a statistical modeling technique called the Modified Kalman Filter (MKF), first released by RAND in 2011. The original MKF is a modeling procedure that uses information from other, larger groups and time points to make better estimates for a small group. Model-based methods can improve the precision of estimates for small population subgroups and rare health outcomes. However, the original MKF is limited to equally spaced time points and linear trends.
The enhanced MKF (eMKF), newly developed by NCHS, builds on the original. It uses similar modeling procedures, including mixed and random effects models, to produce estimates of health outcomes among small subpopulations where direct estimates may be statistically unreliable. In addition, the new tool offers several advantages because eMKF is:
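The general idea of using information from other groups can be sketched with a simple precision-weighted average. This is a generic shrinkage illustration and not the MKF or eMKF model itself; the function name, the variance inputs, and all numbers are hypothetical.

```python
def shrink(direct, direct_var, pooled, between_var):
    """Blend a group's direct estimate with an estimate pooled across groups.

    The weight on the direct estimate approaches 1 when its sampling
    variance is small, and approaches 0 when it is large (noisy).
    This is a generic illustration, not the eMKF model.
    """
    weight = between_var / (between_var + direct_var)
    return weight * direct + (1 - weight) * pooled


# Hypothetical values for illustration only, not NCHS data.
pooled_rate = 0.12   # assumed estimate based on all groups combined
between_var = 0.0010  # assumed variance of true rates across groups

# A small group with a noisy direct estimate is pulled toward the pooled rate.
small_group = shrink(direct=0.20, direct_var=0.0040,
                     pooled=pooled_rate, between_var=between_var)

# A large group with a precise direct estimate stays close to its own data.
large_group = shrink(direct=0.20, direct_var=0.0001,
                     pooled=pooled_rate, between_var=between_var)

print(f"small group: {small_group:.3f}")  # about 0.136
print(f"large group: {large_group:.3f}")  # about 0.193
```

In this sketch both groups report the same direct estimate, but the small group's estimate moves most of the way toward the pooled value because its own data carry less information.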
- More transparent, with all code contained in a macro for SAS (commonly used statistical software)
- More flexible, allowing for random sampling variances for survey estimates
- More widely usable, with the capability of making estimates for nonlinear trends and unevenly spaced data points
Researchers at NCHS, CDC, and beyond can use eMKF to provide more reliable estimates of health measures and outcomes for small subgroups. These estimates can help identify health disparities among groups—a critical early step in developing evidence-based efforts to improve health.
In addition to publications on eMKF, researchers interested in learning more or using the tool can visit github.com/CDCgov/eMKF to get eMKF SAS code and examples.