100 Million and Counting!
When researchers at the National Institute for Occupational Safety and Health (NIOSH) set out to develop a tool that could improve the use of industry and occupational data from surveys, death certificates and other sources, we could only dream that our efforts would be this successful.
A Tool to Advance Research, and It’s Free
We started this journey in 2012, deploying the first version of the NIOSH Industry and Occupation Computerized Coding System (NIOCCS). NIOCCS is for researchers and others who may be interested in assessing how people’s jobs impact their health and safety. NIOCCS converts descriptive industry and occupation text obtained from surveys or other records into standardized codes, the numbers assigned to each industry and each occupation. These standardized codes are necessary for researchers to analyze industry and occupation data.
When NIOCCS launched, few tools were available to electronically code industry and occupation text, a process referred to as “autocoding.” Researchers had to search a book to find the industry and occupation codes for each survey response or record. Manual coding can take several minutes per response and leaves much room for error. NIOCCS was designed to code industry and occupation text faster and more accurately, and to be publicly available for free.
In 2021, we made a major upgrade to NIOCCS to use machine learning to translate the text to standard codes or numbers. Machine learning uses sophisticated prediction models and allows systems to learn and improve from experience, which means it codes more accurately and more consistently.
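To give a feel for what learning to map text to codes means, here is a minimal sketch in Python. It is not how NIOCCS is implemented. It is a toy word-count model, and the training examples, the function names (`train`, `autocode`), and the labels such as `OCC-NURSE` are all hypothetical placeholders rather than real Census, NAICS, or SOC codes.

```python
from collections import Counter, defaultdict

# Hypothetical training examples: (free text, placeholder code).
# The labels are illustrative stand-ins, not real Census, NAICS, or SOC codes.
TRAINING = [
    ("registered nurse", "OCC-NURSE"),
    ("staff nurse hospital", "OCC-NURSE"),
    ("long haul truck driver", "OCC-DRIVER"),
    ("delivery driver", "OCC-DRIVER"),
    ("high school teacher", "OCC-TEACHER"),
    ("elementary teacher", "OCC-TEACHER"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each code."""
    counts = defaultdict(Counter)
    for text, code in examples:
        counts[code].update(tokenize(text))
    return counts


def autocode(text, counts):
    """Return (best_code, score) for a piece of free text.

    The score is the share of all word matches that went to the best
    code. If no word is recognized, return (None, 0.0).
    """
    tokens = tokenize(text)
    scores = {code: sum(c[t] for t in tokens) for code, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


model = train(TRAINING)
print(autocode("Nurse at a hospital", model))
print(autocode("truck driver", model))
print(autocode("astronaut", model))  # no known words, so no code is assigned
```

A real autocoder uses far richer models and far more training data, but the shape is the same: learn from text that has already been coded, then predict a code and a confidence for new text.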
Since these changes were implemented last year, NIOCCS is much faster than before, autocoding files containing tens of thousands of records in minutes. It is also easier to use and more secure now that login is managed by the CDC Secure Access Management Service (SAMS). This year, NIOCCS use has spiked tremendously, with over 50 million records coded in the last year alone!
Partners Using NIOCCS
NIOSH researchers work directly with several partners who use NIOCCS regularly to assist in their studies. Below are a few recent collaborations.
- COVID-19 American Red Cross Case-Control Study
Working in collaboration with other CDC centers, NIOSH researchers embarked on a large case-control study assessing SARS-CoV-2 seropositivity among American Red Cross blood donors. Donors complete a web questionnaire, which includes questions related to industry and occupation. The project team is using NIOCCS to autocode nearly 34,000 industry and occupation records for this study.
- The National Occupational Mortality Surveillance (NOMS) program
The NOMS program is a federal-state partnership between NIOSH, the National Center for Health Statistics (NCHS), and state vital statistics offices. NOMS uses information from death certificates to evaluate patterns in causes of death by occupation and industry. Funeral directors collect text narratives describing each decedent’s longest-held industry and occupation, and the information is then transferred to NCHS and directly to NIOSH each week for autocoding. This streamlined process allows NIOSH to return coded data to jurisdictions approximately 8-12 weeks later for their immediate use. Since January 1, 2021, NIOCCS has autocoded more than 2.95 million records for NOMS, which were submitted by 49 states and New York City.
Where We are Headed
We continue to look for ways to improve NIOCCS by:
- Retraining NIOCCS
NIOCCS now uses a machine learning model, so we “retrain” the system periodically. Retraining happens when NIOCCS encounters unfamiliar occupation and industry names among the submitted records. Retraining uses validated data that has been expertly coded and reviewed by NIOSH industry and occupation coding professionals to verify that the codes are correct. Retraining, if done periodically, creates a better machine learning model and more accurate output.
- Updating NIOCCS
We will update NIOCCS to use the latest industry and occupation coding schemes: Census 2018, North American Industry Classification System (NAICS) 2017, and Standard Occupational Classification (SOC) 2018.
- Translating Spanish industry and occupation narrative text
Currently, NIOCCS is designed to translate only English industry and occupation data into standard codes. We hope to expand it to recognize and code Spanish industry and occupation text as well.
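The retraining cycle described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the NIOCCS pipeline: the word-count model, the function names, the sample records, and the placeholder labels (`OCC-NURSE`, `OCC-PHLEB`) are all hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build per-code word counts from validated (text, code) pairs."""
    counts = defaultdict(Counter)
    for text, code in examples:
        counts[code].update(text.lower().split())
    return counts


def autocode(text, counts):
    """Return the best-matching code, or None if no word is recognized."""
    tokens = text.lower().split()
    scores = {code: sum(c[t] for t in tokens) for code, c in counts.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


# Step 1: train on validated data (placeholder codes, not real ones).
validated = [("registered nurse", "OCC-NURSE"), ("truck driver", "OCC-DRIVER")]
model = train(validated)

# Step 2: find submitted records the current model cannot code.
submitted = ["nurse", "phlebotomist"]
unfamiliar = [text for text in submitted if autocode(text, model) is None]
print(unfamiliar)

# Step 3: coding experts review the unfamiliar text and assign codes
# (hypothetical result shown here), then the model is retrained.
expert_coded = [("phlebotomist", "OCC-PHLEB")]
validated.extend(expert_coded)
model = train(validated)
print(autocode("phlebotomist", model))
```

The point of the sketch is the loop itself: unfamiliar text is set aside, experts code and verify it, and the verified records join the training data for the next round.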
Jennifer Cornell, JD, is a Technical Information Specialist in the NIOSH Division of Field Studies and Engineering.
Stacey Marovich, MHI, MS, is a Lead Health Informatics Scientist in the NIOSH Division of Field Studies and Engineering.
Amy Mobley, MEn, is a Health Communications Specialist in the NIOSH Division of Field Studies and Engineering.