Making Industry and Occupation Information Useful for Public Health: A guide to coding industry and occupation text fields
This is the second of two blogs in the series “COVID-19 Surveillance among Workers: What we know and what are we doing to learn more”. To learn more about occupation and industry data collection for acute infectious diseases, see the first blog Collecting occupation and industry data in public health surveillance systems for COVID-19.
Information about a person’s occupation and industry can be incredibly useful for determining if certain jobs or kinds of businesses put people at a higher risk of illness or injury. This is why many surveys and case report forms ask about a person’s job.
There are thousands of jobs and industries. When a person is asked “what is your occupation?” they provide a description, such as “Mechanical Engineer” or “Elementary School Principal.” The same is true for industry; when asked what industry they work in, people will reply with something like, “auto part manufacturing,” “insurance sales,” or “hospital.” These descriptions need to be assigned a numeric code for public health researchers to analyze the data. These numeric codes are specific to each type of occupation or industry:
- “Mechanical Engineer” has a specific occupation code, as does “Elementary School Principal”
- “Hospital” has a specific industry code, as does “Pediatrician’s Office”
However, people describe the same job in many ways. For example, “IT specialist” and “Information Technology Specialist” are worded slightly differently but refer to the same occupation, so they have the same occupation code.
Assigning codes manually to occupation and industry descriptions is time-consuming, so NIOSH automated this process to make it quick, easy, efficient, and accurate. The automated process may be done in two ways:
- Coding a batch of multiple records after completing data collection
- Coding one record at a time, during data collection
Code Multiple Records in a Batch After You Collect the Data
The NIOSH Industry and Occupation Computerized Coding System (NIOCCS) is a free, web-based software application that translates industry and occupation text to standardized industry and occupation codes. NIOCCS codes large batches of industry and occupation data you have already collected.
How do I use NIOCCS?
1. Go to the NIOCCS page. If you have a large number of records to code, you’ll need to register for a NIOCCS account. If you have only a few records, you can enter them to get the standard codes without registering for an account.
Provide the industry and occupation text you need coded. If you have only a few records, you can enter them directly without uploading a file. If you have many records, it’s fastest to upload them in a file. Files uploaded into NIOCCS must be in standard .txt format, delimited by a Tab or Pipe character (|), and must contain at least:
- Record identifier (ID)
- Industry text
- Occupation text
Each record submitted must have a value in the ID field and a value in at least one of the Industry Title or Occupation Title fields. An example of what the file could look like is shown here:
2. Code using NIOCCS. NIOCCS automatically codes most records, but some cannot be coded automatically, especially when the data contain problems such as misspellings or incomplete descriptions. In these cases, a person assigns the codes. NIOCCS provides computer-assisted coding features to help you select appropriate codes for the records that were not automatically coded.
3. Download your results. Once the coding process is complete, you can view the output of a single code, or download the coded file, which includes the original uploaded data or input data fields, plus standardized Census, NAICS, and SOC industry and occupation codes.
For more details on using NIOCCS, see the NIOCCS User Documentation.
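The file requirements in step 1 can be sketched in a few lines of Python. This is only an illustration, not NIOSH code: the function name `format_batch`, the column order (ID, industry, occupation), and the absence of a header row are assumptions, so check the NIOCCS User Documentation for the exact layout before uploading.

```python
def format_batch(records, delimiter="\t"):
    """Return delimited text for (record_id, industry, occupation) tuples.

    Hypothetical helper. Column order and the lack of a header row are
    assumptions; the source only states which fields are required.
    """
    if delimiter not in ("\t", "|"):
        raise ValueError("delimiter must be a Tab or Pipe character")

    lines = []
    for record_id, industry, occupation in records:
        fields = [
            str(record_id).strip(),
            (industry or "").strip(),
            (occupation or "").strip(),
        ]
        # Every record needs an ID.
        if not fields[0]:
            raise ValueError("every record must have a value in the ID field")
        # Every record needs industry text, occupation text, or both.
        if not fields[1] and not fields[2]:
            raise ValueError(
                f"record {fields[0]} needs industry or occupation text"
            )
        # A delimiter inside a field would shift the columns.
        if any(delimiter in field for field in fields):
            raise ValueError(f"record {fields[0]} contains the delimiter")
        lines.append(delimiter.join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        ("1", "hospital", "registered nurse"),
        ("2", "", "mechanical engineer"),  # occupation only is allowed
    ]
    print(format_batch(example, delimiter="|"), end="")
```

Saving the returned text to a .txt file gives a file that meets the stated minimum requirements. Checking the records before upload catches missing IDs early, rather than after the batch has been submitted.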
Code Each Record as You Collect the Data: Two Options
1. Use Epi Info to code industry and occupation data as you go
You can now code your industry and occupation data as you collect it, in real time, using Epi Info 7.2.4. This capability comes from a new, NIOSH-developed function that can be easily added to a data entry form using a template.
How do I use Epi Info 7.2.4?
CDC’s Epi Info is available to anyone, and includes a suite of software tools designed for public health practitioners and researchers.
Epi Info provides:
- easy data entry form and database construction
- customized data entry experience
- data analyses with epidemiologic statistics, maps, and graphs
Epi Info version 7.2.4 allows users to code free-text industry and occupation data to Census standardized industry and occupation codes. During data collection, you can use Epi Info to collect AND code industry and occupation information. The Epi Info software can be downloaded from the Epi Info website.
Once downloaded, no internet connection is required to run the Epi Info program, and no data are transferred outside of Epi Info during the coding process.
How does the Epi Info Industry and Occupation Template work?
- Add the Industry and Occupation Code (IOCode) field template to your Epi Info form. Go to the Epi Info website and select “Create Forms.”
In the Form Designer, click “New Project” or open an existing Epi Info project.
To add the industry and occupation template, look on the left, under Project Explorer. Scroll down to “templates.” Click on IOCode, then drag and drop the IOCode field onto the form canvas.
- Once you have added the IOCode field to your Epi Info project, you can begin entering your data. Data entry is done in Epi Info’s Enter Data option. From within “Enter Data”, open the project that includes the IOCode field. Click the New Record button in the toolbar to start a new record. Enter the description of the subject’s occupation in the Occupation field and the industry description in the Industry field, then click the Get I/O Codes command button.
- A dialogue box entitled “Get Industry and Occupation Results” will show the occupation and industry descriptions you entered, as well as the occupation and industry Census codes and corresponding industry title and occupation title assigned by the auto-coder. The dialogue box also contains lists of all the possible occupation and industry codes identified by the auto-coder, ranked in descending order of fit.
- Though the system lists the codes with the best fit first, you may select any industry and occupation codes presented in the lists. You may also modify your search by typing directly into Occupation or Industry text boxes in the dialogue box. The code assignment and the list of possibilities will automatically update.
- When you feel the assigned occupation and industry codes are the best fit, click OK. The selected occupation and industry codes and corresponding industry title and occupation title assigned by the auto-coder will automatically copy onto the data entry form, as will any changes you made to the free-text occupation or industry descriptions.
- Continue entering data by tabbing or clicking the other data fields. To start a new record, click the New Record button in the toolbar; the current record will be saved automatically. To save the current record without starting a new record, click the Save button in the toolbar.
2. For those who are especially tech savvy, use the newly released NIOCCS auto-coding service handler to code industry and occupation data as you go.
This option is for software developers and web coders. We recently released an industry and occupation auto-coding service handler on the NIOCCS website that can be accessed via a web call. This feature uses machine learning to translate industry and occupation text into the corresponding industry and occupation codes. It allows you to incorporate real-time coding of industry and occupation data into any data collection platform as you collect the data. This service handler does not require a login and can be called from any application requiring only an internet connection.
The NIOCCS web-based industry and occupation auto-coding service handler is located at:
The format of the service call is:
The service handler accepts the following input parameters:
| Parameter | Description | Required or optional |
|---|---|---|
| I | Industry text for the data to be coded | Required |
| O | Occupation text for the data to be coded | Required |
| N | Number of candidates returned. The default is 1, returning the top coding possibility based on the industry and occupation auto-coder’s training. | Optional |
| C | Flag that determines whether the service returns candidates from the Census Industry and Occupation 2012 coding scheme (“c=1”) OR candidates from the North American Industry Classification System (NAICS) 2012/Standard Occupational Classification (SOC) 2010 coding schemes (“c=0”). The service defaults to the NAICS 2012/SOC 2010 coding schemes. | Optional |
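For developers, the input parameters above can be assembled into a web call along the following lines. This is a hedged sketch rather than the official client: `BASE_URL` is a placeholder (use the service address published on the NIOCCS website), the function name `build_coding_url` is hypothetical, and passing the parameters as a lowercase query string is an assumption based on the “c=1” notation in the table.

```python
from urllib.parse import urlencode

# Placeholder only. Replace with the service address from the NIOCCS website.
BASE_URL = "https://example.invalid/nioccs-service"


def build_coding_url(industry, occupation, n=None, census=None,
                     base_url=BASE_URL):
    """Build a request URL for the auto-coding service (hypothetical helper).

    industry, occupation: free-text descriptions (both listed as required).
    n: optional number of candidates to return (service default is 1).
    census: optional flag; True requests Census 2012 codes (c=1),
            False requests NAICS 2012/SOC 2010 codes (c=0).
    """
    if not industry or not occupation:
        raise ValueError("industry and occupation text are both required")

    # Assumption: parameter names are sent in lowercase, as in "c=1".
    params = {"i": industry, "o": occupation}
    if n is not None:
        if n < 1:
            raise ValueError("n must be at least 1")
        params["n"] = n
    if census is not None:
        params["c"] = 1 if census else 0

    # urlencode escapes spaces and punctuation in the free text.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_coding_url("hospital", "registered nurse", n=3, census=True))
```

Leaving out `n` and `census` omits the optional parameters entirely, so the service would apply its own defaults (one candidate, NAICS 2012/SOC 2010 codes). The response format is not described here, so parsing it is left out of the sketch.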
Matthew R. Groenewold, PhD, is an epidemiologist in the NIOSH Division of Field Studies and Engineering.