“Prediction is very difficult, especially if it’s about the future.” — Niels Bohr
- Text message to chemical plant manager: Chlorine leak expected on line 2 tomorrow. Inspect and repair.
- High priority email and automatic call to coal mine superintendent: 83% chance of roof fall on section 4. Evacuate immediately and take corrective actions.
- Monthly notice to OSHA regional administrator: HIGH PRIORITY INSPECTION ROSTER: Firms listed below have a greater than 80% probability of violations reflecting hazardous conditions requiring mitigation.
Science fiction or the next logical step in workplace prevention?
Predictive analytics, the application of recently developed analytic techniques to build, test, refine, and apply algorithms in an effort to, as Eric Siegel famously wrote, “predict who will click, buy, lie, or die,” is a burgeoning field [Siegel 2013]. The availability of “big data” [Mayer-Schönberger 2013] (vast quantities of information characterized by volume, velocity, and variety) and of the computing power to process that information rapidly has opened up innumerable possibilities for both private sector and public benefit.
Predictive analytics (PA) is at work each time Amazon informs a customer that they may be interested in a particular product; each time Netflix recommends a movie; when life insurance companies risk-stratify and set rates for potential policy holders; and when some credit card companies target potential customers with specific incentives to acquire their card. Harrah’s casino has been using predictive analytics to promote customer loyalty. UPS is said to employ predictive analytics to analyze sensor data to target fleet maintenance more economically and to adjust delivery routes to reduce traffic-related delays and fuel consumption.* Software that predicts whether airline ticket prices will rise or fall in the next week can guide online purchasing decisions.
The public sector is also benefiting from predictive analytics [Davenport 2008]. Analytics are being employed by an increasing number of police departments, including those in Los Angeles and Santa Clara, CA, to make specific patrol assignments in order to improve crime prevention. While this raises the specter of the anticipatory (and occasionally misdirected) “murder prevention” arrests portrayed in the 2002 movie Minority Report, concerns about the current approaches to “predictive policing” have been limited.
Health care, with the exploration and adoption of “personalized medicine,” is employing PA to tailor health screening and to match treatments to the individuals in whom they are most likely to succeed. And some health insurers are using PA to identify potential high-need/high-use beneficiaries in order to provide more intensive home-based services.
The political process is diving in as well. Electoral strategists are mining data to focus and reduce the cost of their “get out the vote” efforts in order to concentrate on people most likely to support their candidate.
Safety and Health Applications
NIOSH has been exploring the potential application of predictive analytics and related approaches to reducing the risk of death, injury, and disease from work. Clearly, there is tremendous potential for improved prevention if accurate predictions of injury and disease probability are possible. It seems likely that if injuries can be predicted accurately, they can be prevented: work environments can be modified, maintenance performed, workers trained, and improved protections offered. Multi-site employers could focus their efforts on sites where injury or death is most probable. It is also likely that the agencies that inspect workplaces and enforce protective regulations, OSHA and MSHA, would be more effective and efficient if they could direct their efforts to workplaces where the risk of injury or death is high.
While traditional epidemiology searches for the determinants of disease and injury over time in populations, PA focuses on the prediction of events or effects in individuals or other affected “units” (such as particular production lines or workplaces) during a specific time window. Epidemiology is used to establish exposure-response relationships using careful measurements of both exposures and health effects while controlling for population variability. PA often starts with available historical data reflecting the endpoint of interest, for example a five-year history of mine injuries reported to MSHA, and divides the data into a training set and a test set. Using “machine learning” approaches, an algorithm is developed to fit the training set, drawing on a wide range of available, potentially relevant data. That algorithm is then applied to the test set, which it has not yet seen, to assess how well it predicts the results, and it is further refined if necessary. When a good algorithm is developed, it can be applied as new data are gathered. In this example, an algorithm that identifies mines where serious injuries are likely to occur could stimulate operators to adopt preventive practices and also help direct mine inspectors.
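For readers who want to see the train-and-test workflow in concrete terms, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the mine records are synthetic, the two predictors (prior injuries and months since the last inspection) and the relationship between them and injury risk are invented, and a small hand-written logistic regression stands in for whatever machine learning method an analyst might actually choose. It is not a description of any model used by NIOSH or MSHA.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_mines(n, seed=0):
    """Return n made-up records: ((prior_injuries, months_since_inspection), injured)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prior = rng.randint(0, 10)
        months = rng.randint(0, 24)
        # Assumed relationship, invented purely for illustration.
        risk = sigmoid(-4.0 + 0.5 * prior + 0.1 * months)
        rows.append(((prior, months), 1 if rng.random() < risk else 0))
    return rows


def train_test_split(rows, test_fraction=0.3, seed=1):
    """Shuffle a copy of the records and divide it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def scale(x):
    """Put both predictors on a 0-1 scale so one learning rate suits both."""
    return (x[0] / 10.0, x[1] / 24.0)


def fit_logistic(train, epochs=50, lr=0.1):
    """Fit a logistic regression to the training set by stochastic gradient descent."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for x, y in train:
            s0, s1 = scale(x)
            err = sigmoid(b + w0 * s0 + w1 * s1) - y
            b -= lr * err
            w0 -= lr * err * s0
            w1 -= lr * err * s1
    return (w0, w1, b)


def predict_risk(model, x):
    """Predicted probability of a serious injury for one record."""
    w0, w1, b = model
    s0, s1 = scale(x)
    return sigmoid(b + w0 * s0 + w1 * s1)


def accuracy(model, rows, threshold=0.5):
    """Share of records whose outcome is predicted correctly at the given threshold."""
    hits = sum((predict_risk(model, x) >= threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


if __name__ == "__main__":
    train, test = train_test_split(make_synthetic_mines(1000))
    model = fit_logistic(train)  # developed on the training set only
    print(f"Accuracy on training set: {accuracy(model, train):.2f}")
    print(f"Accuracy on held-out test set: {accuracy(model, test):.2f}")
```

The point of the split is that the test-set figure, not the training-set figure, is the fairer guide to how the algorithm might perform on new data. In practice an analyst would also look beyond simple accuracy, since a missed injury and a false alarm rarely carry the same cost.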
The data potentially relevant to predicting injury or disaster are plentiful: prior safety experience; a worker’s age and time in a specific job; time during a shift and hours worked during the prior day, week, or month; geographic location; how recently a workplace inspection occurred, and what the results were; season; enterprise profitability; the presence of an injury prevention program; union representation of the workforce; and on and on.
Barriers to Implementing PA in the Workplace
But there are many barriers to employing predictive analytics to improve occupational safety and health. Knowledge, skills, and attitudes of employers or workers may stand in the way. Some large workplaces have large quantities of relevant data available but do not know what to do with them. Others may have the potential to analyze and react to their data, but they may lack the motivation to apply analytics beyond sales and marketing. Also, small and medium-sized enterprises may have neither the data relevant to their specific worksite nor the technical resources to use what they do have. There may be privacy concerns that limit access to relevant data, including concerns that the data will be used for other, less socially responsible purposes. And some employers may believe that they are doing just fine without the benefits of prediction and that the potential benefit isn’t worth the effort.
Prediction depends on the availability of information of adequate quality and consistency; access to trained analysts who have the tools to do their work; and an ability to frame questions and identify situations that are likely to benefit from prediction. That said, many of the thorniest problems faced in occupational safety and health may challenge the most sophisticated predictive analytic approaches. Low probability, high impact events such as oil rig failures and airplane crashes may be so rare as to defy prediction. But it is likely that other workplace problems from long-haul truck crashes to chemical line leaks could be anticipated and prevented by employing existing sensor technologies and predictive analytics.
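One part of the difficulty with rare events can be shown with simple arithmetic: when an event is very uncommon, even an apparently accurate predictor will raise mostly false alarms. The short Python sketch below computes the share of alerts that are correct (the positive predictive value) from the event rate, sensitivity, and specificity. The specific rates used are hypothetical numbers chosen for illustration, not estimates for any real hazard.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive predictions that are true events (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical predictor: catches 90% of events, 5% false-alarm rate.
    rare = positive_predictive_value(0.0001, 0.90, 0.95)    # 1 event per 10,000 units
    common = positive_predictive_value(0.05, 0.90, 0.95)    # 1 event per 20 units
    print(f"Rare event: {rare:.2%} of alerts are correct")
    print(f"More common event: {common:.2%} of alerts are correct")
```

Under these assumed figures, fewer than 1 in 500 alerts for the rare event would be correct, compared with nearly half of the alerts for the more common one. Rarity also means there are few past events from which an algorithm can learn, which this calculation does not capture.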
NIOSH is aware of at least one consulting firm that is marketing prediction to reduce workplace injury; others who are working to identify “leading indicators” that can be measured and are associated with good OSH performance; and a handful of others in private industry and academia who are employing PA with an OSH/prevention focus. We would appreciate hearing from you: (1) What circumstances could benefit from the application of predictive analytics? (2) Do you know of situations where predictive analytics is being used to improve workplace health and safety? (3) What are the barriers to adoption of advanced analytics for OSH? (4) What can be done to overcome these barriers?
We hope to hear from many of you and will look forward to sharing what we learn.
Gregory R. Wagner, M.D.
Dr. Wagner is Senior Advisor to the NIOSH Director
*References to products or services do not constitute an endorsement by NIOSH or the U.S. government.
Sources & Resources
Ayres I. Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart. Bantam Books, New York, NY.
Davenport TH & Harris JG. Competing on Analytics: The New Science of Winning. Harvard Business School Press, Boston, MA.
Davenport TH & Jarvenpaa SL. Strategic Use of Analytics in Government. IBM Center for the Business of Government, Washington, DC.
Mayer-Schönberger V & Cukier K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt Publishing Co, New York, NY.
Siegel E. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. John Wiley & Sons, Inc., Hoboken, NJ.
Silver N. The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t. The Penguin Press, New York, NY.