A Survey-based System for Safety Measurement and Improvement

Brooks Carder, Ph.D., AdGap
Patrick W. Ragan, Aventis
Abstract

Over a 10-year period of research, we have studied the use of employee surveys for the measurement of safety and as a diagnostic tool for improvement efforts. Our statistical studies indicate that our survey, which evolved from the Minnesota Safety Perception Survey, is both reliable and valid as a measurement tool. The survey measures important components of the management system, including management’s demonstration of commitment to safety, education and knowledge of the workforce, effectiveness of the supervisory process, and employee involvement and commitment. We also have a considerable body of anecdotal evidence that the diagnostic element of the survey enables the development of effective action plans to improve safety performance. This evidence includes ratings of the process by plant managers who have used it.

In the early 1980s, the concepts taught by Dr. W. Edwards Deming (1982) began to revolutionize American business. The first prominent company to use Deming’s methods was Ford. The changes that occurred at Ford have led to its current position as the world’s premier automaker. It is easy to forget that Ford was in a very weak position in 1980. Today, while GM is a bit larger, Ford is far more profitable and has a much stronger balance sheet. When BMW looked for a model of lean manufacturing to emulate, they looked to Ford, not Toyota.

Deming applied scientific methods to business processes. He stressed the importance of what he called "profound knowledge," which includes knowledge of variation, psychology, and the theory of systems. He applied this knowledge through a cycle of plan-do-check-act, often called the PDCA cycle, which Deming attributed to his mentor Walter Shewhart.

In the plan phase, the objective is to study the system. Based on the findings, the do phase is initiated; this represents action on a limited scale. The effects of the action are checked, and if the action is effective, a large-scale action is initiated. The cycle then begins again. Since their adoption by Ford, these methods have become widely accepted in American manufacturing. GE, following the lead of Motorola, adopted the Six Sigma quality program in the late 1990s. Jack Welch, GE’s CEO, credits much of GE’s recent success to the adoption of this process (Eckes, 2000).

Safety and Quality are One and the Same

Since 1989, we have been involved in applying the principles of Total Quality Management (TQM) to the problem of improving safety performance (Carder, 1994; Ragan and Carder, 1994).

All businesses produce something: a product, a service, or both. They also produce accidents. They usually produce them reliably. It is not possible to predict exactly when they will happen, but it is likely that you can predict how many will happen over a period of time, within statistical limits.
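If accidents are produced reliably, the "statistical limits" on how many will occur can be estimated with a c-chart, which is Shewhart's control chart for counts. The sketch below is illustrative only. It assumes monthly recordable counts from one site with roughly constant hours worked and uses the conventional Poisson-based limits (mean ± 3 × √mean, with the lower limit floored at zero). The function names and data are hypothetical.

```python
import math
from typing import List, Tuple


def c_chart_limits(counts: List[int]) -> Tuple[float, float, float]:
    """Return (lower, center, upper) 3-sigma c-chart limits.

    Assumes the counts are roughly Poisson with constant exposure per
    period, so the standard deviation is estimated as sqrt(mean).
    """
    if not counts:
        raise ValueError("need at least one period of data")
    center = sum(counts) / len(counts)
    sigma = math.sqrt(center)
    lower = max(0.0, center - 3 * sigma)  # a count cannot be negative
    upper = center + 3 * sigma
    return lower, center, upper


def out_of_control(counts: List[int]) -> List[int]:
    """Indices of periods whose count falls outside the c-chart limits."""
    lower, _, upper = c_chart_limits(counts)
    return [i for i, c in enumerate(counts) if c < lower or c > upper]


if __name__ == "__main__":
    # Hypothetical monthly recordable counts for one site.
    monthly = [4, 3, 5, 2, 4, 6, 3, 4, 5, 4, 3, 5]
    print(c_chart_limits(monthly))   # (0.0, 4.0, 10.0)
    print(out_of_control(monthly))   # []
```

With these hypothetical counts, the site averages four recordables a month, and any month with between zero and ten falls within ordinary variation. Only a month above ten would signal a special cause worth investigating.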

Accidents in and of themselves are evidence that the business does not have perfect control of the process it is operating. Accidents are an important source of information about defects in business processes. It is unlikely that a unit with an unusually high frequency of accidents will be maximally effective in either productivity or quality.

Several years ago we studied thirteen plants within a large chemical company. Seven were selected because they had low accident rates and appeared to have good safety cultures. Six were chosen because they had frequent incidents and, apparently, weak safety cultures.

In the prior year, a staff group within the manufacturing organization had studied these same thirteen plants (along with all of the company's other plants) to determine how effectively each plant's management system produced quality and productivity. The scatter plot in Figure 1 shows the correlation between these two independent measures. Plants with weak manufacturing management systems had higher accident rates. The Pearson correlation coefficient is .76, which is statistically significant far beyond the .01 level.


Figure 1. Scatter plot of manufacturing rating against RAIR
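The significance of a correlation can be checked using only r and the number of plants. This minimal sketch assumes the standard t test for a Pearson correlation with n - 2 degrees of freedom and n = 13 plants. The function names are our own, and the critical value of 3.106 (two-tailed, alpha = .01, 11 degrees of freedom) comes from a standard t table.

```python
import math
from typing import Sequence

# Two-tailed critical t for alpha = .01 with 11 degrees of freedom (n = 13),
# taken from a standard t table.
T_CRIT_01_DF11 = 3.106


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two paired samples with n >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing that the true correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    t = t_for_r(0.76, 13)
    print(f"t = {t:.2f}; exceeds the .01 critical value: {abs(t) > T_CRIT_01_DF11}")
```

For r = .76 and 13 plants, t is about 3.88, which is above the tabled value of 3.106.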

Productivity is not achieved through strong motivation and cutting corners. It is achieved by a well-trained and well-led workforce performing according to well-designed plans and procedures.

This was deeply understood by Paul O’Neill, the current Secretary of the Treasury, when he took over Alcoa in 1987 (Arndt, 2001). O’Neill transformed Alcoa from a mediocre company into the world leader in the aluminum industry by focusing on safety. The performance of Alcoa’s stock during O’Neill’s leadership rivals that of GE over the same period.

Application of TQM Principles to Safety

In 1992, we began a project to help a large chemical company implement a safety process based on Deming’s principles. Pat Ragan was the corporate Safety Director, and Brooks Carder was a consultant. The first task was to introduce a measurement system in order to accomplish the plan phase of the PDCA cycle. Most companies use two measures to evaluate the safety system: accident/incident rates and audits. Incident rates are important, but they are not always useful for process improvement. Without excellent investigation of causes, which we rarely see, incident rates tell us that there is a problem, but they do not tell us what the problem is. If performance is outstanding, there are no incidents to investigate, and thus there is no data to guide further improvement. Finally, minor incidents that lead to an OSHA recordable are more likely to be recorded than "near misses," which may portend a very serious incident in the near future.

Audits are on less firm ground. We are not aware of studies validating audit scores against performance. Polk (1987) surveyed 18 railroads regarding various elements of their safety programs; his findings are summarized in Bailey and Petersen (1989). He reports a negative correlation between the level of "reviews, audits and inspections" and accident rates. Audits played no part in Deming’s approach to quality. By exhorting companies to "cease dependence on mass inspection," Deming (1982) was pointing out that after-the-fact inspection does not help build understanding of the work process unless it is conducted in the manner necessary to create a control chart.

What is needed is a measure of the safety management process. In 1993, we decided to use standardized surveys. Brooks Carder had some experience with these surveys and believed that they were both a valid and a useful measure of a safety system. Based on this experience, we worked with Charles Bailey to administer a modified version of the Minnesota Safety Perception Survey to over 6,000 employees in over 50 chemical plants. In 1994, we used the 74-question Minnesota survey along with 12 questions written by our team. Our validation studies found that a number of the original questions were not valid in our test environment, and we discarded them. We also added questions to address issues that are important to chemical companies: emergency response, process safety, and environmental protection. The original survey did not address these topics in any way.

Our current HSE survey retains 41 questions from the original Minnesota survey and 57 questions developed by our team. All of our questions have been extensively validated. We are not aware of any other studies on the validity of these questions since the Minnesota survey was created.

The original Minnesota survey purported to measure "20 factors influencing safety performance." As far as we can ascertain, this claim was based on the opinion of the survey’s authors. We attempted to verify these 20 components by measuring the correlations between items, but our review of the data did not confirm that 20 distinct factors were being measured. We therefore conducted an extensive factor analysis of the database.

Our own statistical factor analysis indicated that the original Minnesota Survey actually measured six factors. Rather than name the factors immediately, we conducted extensive focus group research with survey respondents in order to understand the factors. The groups were made up of survey respondents, managers, and hourly workers in the plant sites we had surveyed. The focus group participants were shown a set of questions that represented a single factor, and then they were asked to explain what variable they felt was being measured by those questions.

Based on the extensive focus group discussions we named the factors as follows:

  • Management’s demonstration of commitment to safety. This measures what management does, not what it says.
  • Education and knowledge of the workforce. Are workers properly trained to do their jobs, and do they receive proper safety training? Do they understand their jobs and how to work safely?
  • Effectiveness of the supervisory process. Does the company have standards for work, and are these standards enforced?
  • Employee involvement and commitment. Are employees involved in the planning process, and are they sufficiently committed to caution co-workers about unsafe practices?
  • Drugs and alcohol (fitness for duty). Is drug and alcohol use prevalent and tolerated?
  • Off-the-job safety. Does the company have an effective off-the-job safety program?

The additional factors created by our new questions include emergency preparedness, process safety, and environmental protection.

The first four factors appear to be relatively universal: they resemble the components found by other measurement tools that attempt to analyze the critical elements of the management system. Table 1 below compares our factors with those derived from three other sources: 1) our own factor analysis of results from a safety perception survey developed by the National Safety Council; 2) a factor analysis conducted by Coyle et al. (1995) on a safety survey they developed; and 3) the factors identified by a group of managers at Dow Chemical in the 1980s, which formed the basis of a plant’s "self assessment."



Table 1. Management system factors

The first four factors are present in all of the surveys except for the NSC survey, where we found no education and knowledge component.

Reliability and validity of our safety survey

An instrument is reliable if it gives a very similar measurement when the same thing is measured repeatedly. A tape measure is reliable enough to measure a room for laying carpet. Pacing it off is not. A standard method of assessing the reliability of a survey is the "split-half" reliability test. This test assumes that all of the questions on the survey measure a similar underlying construct. The questions are randomly divided into two groups. For each respondent, a score is calculated on each half of the survey, and a correlation coefficient between the two half-scores is then calculated across respondents. The assumption is that a person who scores high on one half should also score high on the other.

The split-half reliability coefficient for our HSE Survey is in the range of 0.9. This is a very high degree of reliability for a survey (1.0 is perfect).
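The split-half procedure described above can be sketched in a few lines. This is an illustration rather than the code used for the HSE Survey. It assumes that responses are numeric item scores in a respondent-by-item table. It also reports the Spearman-Brown adjustment, 2r / (1 + r), which is a standard correction for the fact that each half is only half as long as the full survey. The text does not say whether the 0.9 figure includes that adjustment.

```python
import math
import random
from typing import List, Sequence, Tuple


def _pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(
    responses: List[List[float]], seed: int = 0
) -> Tuple[float, float]:
    """Split-half reliability of a respondent-by-item score table.

    Items are randomly assigned to two halves, each respondent gets a
    score on each half, and the two half-scores are correlated across
    respondents. Returns (r_half, r_spearman_brown), where the second
    value estimates the reliability of the full-length survey.
    """
    if not responses or len(responses[0]) < 2:
        raise ValueError("need respondents and at least two items")
    n_items = len(responses[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # random item-to-half assignment
    half_a, half_b = items[: n_items // 2], items[n_items // 2 :]
    score_a = [sum(row[i] for i in half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) for row in responses]
    r = _pearson(score_a, score_b)
    return r, 2 * r / (1 + r)  # Spearman-Brown prophecy formula
```

Because the split is random, r varies slightly with `seed`. Averaging over several random splits gives a steadier estimate.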

An alternative measure of reliability is consistency on repeated applications. We have administered the survey to many plants in successive years, giving us the opportunity to assess this measure of reliability. Figure 2 below is a scatter plot of scores on the survey administered in 1996 compared to the scores on the same survey from 1997.


Figure 2. Scatter plot of 1997 survey scores against 1996 scores for 39 sites

Each point represents one site. Again, there is a very strong correlation, with a Pearson r of 0.82. Therefore, the survey is highly reliable by this test as well.

Validity relates to whether the measurement is measuring the thing that you want to measure. We consider three types of validity:

  1. Face Validity. Do the questions "make sense"? For example, a question like "Did you receive adequate safety training?" has face validity, whereas a question like "Do you like apple pie?" has no face validity in measuring safety. However, it could conceivably have the next two kinds of validity.

  2. Predictive Validity. This is the ability of your measure to predict or correlate with other measures. The prototype here is the SAT, which attempts to predict something about a student’s future college performance. SATs are used because different high schools have very different standards for grading. In fact, SAT scores do correlate with college performance. Brooks Carder spoke with the admissions office at Yale, which has used the SAT as an admission criterion for over 40 years. They find that the correlation between SAT scores and academic performance is relatively weak, with a correlation coefficient on the order of 0.2 to 0.3. Squaring these coefficients shows that an SAT score accounts for roughly 4% to 9% of the variation in college performance. The rest of the variation comes from something else. Nevertheless, the SAT remains the best predictor of academic performance that Yale has found.

    We would want our measures of the safety system and culture to correlate with some measure of loss. In industry, we use the recordable accident rate. If we have two sites with relatively equal risk, the one with the better score on our measuring instrument should have fewer accidents.

  3. Theoretical Validity. This is the ability of the measurement to give us enough understanding to intervene successfully. The IQ test has little or no theoretical validity: it tells us nothing about the source of any problems it detects, or about what to do to improve an individual’s score. Written surveys, in-person interviews, and direct observation of behavior can all have theoretical validity, and the proper use of each of these measures has led to improved safety performance.
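The percentages quoted for the SAT follow from the coefficient of determination. Squaring a correlation coefficient gives the proportion of variance in one measure that is linearly accounted for by the other:

```latex
R^2 = r^2, \qquad r = 0.2 \Rightarrow R^2 = 0.04, \qquad r = 0.3 \Rightarrow R^2 = 0.09
```

A correlation of 0.2 to 0.3 therefore corresponds to roughly 4% to 9% of the variation in college performance.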

The survey questions clearly have face validity. It would be very surprising if they did not, since all of them were written by experienced safety professionals.

Predictive validity is assessed by correlating the survey score with another measure. In our initial validation study, we used the 13 sites mentioned above and correlated the survey score with the recordable accident rate, averaged over the three years prior to the survey. Figure 3 below is a scatter plot of this relationship.


Figure 3. Scatter plot of survey scores against RAIR for 13 sites

Here the correlation is negative, as expected: sites with higher (better) survey scores have lower recordable accident rates. The Pearson coefficient is -0.64, significant beyond the .01 level.
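The text does not state how the recordable accident rate was normalized. We assume here a common U.S. convention, the OSHA incidence rate: recordable cases × 200,000 ÷ hours worked. The 200,000 hours correspond to 100 full-time employees working 40 hours a week for 50 weeks. The sketch shows two ways of averaging three years of data. The class and function names and the example figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed OSHA convention: 200,000 hours = 100 employees x 40 h/week x 50 weeks.
BASE_HOURS = 200_000


@dataclass
class SiteYear:
    recordables: int      # OSHA-recordable cases in the year
    hours_worked: float   # total employee hours worked in the year


def incidence_rate(cases: int, hours: float) -> float:
    """Recordable cases per 200,000 hours worked."""
    if hours <= 0:
        raise ValueError("hours worked must be positive")
    return cases * BASE_HOURS / hours


def pooled_rate(years: List[SiteYear]) -> float:
    """Multi-year rate from total cases over total hours (exposure-weighted)."""
    cases = sum(y.recordables for y in years)
    hours = sum(y.hours_worked for y in years)
    return incidence_rate(cases, hours)


def mean_of_annual_rates(years: List[SiteYear]) -> float:
    """Unweighted mean of the yearly rates, for comparison."""
    if not years:
        raise ValueError("need at least one year")
    return sum(incidence_rate(y.recordables, y.hours_worked) for y in years) / len(years)


if __name__ == "__main__":
    # Hypothetical three-year history for one site.
    history = [SiteYear(4, 400_000), SiteYear(2, 200_000), SiteYear(6, 400_000)]
    print(f"pooled: {pooled_rate(history):.2f}")
    print(f"mean of annual rates: {mean_of_annual_rates(history):.2f}")
```

Pooling weights each year by the hours worked, so a low-exposure year counts for less than a full one. With the example data, the pooled rate is 2.40 and the simple mean of the annual rates is about 2.33.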

In addition to validating the overall score, we have also validated each of the individual questions. As previously mentioned, seven of the 13 sites we studied had excellent safety management systems, and six had safety management systems that were judged to be in need of improvement. For each question, the proportions of positive and negative answers at the excellent sites were compared with those at the sites in need of improvement. For a question to be deemed valid, this difference must reach statistical significance using the chi-square statistic. All of the questions in the survey have been validated at least twice in this manner. In 1995, we validated all of the questions on process safety and environmental protection. Every question that remains in our survey met this statistical criterion for validity.
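The item-level test can be sketched as a 2 × 2 chi-square test for each question, crossing site group with answer. This minimal sketch rests on assumptions that the text does not spell out. Answers are already collapsed into positive and negative counts, no Yates continuity correction is applied, and significance is judged at alpha = .05. A question must also favor the excellent sites to count as valid. The function names are hypothetical.

```python
from typing import Sequence

# Critical chi-square value for 1 degree of freedom at alpha = .05,
# from a standard table. The text does not state the alpha it used.
CHI2_CRIT_05_DF1 = 3.841


def chi_square_2x2(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Rows: site group (excellent, needs improvement).
    Columns: answer (positive, negative).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be non-zero")
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def question_validates(table: Sequence[Sequence[int]]) -> bool:
    """Assumed criterion: excellent sites answer positively more often
    and the difference is significant at alpha = .05."""
    chi2 = chi_square_2x2(table)  # raises on empty rows or columns
    (a, b), (c, d) = table
    right_direction = a / (a + b) > c / (c + d)
    return right_direction and chi2 > CHI2_CRIT_05_DF1
```

Consider a question that 30 of 40 respondents at excellent sites answer positively, compared with 10 of 40 at the other sites. The chi-square value is 20.0, well above 3.841, so the question would be kept.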

It is worth noting that some of the questions in the original Minnesota survey did not validate in our studies. In fact, we tested their validity in the chemical industry and also in a company that manufactured copy equipment. We found 10 questions that did not validate in either setting. There are two obvious hypotheses about why this might be: 1) The initial validation studies were done around 1980, and times have changed. One of the questions that failed to validate asked whether "drug or alcohol use increases incident rates." In 1980, there could have been some doubt. Now virtually all respondents answer yes, making it impossible to differentiate between weak programs and excellent programs. 2) The initial studies were conducted at railroads. It may be that manufacturing operations are different.

Theoretical Validity

The most important question is whether the survey provides sufficient insight to enable effective process improvement. In our experience, companies that use the survey, and follow it up with employee focus groups and implementation of action plans, experience a significant reduction in their recordable accident rate. The process that we use for developing targeted actions based on the survey data is as follows:

  1. Survey results are fed back to the employees who took the survey.
  2. Employee focus groups are convened to further understand the results and assist in developing focused action plans.
  3. The action plans are reviewed by senior management.
  4. Actions are implemented with clear support from senior management.
  5. Results are measured, using performance measures and the survey. (We recommend repeating the survey every one to two years.)

Figure 4 below shows a control chart of recordable accidents before and after the application of the survey process for three companies that have used the survey and engaged in the process detailed above. In each case there is a reduction in the RAIR that is statistically significant.



Figure 4. Control chart of recordable accidents before and after survey-based improvement process
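To judge whether a drop like those in Figure 4 reflects a real shift rather than chance, one can compute the center line from the period before the intervention and look for a sustained run of later points below it. The sketch below shows one hypothetical way to do this with monthly recordable counts. It applies the Western Electric rule of eight consecutive points on one side of the center line (Nelson's version of the rule uses nine). The text does not say which rules were applied to Figure 4.

```python
from typing import List, Optional


def baseline_center(before: List[int]) -> float:
    """Center line: mean count per period over the pre-intervention data."""
    if not before:
        raise ValueError("baseline period is empty")
    return sum(before) / len(before)


def first_downward_shift(
    before: List[int], after: List[int], run_length: int = 8
) -> Optional[int]:
    """Index in `after` at which `run_length` consecutive points have fallen
    strictly below the baseline center line, or None if no such run occurs.

    A point exactly on the center line resets the run (a conservative choice).
    """
    center = baseline_center(before)
    run = 0
    for i, count in enumerate(after):
        run = run + 1 if count < center else 0
        if run >= run_length:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical monthly recordables, 12 months before and 12 after.
    before = [5, 6, 4, 5, 7, 5, 4, 6, 5, 3, 6, 4]
    after = [4, 3, 5, 3, 2, 3, 2, 1, 3, 2, 2, 1]
    print(first_downward_shift(before, after))  # 10
```

In this example, the eighth consecutive month below the baseline mean of 5.0 occurs at index 10 of the post-intervention series.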

In our experience, organizations that go through the process outlined above typically see a 25-50% reduction in the RAIR. Other workers have reported similar results. O’Toole (in press), working with a chemical company, used a portion of our survey to develop an action plan. After the plan was implemented, he observed a reduction of more than 50% in lost-time rates.

The problem with using such studies to demonstrate the efficacy of the process is that it is difficult to exert experimental control. An obvious method would be to take a large company, randomly assign half of its plant sites to a survey-based improvement program, and reserve the other half as a control group. This is unlikely to be achieved in practice. However, we have two additional sources of data that support our assertion that this survey-based improvement process is effective in improving safety performance:

  1. The results of improvement efforts are specific to the survey diagnosis, not general. We conducted a limited version of the survey at a pipeline company. Our finding was that management commitment to safety was perceived as weak, and that recognition of employees for their contributions to safety was lacking. Based on this finding we worked with the company to implement a recognition program and to increase management’s visibility in supporting safety. Eight months later we re-surveyed their employees. The changes in survey scores are reflected in Figure 5 below.


    Figure 5. Changes in 4 survey components as a result of survey-based intervention

    The observed improvements are specific to the areas targeted by our intervention. Furthermore, the intervention was effective in improving performance. This company experienced a long-lasting reduction in RAIR in excess of 50%.

  2. Plant Staff View the Survey as a Useful Tool.

    While we have considerable circumstantial evidence that the survey process is effective in improving safety performance, the opinions of operational management are relevant in this matter. To ascertain line management’s opinion of the effectiveness of the process, we surveyed a population of 22 plant managers who had used the survey annually for three years.

    Figure 6 below shows their opinion on the usefulness of the survey for the company.


    Figure 6. Opinions of plant managers regarding survey benefit

Most of the respondents felt the survey was useful. Verbatim comments included the following:

"(The survey) allowed the site to prioritize safety programs to meet the employees concerns. For example in the first year we reemphasized emergency response, then off site safety. In later years we have emphasized Management of Change and supervisory safety processes."

"(The survey) highlights latent problems of which we were unaware.

"(The survey offers) identification of areas needing improvement from the employee viewpoint."

"It did help for us to understand what was behind changes to the survey."

"It provided a basis for developing the 1997 Safety Programs."

"(The survey yields) open discussion of perception of what's good and what's not so good in the safety program."

"Some good input was gained on several significant safety issues in the follow-up discussions."

"(The survey helps) to prioritize actions and empower groups to develop management systems in those areas."

"(The survey) builds safety awareness. Improves safety culture. Provides input from people other than mfg. Identifies problem areas. Can help to improve process safety."

"(The survey has employee) buy in to "Safety" from non-management employees."

"(The survey is a) constant reminder of safety; assists with safety initiatives/objectives/awareness."

"(The survey is a) good overall method to identify weaknesses and adapt action plans."

"(The survey) helps get input so that HSE processes can be improved. Just conducting the survey and feeding back results communicates a level of commitment to HSE excellence."

"It gives us feedback to see if our HS&E efforts are going in the right direction."

"(The survey is) the best measurement tool currently available."

Conclusions

Our work suggests that our survey process provides a reliable and valid metric of the management system. As such, it can provide an effective method of evaluating quality and effectiveness of the safety management system. In contrast, there are many potential pitfalls arising from the use of accident rates as a measure of the system including inconsistency in reporting, differences in the inherent risk of the work, failure to identify events which are not recordable accidents but which have substantial potential for disaster, and high variability with small populations. There is ample data confirming that our survey provides a diagnostic framework for the development of an effective safety improvement effort.

In fact the survey-based improvement process is a powerful tool for organizational change that goes far beyond the arena of safety. Following three years of work at the large chemical company, the president remarked that this process had created more positive change in the company’s culture than a multi-million dollar engagement by a major consulting firm that took place at the same time.

References

  • Arndt, M. How O'Neill Got Alcoa Shining. Business Week, February 5, 2001.
  • Bailey, C. W. and D. Petersen. Using safety surveys to assess safety system effectiveness. Professional Safety, 2, 22-26, 1989.
  • Carder, B. Quality Theory and the measurement of safety systems. Professional Safety, 23-28, 1994.
  • Coyle, I. R., S. D. Sleeman, and N. Adams. Safety climate. Journal of Safety Research, 26, 247-254, 1995.
  • Deming, W. Edwards. Out of the Crisis. MIT Center for Advanced Engineering Study, Cambridge, Mass., 1982.
  • Eckes, G. General Electric's Six Sigma Revolution: How General Electric and Others Turned Process Into Profits. John Wiley & Sons, New York, 2000.
  • O'Toole, M. The Relationship Between Employees’ Perceptions of Safety and Its Management Culture. Journal of Safety Research (in press).
  • Polk, J. F. Statistical Analysis of Railroad Safety Performance, 1977-1982. Final Report of Contract DTFR 53-82-X-0076, Federal Railroad Administration, 1987.
  • Ragan, P. T. and B. Carder. Systems Theory and Safety. Professional Safety, 22-27, 1994.
