Despite the many years tabletop exercises have been used for preparedness, no reliable formal evaluation system existed until the creation of DHS/FEMA's Homeland Security Exercise and Evaluation Program (HSEEP) in 2005. The problem is characterized in a statement by the Department of Homeland Security's Office of Inspector General in the executive summary of its report on DHS efforts to address lessons learned in the aftermath of the Top Officials exercises:

“Since the first Top Officials exercises in 2000, neither a process for tracking weaknesses and how those weaknesses were resolved, nor a method for identifying and analyzing trends in corrective actions or significant lessons learned has been established. As a result, federal, state, local, and territorial agencies were unclear regarding the implementation of suggested improvements following preparedness exercises.”

FEMA developed a standardized program that includes common terminology for exercise design, development, conduct, evaluation, and improvement planning.

One benefit of this program is that it supports organizations in achieving objective assessments of their capabilities. Strengths and areas for improvement are identified, corrected, and shared appropriately before a real incident occurs. This is a strength of the program, but it must still be monitored and applied effectively.

What’s the alternative?

Even in the post-HSEEP era, tabletop exercises that are no-fault or that fall outside HSEEP are still evaluated by collecting qualitative metrics through questionnaires, narratives, hotwashes (post-exercise discussions), lessons learned, logs, checklists, or surveys. Considerable progress has been made over the last two decades to improve these techniques and forms, and valuable data has been gathered from them. But do these techniques and forms answer the question of whether the exercise was effective? Do they help identify expected results from exercise objectives? Unexpected results? This analysis is examined further in the evidence section.

Because the purpose of tabletop exercises is to evaluate policies, plans, and procedures, it is vital that metrics be collected. Questionnaires and surveys can gather responses from participants and players, but do they provide the end user with any meaningful data that determines whether the exercise was effective?

Common Evaluation

Common to all tabletop exercises is the debriefing portion of the event, often called the "hotwash." A hotwash is an opportunity for participants to offer input on how well the exercise went, identify plans or procedures that should be changed, share lessons learned, and commit to changes they consider appropriate. An exercise is typically followed by an evaluation meeting and may include an after-action report stating the evaluation team's findings and the effectiveness of the exercise. This report serves as the basis for planning future exercises, upgrading contingency plans, and taking corrective actions.

One of the most valuable qualitative outputs gained during the debriefing or hotwash is the set of lessons learned. But lessons learned can be rendered irrelevant, as with those from the "Hurricane Pam" exercise conducted in 2004 prior to Hurricane Katrina, if they are not properly implemented and communicated. Out of those failures in information gathering and sharing, progress has been made in evaluating exercise effectiveness through the implementation of HSEEP.

Next Steps Toward Identifying Effectiveness

Substantial progress has been made in evaluating tabletop exercises, but more could be done. For one, it would be beneficial to gather more than just qualitative data. Quantitative data would be extremely valuable for both private and public organizations. Such data could be used to forecast improvement in response times and effort, identify resource and training needs when gaps appear, and align training budgets, as the sketch below illustrates.
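To make the idea of quantitative exercise data concrete, the following is a minimal Python sketch of how timed tasks from successive tabletop exercises might be aggregated to track improvement and flag gaps against target thresholds. The exercise names, tasks, timings, and targets are illustrative assumptions, not part of HSEEP or any published evaluation standard.

# Hypothetical sketch: aggregating quantitative tabletop-exercise metrics.
# All data, field names, and thresholds are illustrative assumptions.
from statistics import mean

# Each record captures one measured task from one exercise:
# (exercise_id, task, minutes_to_complete, target_minutes)
records = [
    ("TTX-2022", "activate EOC",        42, 30),
    ("TTX-2022", "issue public notice", 95, 60),
    ("TTX-2023", "activate EOC",        31, 30),
    ("TTX-2023", "issue public notice", 70, 60),
]

def summarize(records):
    """Group timings by task, then report average times per exercise and flag gaps."""
    tasks = sorted({r[1] for r in records})
    for task in tasks:
        rows = [r for r in records if r[1] == task]
        by_exercise = {}
        for ex_id, _, minutes, target in rows:
            by_exercise.setdefault(ex_id, []).append((minutes, target))
        print(f"Task: {task}")
        for ex_id in sorted(by_exercise):
            times = [m for m, _ in by_exercise[ex_id]]
            target = by_exercise[ex_id][0][1]
            avg = mean(times)
            status = "meets target" if avg <= target else f"gap of {avg - target:.0f} min"
            print(f"  {ex_id}: avg {avg:.0f} min vs target {target} min -> {status}")

summarize(records)

A summary like this, tracked across exercise cycles, is one way an organization could show measurable improvement and justify resource or training requests, complementing the qualitative findings of a hotwash.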

Second, HSEEP applies only to DHS-funded exercises and is not necessarily used for no-fault exercises or exercises conducted outside the program. Furthermore, HSEEP may not be the solution. Exercises that are mandated to be evaluated, such as those under HSEEP, have seen their share of problems, which are highlighted in the Congressional Research Service report for Congress titled "Homeland Emergency Preparedness and the National Exercise Program: Background, Policy Implications, and Issues for Congress." The report noted the following:

The identification of capabilities on which to build through a public AAR, as required by the HSEEP method, may raise challenges if exercise participants have not adequately exercised their plans, or are concerned about potential consequences as a result of negative evaluations. As a result, there may be incentives for some exercise planners to understate exercise objectives, overstate the extent to which those objectives are met, or to downplay or omit deficiencies that are identified.  Any of those approaches arguably undermines the effectiveness of the exercise as tools to prepare for an incident, or to evaluate an entity’s capacity to respond to an incident. 

The report points out a potential bias or incentive for planners and participants not to fully identify problems when an exercise is evaluated. This bias may help explain the need for no-fault exercises, which essentially allow all participants and players to work through the exercise in an open environment without penalty. The report further indicated:

The HSEEP method does not provide common benchmarks or metrics to apply in the evaluation of an exercise.  Moreover, under the HSEEP method, exercises are typically evaluated by the same group that designs the exercise.  This approach, which extends beyond the National Evaluation Program to any entity that uses the HSEEP method, may be problematic if the evaluators fail to critically assess their own program.

So is no evaluation the solution? One reasonable answer appears to be allowing self-assessments to be conducted, similar to what is done in no-fault tabletop exercises, but can these types of evaluations be done effectively? To determine whether they can, it is necessary to analyze the current forms of evaluation for no-fault tabletop exercises.

Resources