Evaluating Tabletop Exercise Effectiveness

Although tabletop exercises have been used for preparedness for many years, no reliable formal evaluation system existed until the Department of Homeland Security (DHS) and FEMA created the Homeland Security Exercise and Evaluation Program (HSEEP) in 2005. The problem is characterized in a statement by the DHS Office of Inspector General in an executive summary regarding DHS efforts to address lessons learned in the aftermath of the Top Officials exercises:

“Since the first Top Officials exercises in 2000, neither a process for tracking weaknesses and how those weaknesses were resolved, nor a method for identifying and analyzing trends in corrective actions or significant lessons learned has been established. As a result, federal, state, local, and territorial agencies were unclear regarding the implementation of suggested improvements following preparedness exercises.”

FEMA developed a standardized program that includes common terminology for exercise design, development, conduct, evaluation, and improvement planning.

One benefit of the program is that it helps organizations achieve objective assessments of their capabilities. Strengths and areas for improvement are identified, corrected, and shared appropriately before a real incident occurs. This is a strength of the program, but it must still be monitored and applied effectively.

What’s the alternative?

Even in the post-HSEEP era, tabletop exercises that are no-fault or are not considered HSEEP exercises include evaluations conducted by collecting qualitative metrics through questionnaires, narratives, hotwashes (post-exercise discussions), lessons learned, logs, checklists, or surveys. Considerable progress has been made over the last two decades to improve these techniques and forms, and valuable data has been gathered from these methods. But do these techniques and forms answer the question of whether the exercise was effective? Do they help identify expected results from exercise objectives? Unexpected results? This analysis is examined further in the evidence section.

Because the point of tabletop exercises is to evaluate policies, plans, and procedures, it is vital that metrics be collected. Questionnaires and surveys can gather responses from participants or players, but do they provide the end user with any meaningful data that determines whether the exercise was effective?

Common Evaluation

Common to all tabletop exercises is the debriefing portion of the event, commonly called the “hotwash.” A hotwash is an opportunity for participants to provide input on how well the exercise went, note which plans or procedures should be changed, share lessons learned, and commit to changes they deem appropriate. It is common for an exercise to be followed by an evaluation meeting, which may include an after-action report stating the findings of the evaluation team and the effectiveness of the exercise. This report serves as the basis for planning future exercises, upgrading contingency plans, and taking corrective actions.

One of the greatest qualitative metrics gained during the debriefing or hotwash is the set of lessons learned. But lessons learned are of little value if they are not properly implemented and communicated, as with the “Hurricane Pam” exercise conducted prior to Hurricane Katrina. From these failures in information gathering and sharing, progress has been made in evaluating exercise effectiveness through the implementation of HSEEP.

Next Steps Toward Identifying Effectiveness

Substantial progress has been made in evaluating tabletop exercises, but more could be done. For one, it would be beneficial to gather more than just qualitative data. Quantitative data would be extremely beneficial for both private and public organizations. It could be used to forecast improvement in response times and effort, identify resource and training needs when gaps are found, and address training budget concerns.

Secondly, HSEEP applies only to DHS-funded exercises and is not necessarily utilized for no-fault exercises or exercises conducted outside of HSEEP. Furthermore, HSEEP may not be the solution. Exercises that are mandated to be evaluated, such as those under HSEEP, have seen their share of problems, which are highlighted in the Congressional Research Service Report for Congress titled “Homeland Emergency Preparedness and the National Exercise Program: Background, Policy Implications, and Issues for Congress.” The report noted the following:

The identification of capabilities on which to build through a public AAR, as required by the HSEEP method, may raise challenges if exercise participants have not adequately exercised their plans, or are concerned about potential consequences as a result of negative evaluations. As a result, there may be incentives for some exercise planners to understate exercise objectives, overstate the extent to which those objectives are met, or to downplay or omit deficiencies that are identified.  Any of those approaches arguably undermines the effectiveness of the exercise as tools to prepare for an incident, or to evaluate an entity’s capacity to respond to an incident. 

This report points out the possibility of a bias or incentive for planners and participants not to fully identify problems when an exercise is evaluated. This potential bias may help explain the need for no-fault exercises, which essentially allow all participants and players to work through the exercise in an open environment without penalty. The report further indicated:

The HSEEP method does not provide common benchmarks or metrics to apply in the evaluation of an exercise.  Moreover, under the HSEEP method, exercises are typically evaluated by the same group that designs the exercise.  This approach, which extends beyond the National Evaluation Program to any entity that uses the HSEEP method, may be problematic if the evaluators fail to critically assess their own program.

So is no evaluation the solution? One reasonable answer appears to be allowing self-assessments to be conducted, similar to what is done in no-fault tabletop exercises. But can these types of evaluations be done effectively? To answer that, it is necessary to analyze the current forms of evaluation for no-fault tabletop exercises.

Analyzing Tabletop Exercise Effectiveness

To assess the current forms of evaluating the effectiveness of no-fault exercises and offer alternatives, several individuals with a great deal of experience participating in and planning tabletop exercises were interviewed.  This cadre includes former senior representatives of the Department of State, a Senior Policy Advisor to the White House and Department of Energy, a Deputy Under Secretary of the National Nuclear Security Administration, an Exercise Director with the Federal Bureau of Investigation, a former Navy SEAL, an Assistant Administrator for Protective Services and Security with the National Aeronautics and Space Administration, a hospital Emergency Manager, and a Senior Exercise Planner.  These interviews yielded a broad and detailed view of how tabletop exercises are currently evaluated and how they could improve.  Moreover, over 100 articles, governmental committee and subcommittee notes and reports, journals, after-action reports, and theses were reviewed to gain a strong understanding of how tabletop exercises are evaluated and whether they are effective.

Are qualitative assessments good enough?

Although no numeric rating may result from qualitative assessments and responses, one can certainly provide solid qualitative evidence that an exercise was effective through subjective post-exercise evaluations and critiques provided by exercise participants.  Though such evidence is subjective, the individuals interviewed provided examples of how one can determine that an exercise was effective solely through qualitative response.  Common qualitative examples discussed in the interviews, as well as actual results from tabletop experience, include:

  • Team-building and familiarity among response assets and leadership – The tabletop exercise provides an opportunity for first responders and follow-on response to meet one another, sometimes for the first time, and to get to know and build trust with one another. “Almost impossible to measure, but a tabletop exercise is invaluable because it is the relationships built between responders in an emergency.  [A tabletop exercise] builds a trust between response.”

  • Knowledge gained of roles, responsibilities, and assets among responding parties – Deputy Under Secretary of the National Nuclear Security Administration Dr. Steve Aoki stated that taking part in tabletop exercises helped him provide response guidance during the Fukushima disaster.  The exercises he participated in gave him knowledge of the various response assets available and which could be called upon during a crisis or disaster.  Additionally, the tabletop provided him with a venue to work through various scenarios.  Furthermore, the Assistant Administrator at NASA, Mr. Mahaley, stated that he took part in a tabletop exercise that involved contacting the White House during a disaster.  This experience prepared him for an actual call that was needed while he was acting Director of Security for the Department of Energy during the 2003 blackout that impacted much of the northeastern United States.  He stated that the experience gave him the opportunity to work through how a phone call to the White House would take place and to understand who needed to be included in the call.

  • Post-exercise lessons learned – things learned that were otherwise not known prior to the exercise.  “For the after action review to be effective, the opportunity to incorporate recommended changes to site response plans and procedures should be a goal.” A gap in a current security plan or procedure, or a lack of understanding of a particular substance or organism, is often identified through the course of a tabletop exercise.  This often results in a change to a plan or procedure, or in a group of people becoming more comfortable with response.

  • Knowledge of a particular threat – i.e., group, source/material, or attack.  As stated in the conclusion of Maryland’s pandemic influenza preparedness exercise – It [the tabletop exercise] served to engage the emergency response community and address the issues of incident command and how pandemic planning fits with the “all hazards” approach.  The exercise also educated key partners and stakeholders, through an experiential approach, about the potential severe consequences of pandemic influenza, and it provided a forum to “drill down” beyond the current state plan and identify additional critical local planning activities that are needed.  Instructive insights and lessons were gained from the exercise that should bolster further planning efforts in Maryland, not only for pandemic influenza, but also for bioterrorism and other public health disasters.

  • Exercising plans in place – a tabletop exercise provides a venue for response to actually practice the plans and procedures in place to ensure they fully understand said plan/procedure and/or response to a disaster. Mr. Mahaley stated “You do what you are trained to do.  In real life, you are going to react how you are trained.  In my 40 years of experience, tabletop exercises provide the most effective form of training.”

This qualitative information is extremely vital and shows that a tabletop exercise is effective, but it is not always easy to gather.  In almost every interview conducted, a common theme was that the best way to obtain great qualitative response is for the exercise to remain no-fault or non-attributional, allowing an open, honest environment.  The reason given was that assets are more likely to admit faults, vulnerabilities, a lack of understanding, or a shortfall in a plan or procedure if they are not worried about their jobs. This makes sense considering that participants may feel more comfortable speaking if they are not being graded.  So, if a no-fault tabletop exercise yields the best qualitative responses, can it also provide quantitative results to determine effectiveness?

Quantitative assessments

Raw, underdeveloped quantitative assessments currently exist in the tabletop community, but they are not well known and there is no standard.  Standardized quantitative assessments would help public and private organizations determine whether the money their organization is spending is going to good use and whether the tabletop exercise is worth attending.  Given the lack of available quantitative metrics, it is prudent to look at ways to quantify the results of a tabletop exercise to complement the qualitative data.  Furthermore, since it was stated earlier that exercises may be more effective when non-attributional, the non-attributional format is the type of tabletop exercise considered here.

Suggested Metrics

Three forms of quantitative assessments should be considered to assist government agencies and private organizations with determining effectiveness for a no-fault tabletop exercise.  These include a pre/post test combination to help identify the percentage of improvement, a numeric count of observations during and post exercise, and a rubric as an assessment tool.

Conducting a pre- and post-test among players and observers (observers typically consist of other invited responders not sitting at the players’ table) is a way to gauge the level of improvement in understanding, knowledge, and collaboration.  Participants would take a test to indicate their understanding of response to a disaster, their level of knowledge of the particular threat, and how well they know who would be responding to, or in charge of, a particular incident.  Following the tabletop exercise, the participants would take the same test, and the results would be compared.  From that comparison, a level of improvement could be calculated, providing some quantitative gauge of exercise effectiveness.
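As a rough sketch of how such a pre/post comparison might be tallied, the short script below averages the relative gain across participants; the scores and the 0–100 scale are hypothetical, not taken from any actual exercise.

```python
def percent_improvement(pre_scores, post_scores):
    """Average percentage improvement from pre-test to post-test.

    pre_scores/post_scores: per-participant scores (0-100), paired by index.
    Returns the mean relative gain as a percentage.
    """
    gains = [
        (post - pre) / pre * 100           # relative gain for one participant
        for pre, post in zip(pre_scores, post_scores)
        if pre > 0                         # avoid division by zero
    ]
    return sum(gains) / len(gains)

# Hypothetical scores for five exercise participants
pre = [55, 60, 70, 40, 65]
post = [75, 72, 80, 60, 70]
print(f"Average improvement: {percent_improvement(pre, post):.1f}%")
```

A per-participant breakdown, rather than a single average, would also let planners spot which players gained the least from the exercise.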

Another potential way to gather quantitative data from a tabletop exercise is counting the number of observations either during or after the exercise.  During a no-fault exercise, an unbiased observer could be included, not to grade or place fault, but to count each instance in which a participant learned something, a vulnerability was identified, a gap in a plan or procedure was identified, or an agency stated it was unaware of a particular response or response asset.  The observations could then be sorted and tallied.  This could also be done post-exercise by counting the number of changes made to plans, policies, or procedures, as well as anything else that may have resulted from the exercise experience.
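A minimal sketch of how such an observer log might be sorted and tallied follows; the category labels and entries are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical log kept by an unbiased observer during a no-fault
# exercise; each entry is one observation tagged with a category.
observations = [
    "lesson_learned", "plan_gap", "vulnerability",
    "lesson_learned", "unaware_of_asset", "plan_gap",
    "lesson_learned",
]

# Sort categories by frequency and print the tally
tally = Counter(observations)
for category, count in tally.most_common():
    print(f"{category}: {count}")
```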

The final suggestion that should be considered is an original contribution from the research for this paper.  This rubric was designed considering the great number of tasks and objectives that may be included in a tabletop exercise. Table 3 is shown in its entirety below.

Scoring Effectiveness

To utilize the rubric, exercise evaluators would read the tasks listed on the left and use the descriptions beside each task to determine a score for that specific task or question.  Each task would receive a 1 (lowest quality), 2 (average quality), or 3 (highest quality) based on how well the tabletop exercise fulfilled the task.  Taken together, the tasks and questions listed in the rubric should fulfill the purpose and goals of the exercise; hence, a score toward the higher end of the scoring range should indicate exercise effectiveness.  In the example in Appendix 1, the minimum score would be 20 and the maximum 60.  Considering the range, a median score of at least 30 may indicate an effective exercise, but it would be up to the designer of the rubric to set the threshold based on the number of tasks/questions listed and their respective values.  One other suggestion for the rubric would be assigning a greater weighted value range to tasks/questions of greater importance.
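The scoring logic described above can be sketched in a few lines. The task names, weights, and 50% threshold below are hypothetical placeholders for illustration only, not the actual contents of Table 3 or Appendix 1.

```python
# Hypothetical rubric: task name -> weight. Each task is scored
# 1 (lowest), 2 (average), or 3 (highest quality).
RUBRIC = {
    "Objectives communicated clearly": 1,
    "Plans and procedures exercised": 2,      # weighted higher: greater importance
    "Gaps and vulnerabilities identified": 2,
    "Roles and responsibilities clarified": 1,
}

def score_exercise(scores, rubric=RUBRIC, threshold_ratio=0.5):
    """Sum weighted task scores and compare against a designer-set threshold.

    scores: dict mapping task -> 1, 2, or 3.
    Returns (total, max_total, effective), where effective is True when
    the total reaches threshold_ratio of the maximum possible score.
    """
    total = sum(rubric[task] * s for task, s in scores.items())
    max_total = sum(weight * 3 for weight in rubric.values())
    return total, max_total, total >= threshold_ratio * max_total

# Hypothetical evaluator responses for one exercise
example = {
    "Objectives communicated clearly": 3,
    "Plans and procedures exercised": 2,
    "Gaps and vulnerabilities identified": 2,
    "Roles and responsibilities clarified": 1,
}
total, max_total, effective = score_exercise(example)
print(f"Score {total}/{max_total}, effective: {effective}")
```

As the text notes, the threshold ratio is a design choice; the rubric designer would tune it to the number of tasks and their weights.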

Who should complete the rubric?

In this assessment, it would make sense to utilize two groups to fill out the form following the exercise.  In the first group, the players should be considered the primary respondents to the rubric.  They will be the main focus, and it is their response to the exercise scenario that will be gauged.  The rubric could remain anonymous, since the exercise is no-fault and attribution will not be a factor in gauging the results.  Additionally, keeping the tabletop exercise no-fault may foster more honest responses from the players.  The second group that would complete the rubric consists of site agents who have expert knowledge of and experience with tabletop exercises.  They may be individuals who assist in the setup and realism of the exercise but are not part of the exercise planning and facilitation team and have no stake in how well the exercise performs.  Ideally this would be someone who can observe the exercise but is not a player or providing response during the exercise.  Lastly, a combination of the two may be the best model: by obtaining a score from both the players and the site, one could compare the averages between the two sets to see any deviation in apparent effectiveness.
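The two-group comparison could look like the following sketch; the rubric totals for both groups are hypothetical.

```python
from statistics import mean

# Hypothetical rubric totals from the two groups described above
player_scores = [42, 38, 45, 40]      # anonymous player responses
site_agent_scores = [36, 39, 35]      # independent site-agent observers

player_avg = mean(player_scores)
agent_avg = mean(site_agent_scores)
deviation = player_avg - agent_avg    # positive: players rated it higher

print(f"Players: {player_avg:.1f}, Site agents: {agent_avg:.1f}, "
      f"deviation: {deviation:+.1f}")
```

A large positive deviation might flag the self-assessment bias discussed earlier, since players rating their own performance may score the exercise more generously than independent observers.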

Considering the potential alternatives to the current quantitative metrics available, the rubric may provide the most value to gain a quantitative insight from a no-fault exercise.

The Importance of Conducting Exercises

Significant events over the past three decades have put pressure on the United States government to exercise response and preparedness.  Attacks both domestic and international have spurred legislation on domestic resilience, combating terrorism, incident command, and disaster preparedness.  In the 1980s, the U.S. saw a rise of state-sponsored terrorism that included the following:

  1. In 1983, the United States embassy in Beirut, Lebanon, was bombed, killing 63 people, mostly embassy and CIA staff members, as well as several soldiers and one Marine; 17 of the dead were Americans.

  2. In 1985, the Achille Lauro cruise ship was seized by a terrorist group known as the Palestine Liberation Front (PLF), which held some 700 passengers and crew hostage; one American passenger was killed.

  3. In 1985, TWA Flight 847 was hijacked; one death occurred when a U.S. Navy diver was killed and his body thrown onto the tarmac in Beirut, Lebanon.

  4. In December 1988, Pan Am Flight 103 was bombed over Lockerbie, Scotland, killing all 259 people on board and 11 on the ground; the victims came from 21 different countries.

In 1985, Vice President George Bush created the Vice President’s Task Force on Combating Terrorism as the first end-to-end review focused on the coordination of joint and interagency counterterrorism response assets.  This review resulted in the signing of National Security Decision Directive 207 on January 20, 1986, which first defined the U.S. program on combating terrorism and its crisis response to incidents overseas, ultimately guiding U.S. strategy for the next decade. Terrorism and other disasters continued to be the catalyst for legislation and directives over the next 25 years, including several presidential policy directives that identified the need for better response and preparedness by first responders and follow-on response during a crisis.  Some of this legislation and these policy directives specifically cite the need for conducting tabletop exercises to help prepare for similar disasters.

Terrorism

In 1995, in response to the Tokyo sarin gas attack and the Oklahoma City bombing, President Bill Clinton signed Presidential Decision Directive (PDD) 39.  Essentially, it stated that it is the policy of the United States to “deter, defeat and respond vigorously to all terrorist attacks on our territory and against our citizens, or facilities, whether they occur domestically, in international waters or airspace or on foreign territory… the U.S. shall pursue vigorously efforts to deter and preempt, apprehend and prosecute, or assist other governments to prosecute, individuals who perpetrate or plan to perpetrate such attacks.”

Around that same time, many Congressional committees and subcommittees met on topics such as national security and combating terrorism.  One of the programs discussed that demonstrated the expanded role of the government in domestic preparedness was the Nunn-Lugar-Domenici Preparedness Program.  This program arose from the Defense Against Weapons of Mass Destruction Act of 1996, signed by President Clinton, and sought to provide training for possible incidents involving terrorists using Weapons of Mass Destruction (WMD).  From this, the Department of Defense and other Federal agencies were provided millions of dollars to conduct training, including tabletop exercises, in 140 cities across the U.S.

Disasters and National Level Exercises

In 2005, Hurricane Katrina was considered the most destructive and costly hurricane to hit the United States, with damage estimated at $125 billion and 1,500 deaths across four states. Part of the blame lay with what appeared to be poor communication and response to the disaster; the Committee on Homeland Security and Governmental Affairs cited the failure of government at all levels to plan, prepare for, and respond aggressively to the storm.  Specifically, the committee stated that four overarching factors contributed to the failures:

  1. Long-term warnings went unheeded, and government officials neglected their duties to prepare for a forewarned catastrophe;

  2. Government officials took insufficient actions or made poor decisions in the days immediately before and after landfall;

  3. Systems on which officials relied to support their response efforts failed; and

  4. Government officials at all levels failed to provide effective leadership.

The report stated that preparation plans and response were inefficient and not well devised.  Additionally, the report stated that the government had insufficiently conducted training and exercises.  In 2005, DHS assumed full responsibility for planning, conducting, and after-action reporting of the National Exercise Program, known then as the Top Officials (TOPOFF) exercises.  In April 2005, DHS conducted TOPOFF3, the third exercise in the series, which was designed to identify vulnerabilities in the Nation’s domestic incident management capability, including the structure of the National Response Plan (NRP).  The NRP originated from Homeland Security Presidential Directive 5 (HSPD-5) and was directed by President Bush to align Federal coordination structures, capabilities, and resources into a unified, all-discipline, and all-hazards approach to domestic incident management.

Exercises needed if done effectively

In response to the exercise, the Department of Homeland Security (DHS) Inspector General stated in November 2005: “the exercise highlighted – at all levels of government – a fundamental lack of understanding for the principles and protocols set forth in the NRP and the [National Incident Management System] NIMS.”  The confusion identified provoked discussion and demonstrated the importance of conducting exercises.  The absence of exercises in the NRP meant that there were no further formal opportunities to understand potential problems and to incorporate lessons learned into the NRP.

From these gaps and failures, DHS, through the Federal Emergency Management Agency (FEMA), replaced the NRP with the National Response Framework (NRF). The NRF is a guide to how the Nation responds to all types of disasters and emergencies. It is built on scalable, flexible, and adaptable concepts identified in the National Incident Management System to align key roles and responsibilities across the Nation. The Framework describes specific authorities and best practices for managing incidents that range from the serious but purely local to large-scale terrorist attacks or catastrophic natural disasters, as well as the principles, roles and responsibilities, and coordinating structures for delivering the core capabilities required to respond to an incident, and how response efforts integrate with those of the other mission areas.

The NRP was cited as being “insufficiently national in its focus… and …should speak more clearly to the roles and responsibilities of all parties involved in response.” The NRF is further covered under PPD 8, signed by President Obama, which in essence focused on an integrated, all-of-Nation, capabilities-based approach to all-hazards preparedness that can include, but is not limited to, the use of tabletop exercises as a form of preparation.

One thing Katrina did was reveal how the lack of an effectively trained and exercised plan, as well as a failure to practice communications interoperability, will further undermine a response. The Committee on Homeland Security and Governmental Affairs made several recommendations to improve response, coordination, and preparedness.  Among them, the committee recommended that “Federal departments and agencies should be required to conduct exercises to ensure that their plans are continually revised and updated,” and that “emergency agencies at the federal, state, and local levels of government, as well as first-responder groups outside of government, should receive regular training on NRP and NIMS.” It is important to note that the NRP is considered the foundation of the NRF, which built upon, rather than entirely rebuilt, the national framework previously established.

Despite these failures, not all was lost: in 2005, DHS developed the Universal Task List (UTL) and the Target Capabilities List (TCL).  The UTL helps exercise participants and planners by describing incident management tasks to be performed and providing a standardized reference for all levels of government and the private sector. The TCL contains capabilities that various levels of government need to develop and maintain to prevent, respond to, and recover from a terrorist attack or major disaster.  These two lists have proven useful to this day in exercises of all types and levels.

Exercise Progress

Through the lessons learned in the 1990s and the first decade of this century, exercises have been identified as a valuable tool for preparedness.  These lessons have shown the importance of addressing command and control during a crisis and of working through scenarios such as a terrorist WMD incident or a major storm.  This progress has not always been positive, however.  The final after-action report recommendations from the TOPOFF3 exercise failed to include improvement planning to address remedial needs and corrective action procedures, which were not part of the original evaluation.  The report only informed participating departments and agencies of existing problems and encouraged improvements in agency prevention, response, and recovery capabilities.  Furthermore, after-action reports, best practices, and lessons learned from the TOPOFF3 exercise were not disseminated to a broad national audience.

Despite the problems stated, there is still great need for response assets to exercise their plans and prepare for disasters. Their participation in these exercises, however, comes at the cost of government spending.  As sequestration takes effect and budgetary uncertainty lingers, training value needs to be maximized for ultimate effectiveness.  One cost-effective way to identify gaps or vulnerabilities in plans and preparedness is the tabletop exercise. Tabletop exercises have evolved to incorporate the most important objectives and have proven effective at training personnel without a large cost or resource commitment.  The difference in cost between a field exercise that requires the deployment of assets and a tabletop exercise can be as much as tenfold. Even so, field exercises are still necessary to ensure front-line response and command post officials are able to handle their tasks and responsibilities in a high-stress situation. So, can tabletop exercises provide the training and exercising of plans and procedures needed for response assets to prepare for a disaster?  Furthermore, the question that ultimately surfaces is: how do we know this to be true from a discussion-based exercise?

Types of Exercises and Training

As seen in human-caused events such as the 9/11 attacks and the Virginia Tech, Columbine, and Aurora, CO shootings, as well as natural events such as “Snowmageddon” (the name given to the various blizzards impacting parts of the world over the last four years), hurricanes, and earthquakes, response assets are tasked with handling the crisis and its consequences.  To prepare for numerous scenarios, response assets take part in all forms of training, ranging from discussion-based to operations-based exercises, the latter of which include the actual deployment of resources.  Each form of training exercise has unique goals and characteristics of conduct.  Below, Table 1 describes discussion-based exercises and Table 2 describes operations-based exercises.

Discussion-Based Exercises

Operations-Based Exercises

Exercises play a vital role in preparedness by enabling stakeholders to test and validate plans and capabilities and to identify both capability gaps and areas for improvement. A well-designed exercise provides a low-risk environment to test capabilities, familiarize personnel with roles and responsibilities, and foster meaningful interaction across organizations. Exercises bring together and strengthen responders in their efforts to prevent, protect against and deter, mitigate, respond to, and recover from all hazards. Overall, exercises are cost-effective and useful tools that help the nation or an organization practice and refine its collective capacity to achieve core capability and preparedness goals.

Because most of the expertise utilized in the contribution to this paper originates from tabletop exercise experience, this paper will focus on tabletop exercises and pose the question of whether they are an effective form of preparing response assets. Tabletop exercises can be as much as 10 times less expensive than a full-scale exercise and may be able to address issues and plans with stakeholders that have the ability to make appropriate organizational changes.  Additionally, by solely focusing on tabletops, the successes and limitations of these exercises can be thoroughly examined and evaluated for their effectiveness.