Information Security News mailing list archives

Twisters, hurricanes, floods (oh my)


From: InfoSec News <isn () c4i org>
Date: Thu, 4 Sep 2003 00:35:29 -0500 (CDT)

Forwarded from: William Knowles <wk () c4i org>

http://www.computerworld.com/securitytopics/security/recovery/story/0,10801,84579,00.html

Story by Matt Villano
SEPTEMBER 03, 2003 
CIO.com
 
The evening of Sunday, May 4, 2003, at Aeneas Internet and Telephone 
began as any previous Sunday evening had. The Jackson, Tenn.-based 
company that serves about 10,000 Internet and 2,500 telephone 
customers was closed for the weekend, awaiting the return of its 17 
employees the next morning. Just before midnight, however, all hell 
broke loose. An F-4 category twister touched down just outside of 
town, then tore through Jackson's downtown area, leveling houses, 
historical sites and municipal buildings alike. The tornado ripped 
straight through Aeneas's one-story building, leaving only a pile of 
rubble. 

Meanwhile, Aeneas CIO and Operations Manager Josh Hart, who'd heard 
about multiple tornadoes in the area that day, was home, 52 miles away 
in Martin, Tenn., huddling in his bathroom with his family. As soon as 
he was able, he flipped on the TV for news footage of the devastation. 
What he saw looked like "a war zone," bricks and concrete everywhere 
and piles upon piles of rubble. 

At 2 a.m., with those images in the background, Hart's cell phone 
rang--it was Aeneas Network Administrator Jason Warren calling from 
what he likened to Ground Zero to report that everything in Jackson 
was lost. Another call came in from CEO Jonathan Harlan. 

"I'm listening to [Warren] tell me what it's like, and he says, 'It 
doesn't even look like there was an office here,'" remembers Hart, 25. 
"The tornado destroyed our computers, our desks, everything. I 
couldn't believe what he was telling me." 

Aeneas lost nearly $1 million in hardware and software that night and 
suffered an estimated 72 hours of downtime. But just as Aeneas in 
Virgil's Aeneid endured the worst the gods had to offer, so too did 
this Aeneas. This one, however, was wise enough to have created a 
contingency plan--one that minimized the damage and kept the company 
afloat during its darkest hour. 

The company is not alone. After a nationwide scramble to prepare for 
high-impact, low-probability events similar to the attacks of Sept. 
11, CIOs have realized that their organizations are far more likely 
to succumb to another type of event--one that has a high probability 
of occurring and, curiously enough, is probably simpler to predict: 
the weather. For example, in June, while the Atlantic 
seaboard was bracing for the start of hurricane season, Arizona was 
busy battling forest fires. And in Harris County, Texas, in 2001, a 
tropical storm and resulting flood taught one IT executive the 
importance of flexibility. 

Both Aeneas's Hart and Steven W. Jennings, Harris County's executive 
director of central technology, share their experiences here in an 
effort to provide best practices and battle-tested secrets about which 
preparations work best. According to Carol Kelly, vice president of 
government strategies for Meta Group, these are lessons from which 
everyone can learn. "When disaster strikes, you want to be ready with 
a plan of action and an approach of how to deal," she says. "You might 
be ready for the next terrorist attack, but if you're not ready for 
the next nor'easter, your plans won't amount to much." 


Big plans for a small company 

Aeneas launched its contingency plan when it was founded in 1996; 
since then, CIO Hart has enhanced the strategy gradually almost every 
year. In early 2002, as the ISP neared 10,000 Internet customers, he 
and his network administrator, Warren, thought up the company's most 
comprehensive approach yet. While they determined that the likelihood 
of a terrorist attack on the western Tennessee town of Jackson, 
population 59,600, was slim to none, they concluded that because of 
the municipality's location in the central U.S.'s infamous Tornado 
Alley, the plan should respond to the next most likely cause of 
disaster--twisters. What ensued was a three-pronged plan that hinged 
upon colocation, distribution and backups. 

* First, by using Border Gateway Protocol (BGP) routing on a 
  high-class circuit shared with an ISP 90 miles down Interstate 40 in 
  Memphis, Aeneas would colocate its IP addresses in real time and 
  reroute data traffic offsite during any local disruption. With this 
  system, servers would automatically reroute Internet service 
  operations the moment a disruption occurred. In theory, at least, 
  that would guarantee continuity of operations across the board. 

* Next, the company distributed its voice traffic dynamically, paving 
  the way to switch its T1 connections from one fiber node in the 
  BellSouth network to another in the event of a sudden 
  telecommunications infrastructure failure. This system was designed 
  to preserve continuity much like the BGP system. 
 
* Finally, the company's network administration team engineered 
  applications that stored customer records and other data on tape as 
  well as on backup hard drives. Though the tape and hard drives were 
  stored onsite at the Jackson location, Hart and Warren figured 
  onsite backup was better than none. 
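
The weak point in the third prong, backups kept in the same building as 
the systems they protect, can be illustrated with a short check. This 
is only a minimal sketch, not anything Aeneas is reported to have run: 
the BackupCopy record, the datasets_without_offsite_copy function and 
the sample inventory are all hypothetical names invented for this 
example. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One backup copy (hypothetical record type for this sketch)."""
    dataset: str  # what is backed up, e.g. "customer records"
    medium: str   # e.g. "tape" or "hard drive"
    site: str     # where the copy is physically stored


def datasets_without_offsite_copy(copies, primary_site):
    """Return names of datasets whose every copy sits at the primary site."""
    sites_by_dataset = {}
    for copy in copies:
        sites_by_dataset.setdefault(copy.dataset, set()).add(copy.site)
    # A dataset is at risk when its set of storage sites contains
    # nothing other than the primary site.
    return sorted(
        name
        for name, sites in sites_by_dataset.items()
        if sites <= {primary_site}
    )


# Illustrative inventory, loosely modeled on the plan described above.
copies = [
    BackupCopy("customer records", "tape", "Jackson"),
    BackupCopy("customer records", "hard drive", "Jackson"),
]

print(datasets_without_offsite_copy(copies, "Jackson"))
```

Adding a single copy stored at a second site, such as Memphis, would 
remove "customer records" from the at-risk list. 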

This strategy wasn't put to the test until tornado season this year, 
when hardware, software and pieces of the local infrastructure were 
destroyed May 4. Business customers on T1 lines lost their connections 
as soon as the tornado struck. ISP traffic also went down immediately 
and took 36 hours to restore. The fiber node switch to recover voice 
traffic took a bit more time, as Aeneas programmers worked around the 
clock with technicians from BellSouth to migrate the T1 connections 
from the old node to the new, finalizing the switch nearly three days 
after the twister hit. 

"When you have hundreds of T1 lines that need to be moved from one 
node to the next, there's a lot of reengineering that needs to take 
place," says Hart. "We thought we were prepared, but I'm not sure we 
ever considered just how difficult this would be." 


Bumps in the disaster recovery road 

Beyond the challenges inherent in rerouting traffic, the remediation 
effort hit two other snags. The first revolved around colocation; 
because the colocation arrangement with the Memphis ISP was still 
being set up at the time of the tornado, the Memphis site didn't yet 
have sufficient servers. To remedy the situation, Aeneas staff 
members--and family and friends--drove to Memphis with additional 
equipment to handle the load. The company had some of this equipment 
on hand--what it didn't have, Hart and Warren purchased online and had 
overnighted to their homes. All told, colocation was down for about a 
day and a half. 

The larger and more formidable of the two setbacks involved the 
company's tape and hard-drive backups. It was clear from the beginning 
that most of the company's paper-based customer records had fallen 
victim to Mother Nature, but four days after the tornado, Hart and 
Warren discovered that the electronic tape and hard-drive backups had 
failed as well. Hart finally uncovered the tape and hard drives May 8. 
When he pulled the tape from the rubble, it was so badly damaged that 
he hardly recognized it. Hart passed the hard drives on to a number of 
local data recovery specialists to see if they could retrieve 
anything. One by one, each came up empty. 

Finally, as a last resort, Hart plucked the hard drives from four 
different nonfunctioning computers and turned them over to Kroll 
OnTrack, a data recovery company in Minneapolis. Miraculously, the 
vendor discovered a recent copy of the customer records database on 
all four computers and was able to recover all of the customer data 
and return it to Aeneas, delaying printing of its May bills only 
minimally. 


Large organization, even larger plans 

For an IT organization as small as Aeneas, the tornado presented 
sizable challenges. But for the IT organization of Harris County, 
Texas, which services more than 15,000 county employees and nearly 3.5 
million constituents, the problems presented by Tropical Storm Allison 
were downright monumental. 

Disaster struck June 6, 2001--the second day of a five-day storm--when 
atmospheric conditions caused a cloud to linger over the Houston area 
for nearly six hours, dropping more than 39 inches of rain. By the 
time the clouds parted, Harris County government had lost five 
buildings and most of the communications and other hardware and 
software in them to water damage. The price tag: a whopping $24 
million. 

Fortunately, though, Executive Director of Central Technology Steve 
Jennings had prepared for such an event. When Jennings joined county 
government in 1975, he established continuity planning to address 
natural disasters, such as flooding and hurricanes. The plan, which he 
dubbed the Four R strategy, hinges on four incremental steps--review, 
rewire, relocate and rebuild. 

With this in mind, Jennings attacked the recovery immediately, 
following his plan like a bible. The morning after the deluge, he and 
his top advisers met to review assets and assess damages. Next, 
because Harris County is public and qualifies for federal aid, 
Jennings called in the Federal Emergency Management Agency (FEMA) to 
inspect the damage and lend him some disaster recovery expertise. He 
also brought in NetVersant Solutions to lay new fiber-optic cables. 
This process took approximately six weeks. In the meantime, Jennings 
reconvened his advisers, and put together an emergency relocation plan 
to disperse county employees into available office space on high, dry 
ground. Three months later, he tapped into the first of several 
batches of funding from FEMA to start rebuilding, spending millions on 
treating buildings for water damage. 

Jennings also worked double time to ensure that county communications 
didn't miss a beat. "We utilized existing remote access facilities 
that allowed county employees to dial in from home until their new 
offices were finished," he says. This was done for employees whose 
jobs were deemed critical to county operations and for those for whom 
the county couldn't find alternative space. Jennings then mobilized a 
force of technicians to install high-speed connections at the homes of 
those employees who needed them most. 

Finally, with the help of the county clerk's office, Jennings 
activated a cache of 300 Cingular cell phones, which had been reserved 
to help the blind vote on Election Day, and distributed them on an 
as-needed basis to county departments. "Those phones are deactivated 
for 11 months of the year, but they were available and we needed 
them," he says, noting that network administrators deactivated the 
phones and retrieved them once they managed to bring each department 
back online. "Part of recovering from a disaster is making use of 
everything you can find, and we did just that." When all was said and 
done, it took the county about a year to return to normal, which, 
according to Jennings, was pretty good given the scope of the damage. 


Lessons learned 

Jennings says the storm confirmed his belief that continuity plans 
should be flexible and horizontally applicable. Before the flood, 
Harris County's disaster recovery plan was conceived to respond to 
potentially any disaster, but it typically addressed single events 
such as the loss of a building, a network or a system. It was flexible 
enough, however, that it worked even when the county was faced with 
recovering multiple facilities. He adds that Harris County government 
"uses different portions of the plan for total recovery." Today, the 
Harris County continuity plan incorporates suggestions from employees 
who were part of the recovery process and lists scenarios for various 
"disaster combinations" that could occur during the next big 
storm--such as what to do if both the jail and family court get hit. 
When that storm does happen, Jennings says he'll respond even faster 
than he did in 2001. 
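
The idea of listing "disaster combinations" can be sketched in a few 
lines. This is an illustrative assumption about how such a list might 
be generated, not Harris County's actual method: the 
disaster_combinations function is hypothetical, and only the jail and 
family court come from the example above (the "data network" entry is 
invented for the sketch). 

```python
from itertools import combinations


def disaster_combinations(facilities, max_simultaneous):
    """Yield every group of 2..max_simultaneous facilities lost together."""
    for size in range(2, max_simultaneous + 1):
        yield from combinations(facilities, size)


# Hypothetical facility list; only the first two appear in the article.
facilities = ["jail", "family court", "data network"]

for scenario in disaster_combinations(facilities, 2):
    print(" + ".join(scenario))
```

Because the number of combinations grows quickly as facilities are 
added, a planner would likely cap max_simultaneous at a small value and 
write a response only for the pairings judged most plausible. 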

The next time a weather event occurs, Jennings says he'll also have 
the added benefit of wireless. After the flooding, as Jennings tried 
to rewire the Harris County jail, he spent $200,000 on Lynx 
high-definition wireless technology as an interim solution. The 
technology worked so well that he kept it and now has it on hand to 
pinch-hit during the next crisis. If, for example, a storm knocks out 
phone lines in the southeast corner of the county, Harris County can 
set up wireless in hours. In addition, if another rainstorm waterlogs 
some of the underground fiber optics downtown, the county can use the 
technology to provide emergency telephone service to anyone who needs 
it. 

"Mother Nature never follows a script, especially not the one you 
wrote," Jennings quips. "As we have more experience recovering from 
the disasters she wields, we'll have a better sense of which remedies 
work best." 

At Aeneas, Hart notes that from "now until the end of time," he'll 
keep an electronic records backup offsite to eliminate the problems he 
endured in recovering those mission-critical customer files. Planning 
for offsite backup had begun before the May tornado, and the site is 
now up and running in Memphis. Hart admits that his error in planning 
nearly cost Aeneas everything, adding that he'll never make that 
mistake again. Another misstep Hart says he'd correct is the way he 
handled the media in the days following the tornado. If he could do it 
all over again, Hart says, he would have been on the phone immediately 
with newspapers, TV stations and radio outlets to jump-start the 
company's PR campaign and assuage customer concerns. 

"[Our customers] must have been watching the TV news thinking, 'Man, 
that's my ISP,' and we're too busy working on restoring systems to 
think about putting their minds at ease," he says. "Restoring 
technology after a disaster is important. But rebuilding customer 
confidence...it doesn't get more important than that." 

Matt Villano is a freelance writer based in Moss Beach, Calif. 






