620 Alden Rd
Suite 104 Markham, ON  L3R 9R7
Phone: 905-513-8866
Fax: 905-470-6019

Toll Free: 1-800-263-1794


Disaster Recovery and the Toronto G20 Meeting

The upcoming G20 event in downtown Toronto provides a unique insight into the disaster preparations of many large Canadian companies.  The organizations that have their head offices within the security perimeter will have serious problems getting staff into work, not only during the weekend event, but for the preceding week as well. 

What an excellent opportunity to finally use those Disaster Recovery and Business Resumption Plans (DRP, BRP).  Or, in some cases, what a terrible time to discover that these plans are entirely inadequate. 

I'd like to talk about three critical DRP/BRP problems that I have seen more often than I'd like to mention, and in far too many large and important companies.

1. Not separating the BRP from the DRP

The first problem is obvious for the current G20 situation.  The buildings will be safe within the security corridor, surrounded by more police than they've ever seen.  The circuits will be up.  But the desks will be empty because very few people will be allowed past (or want to waste their time trying to get past) the security perimeter. 

This is similar to the often-discussed epidemic-type disaster scenarios.  In such cases it is completely unnecessary to move systems and data to a backup site.  What these companies need is simply a place to move their people to, and a way for those people to access the data.

The simplest popular solution to this problem is remote VPN access from home.  This requires a VPN service with enough capacity, and an Internet link with sufficient bandwidth, to support large numbers of simultaneous and persistent connections.  It also only works for occupations that make extensive use of computers and electronic data.  Anybody who needs to access paper files or physical equipment of any kind will be unable to take advantage of this type of BRP.  It also doesn't work well for extremely heavy information users such as stock traders, or for occupations where people need to confer frequently with their co-workers.
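As a rough illustration of the capacity question, the sketch below checks a planned number of home workers against two limits: the concurrent sessions the VPN can hold and the bandwidth of the Internet link.  The class name, function name and all of the figures are hypothetical, chosen only to show the arithmetic; real sizing would depend on the actual equipment and traffic mix.

```python
from dataclasses import dataclass


@dataclass
class VpnCapacity:
    """Hypothetical description of the head-office VPN service."""
    max_sessions: int   # concurrent sessions the VPN gateway can hold
    link_mbps: float    # usable Internet bandwidth, in Mbit/s


def capacity_problems(capacity, remote_staff, mbps_per_user):
    """Return a list of reasons the VPN cannot carry the remote staff.

    An empty list means both limits are satisfied under the stated
    assumption that every user is connected at the same time.
    """
    problems = []
    if remote_staff > capacity.max_sessions:
        problems.append(
            f"need {remote_staff} sessions, only {capacity.max_sessions} available"
        )
    needed_mbps = remote_staff * mbps_per_user
    if needed_mbps > capacity.link_mbps:
        problems.append(
            f"need {needed_mbps:.0f} Mbit/s, link provides {capacity.link_mbps:.0f}"
        )
    return problems


# Assumed figures: 500 sessions, 100 Mbit/s link, 0.5 Mbit/s per user.
vpn = VpnCapacity(max_sessions=500, link_mbps=100.0)
small_test = capacity_problems(vpn, remote_staff=150, mbps_per_user=0.5)
full_office = capacity_problems(vpn, remote_staff=2000, mbps_per_user=0.5)
```

A VPN that copes comfortably with a few hundred occasional users can fail on both counts once the whole office connects at once, which is exactly the situation a perimeter closure creates.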

So for some types of workers and for some types of organizations it is absolutely necessary to have a remote physical location with desks and equipment.  Once again, though, the G20 meeting highlights some very serious problems with how many organizations do this.  If a company and all of its competitors are affected by the same disaster (G20, epidemic, whatever) and need to evacuate to an external site, it doesn't help if everybody uses the same external site.  This is often the case in Canada where a small number of organizations like IBM and Sungard provide business resumption facilities, but don't actually have enough space to simultaneously accommodate all of their BRP customers.

2. Too much reliance on slow tape technology to restore too many systems

This is a simple and obvious problem, but it is absolutely endemic.  Suppose your Disaster Recovery Plan involves restoring all of your mission-critical systems at a remote site using recent tape backup data.  Tape backup systems do a great job of extracting the data from production systems because they tend to do a lot of incremental backups, copying only the changes to the data.  But this makes restoration extremely onerous.

First you have to restore the base operating system.  Then you have to restore the last full backup.  Then you have to restore all of the incremental backups in order, each of which is probably recorded on a different tape.  So, even though you can back up all of your systems in a single night, you can't restore them that fast.  It will probably take several hours per system to get everything back up and running.  If there are hundreds of systems, that could mean that your DR plan will take weeks to complete.
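The arithmetic behind that estimate can be sketched in a few lines.  The example below assumes a simple scheme of periodic full backups followed by incrementals, and the figures (300 systems, 4 hours each, 2 restores running in parallel) are purely illustrative assumptions, not measurements.

```python
def tapes_to_restore(backups):
    """Return the indexes of the backups needed for a full restore.

    `backups` is a list of "full" / "incr" labels, oldest first.
    A restore needs the most recent full backup plus every incremental
    taken after it, applied in order.  Raises ValueError if the list
    contains no full backup at all.
    """
    last_full = max(i for i, kind in enumerate(backups) if kind == "full")
    return list(range(last_full, len(backups)))


def restore_days(systems, hours_per_system, parallel_restores):
    """Estimate elapsed days to restore every system from tape."""
    total_hours = systems * hours_per_system / parallel_restores
    return total_hours / 24


# A week of backups: a full on day 0, then six nightly incrementals.
week = ["full", "incr", "incr", "incr", "incr", "incr", "incr"]
needed = tapes_to_restore(week)          # all seven backups, in order

# Assumed figures: 300 systems, 4 hours each, 2 restores at a time.
days = restore_days(systems=300, hours_per_system=4, parallel_restores=2)
```

With these assumed numbers the restore takes 25 days of round-the-clock work, and that is before allowing for tape handling errors or staff fatigue.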

If there is a lot of data or a lot of systems in the data centre, it is almost always necessary to go to some sort of hot-hot or hot-warm data replication system, or the DRP simply won't work in a real disaster. 

Many companies leverage this investment by using the DR site as the live site for some applications.  If a disaster takes one of your data centres offline, your net recovery time is reduced because you only have to recover some of your systems.  The spare capacity at each site can then be used for lower-priority functions such as QA or development.

3. All-Or-Nothing DR Plans

Many companies like to simplify their DR plan by assuming a single disaster scenario that could encompass any lesser disaster.  So they assume that a disaster means that their primary data centre is a smoking crater and they have to restore everything to the backup site.

There are three problems with this type of DR plan.  First, how many cases can you remember where a data centre actually was completely destroyed?  It doesn't happen very often.  So a lot of resources are dedicated to a scenario that will probably never happen. 

Second, even if the data centre were completely destroyed, most of your customers will grant you a lot of goodwill and extra time in recovering your business.

And third, the types of disasters that actually do happen, and happen a lot, are those where a single application or a single environment crashes and becomes unavailable.  In this situation it is probably not worth the business impact of shutting down all of the other applications to move everything to the DR site, and furthermore, your customers will probably not be particularly understanding about an isolated technology disaster like this.

So all-or-nothing DR is actually a bad idea.  It makes much more sense to have a flexible set of DR plans that can independently and quickly recover any application or suite of applications.
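One way to picture a flexible plan is as a catalogue of application suites, each of which can be failed over on its own.  The sketch below is only an illustration of that idea; the suite and application names are invented, and a real catalogue would also record dependencies, recovery time targets and owners.

```python
# Hypothetical catalogue: each suite is a group of applications that
# must move to the DR site together.  All names are made up.
SUITES = {
    "trading": ["order-entry", "market-data"],
    "payroll": ["payroll"],
    "email": ["mail-gateway", "mailbox-store"],
}


def apps_to_fail_over(failed_app, suites):
    """Return only the applications that share a suite with the failed one.

    Everything outside that suite stays where it is, which is the
    opposite of an all-or-nothing plan.
    """
    for apps in suites.values():
        if failed_app in apps:
            return list(apps)
    raise KeyError(f"{failed_app!r} is not covered by any recovery plan")


def apps_left_running(failed_app, suites):
    """Return the applications that are not disturbed by the fail-over."""
    moving = set(apps_to_fail_over(failed_app, suites))
    return [app for apps in suites.values() for app in apps if app not in moving]


moved = apps_to_fail_over("market-data", SUITES)
untouched = apps_left_running("market-data", SUITES)
```

Here a crash of the market data feed moves only the trading suite, while payroll and email carry on at the primary site.  The KeyError for an unknown application is deliberate: an application that appears in no suite has no recovery plan, and it is better to find that out in a test than in a disaster.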

Companies need to build their DR and BR plans to be used, not as mere checkboxes in an audit report.  Big disasters don't happen very often, and it's too tempting to gamble that they will never happen.  Little disasters (like the G20 or application crashes) happen all the time and without warning, often causing serious business impacts.  Flexible DR and BR plans can minimize these business impacts.