Case Study: Improving Disaster Recovery Without Breaking the Bank
When FleetBoston Financial evaluated its disaster recovery responsiveness, the company received a shock.
IT administrators there realized it would take at least two full days to recover their data and systems after a serious disaster. After the company deployed EMC Symmetrix in conjunction with SunGard electronic vaulting services, however, its disaster recovery window shrank from 48 hours to less than an hour.
"We have achieved a recovery window of less than one hour on critical systems and four to eight hours for a complete recovery," says Lari Sue Taylor, senior vice president of technology at FleetBoston Financial.
FleetBoston Financial is a financial services company with assets of $196 billion and over 18 million individual, corporate, and institutional customers.
The company primarily focuses on small business and commercial banking in the Northeast U.S. market, with products and services available through a variety of channels, including 1,460 stores and over 3,400 ATMs from Maine to Pennsylvania, as well as HomeLink online banking and telephone banking. The bank is also in the process of merging with Bank of America.
A few years ago, when its ATM network had expanded significantly and online banking started to take off, management realized that disaster recovery needed a complete rethink. They used two key metrics in evaluating disaster recovery, or business continuity, planning:
Recovery Time Objective (RTO) – the maximum length of time that a business process can be unavailable. This is measured in terms of time elapsed from the beginning of a disaster until the systems are operating again.
Recovery Point Objective (RPO) – how much work in progress can be lost. If all work must be recovered, then the business must align its disaster recovery actions to achieving zero RPO. Some businesses, however, may elect to have an RPO of one day, for example, on the understanding that if they lost one day's transactions, they could recreate them by interviewing sales staff, etc.
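The two metrics above can be expressed as simple time differences. The following is a minimal Python sketch, not anything FleetBoston is reported to have used: the function names and the incident timestamps are hypothetical, chosen only to show how a measured recovery compares against a 24-hour RTO and a zero RPO.

```python
from datetime import datetime, timedelta

def recovery_time(disaster_start, systems_restored):
    """Time elapsed from the start of the disaster until systems operate again."""
    return systems_restored - disaster_start

def data_loss_window(last_recoverable_copy, disaster_start):
    """Span of work in progress that is lost: everything after the last usable copy."""
    return disaster_start - last_recoverable_copy

def meets_objectives(actual_recovery, actual_loss, rto, rpo):
    """True only if both the recovery time and the data loss stay within target."""
    return actual_recovery <= rto and actual_loss <= rpo

# Hypothetical incident: systems come back 48 hours after the disaster,
# and the last recoverable copy is current as of the moment of failure.
disaster = datetime(2004, 1, 5, 9, 0)
restored = datetime(2004, 1, 7, 9, 0)
last_copy = datetime(2004, 1, 5, 9, 0)

rto = timedelta(hours=24)  # target: back up within 24 hours
rpo = timedelta(0)         # target: no lost transactions

actual_recovery = recovery_time(disaster, restored)
actual_loss = data_loss_window(last_copy, disaster)
print(meets_objectives(actual_recovery, actual_loss, rto, rpo))  # False: 48 h exceeds the 24 h RTO
```

Note that the two checks are independent: a site can lose no data at all and still miss its RTO, which is the situation in this example.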
FleetBoston chose what at that time was regarded as an aggressive RTO of 24 hours and an RPO of zero.
"We would suffer significant business impact from transaction loss," says Taylor, "so we had no choice but to opt for the zero RPO."
In support of this, she cites a Gartner Group study which revealed that 93 percent of companies that experience a major data loss go out of business within five years.
Improving Recovery Time
FleetBoston was already using EMC Symmetrix and tape libraries for daily and weekly backups. During testing of its disaster recovery responsiveness, however, the company discovered that restoring data from tape to disk would by itself take 24 hours. The 24-hour time frame could be met only if everything went smoothly.
As a result, the company searched for a better approach to disaster recovery.
Administrators selected Electronic Vaulting Services by SunGard, in conjunction with an EMC product called Symmetrix Remote Data Facility (SRDF), a combination of storage hardware and software that lets users copy data to a remote, secure location without requiring any IT downtime. In the event that backup data needs to be retrieved, SRDF can recover hundreds of terabytes of information within hours, according to EMC.
Prior to purchase, FleetBoston's auditors voiced concern that the solution was too bleeding-edge, an approach that lacked proven results in the real world. The company therefore interviewed early adopters to learn what problems it might run into. This brought several issues to the surface, with distance limitation and channel extension being the top concerns.
A multiplexer channel provides the physical connection that allows input and output devices to communicate with the computer. The multiplexer channel typically requires devices or their control units to be within 200 to 400 feet of the mainframe computer. Channel extension technology makes it possible to extend the multiplexer channel of the computer to anywhere in the world regardless of distance.
"Our primary data center was 120 miles away from our remote recovery center, so channel extension was necessary," says Taylor.
She evaluated channel extension products from Computerm and InRange before choosing Computerm Adaptive Copy. The use of Computerm and Symmetrix, though, meant that the company would have to use an asynchronous mode of data transfer between one site and another. As a result, there would be a delay of a few seconds between transactions being processed in the main data center and those same transactions being transferred to the remote disaster recovery site.
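The effect of that delay can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of how EMC or Computerm products work internally: it assumes every transaction reaches the remote site a fixed number of seconds after it is committed at the primary site, and the function name, commit times, and lag values are all hypothetical.

```python
def unreplicated(commit_times, failure_time, lag_seconds):
    """Return the commits made at the primary site before the failure
    that had not yet arrived at the remote site when the failure hit.

    Assumes a constant transfer lag; a lag of 0 models synchronous transfer.
    """
    return [
        t for t in commit_times
        if t <= failure_time and t + lag_seconds > failure_time
    ]

# Hypothetical commit times, in seconds, with the primary site failing at t = 5.
commits = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

print(unreplicated(commits, failure_time=5.0, lag_seconds=3.0))  # [3.0, 4.0, 5.0]
print(unreplicated(commits, failure_time=5.0, lag_seconds=0.0))  # []
```

Under this model, the transactions at risk are those committed within one lag interval before the failure, so a shorter lag leaves fewer transactions exposed.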
The combined EMC/SunGard/Computerm architecture adopted by FleetBoston was successfully implemented.
From 48 hours or more, the RTO came down to less than one hour for critical systems. During one major emergency when all systems were down at one data center, the remote site took over seamlessly. According to Taylor, this one event paid for the technology immediately since it prevented large-scale revenue loss.
While FleetBoston administrators are happy with their current disaster recovery functionality, it is still evolving. One major issue is whether the company should continue to replicate all mainframe data, or if it can apply Information Lifecycle Management (ILM) techniques to minimize the amount of data that has to be transmitted during backups and during system recovery. This could also free up bandwidth for more productive uses.
The company is also looking to increase its current rate of mirroring. Fleet now mirrors all of its data every two hours, but Taylor is investigating ways to shorten the length of time between mirrors without significantly increasing costs.
Feature adapted from Datamation.
See All Articles by Drew Robb