Henry Newman discusses replication and transport options for disk-based data recovery.
Preparing for a disaster is usually part of the storage planning process, and it is without question one of the most difficult tasks: the plan must cover local hardware and software, networking equipment, and a test plan to ensure that you can actually recover from the disaster.
There are four questions you should ask to determine your disaster recovery requirements. They are:
- How soon after a disaster do I need to be back up?
- How much data has to move back and forth daily?
- How far away is the disaster recovery site?
- What am I doing with my data now (HSM/backup)?
The answers will help determine your technology choices and price. You also need to look at your current host-based systems, software, and RAID hardware. Disaster recovery is not cheap: you need hardware, software, and trained personnel to manage the systems and process.
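The four questions above can be turned into rough numbers. Here is a minimal sketch, with assumed inputs (500 GB/day, a 24-hour replication window) used purely for illustration:

```python
# Hypothetical sketch: turn daily data volume and replication window into
# an average bandwidth requirement. All inputs are illustrative assumptions.

def required_mbps(daily_gb: float, replication_window_hours: float) -> float:
    """Average bandwidth (Mbit/s) needed to move daily_gb within the window."""
    bits = daily_gb * 1e9 * 8
    return bits / (replication_window_hours * 3600) / 1e6

# Example: 500 GB/day replicated continuously over 24 hours
print(round(required_mbps(500, 24), 1))  # ~46.3 Mbit/s average
```

A number like this is only a starting point; as discussed later, peak load matters as much as the daily average.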
Replication of data is the biggest issue. You have two types of data to protect, disk-based and HSM-based tape, and each has its own complications. In this article we will cover disk-based disaster recovery; next time we will review the issues surrounding replication of HSM data over long distances.
Anyone trying to develop a disaster recovery facility must replicate data. Up until the last ten years or so, data replication was often done with sneaker net (for those of you not old enough to remember, that's carrying the data to the other site, usually on tape). Since that time, networks have gotten faster and cheaper, and software has been developed to take advantage of these networks, on both the RAID and host side.
Data replication has two major components. The first is the method of replicating the data at the other site, usually either host-based replication or storage-based mirroring technology. The other issue is how to transport the data to the other site, either via WAN and IP or dark fiber and a Fibre Channel connection.
The technologies and products associated with each should be carefully considered, since changing disaster recovery plans and methods is not easy and certainly not cheap.
Replication Options
The two most common methods of data replication are host-based mirroring and RAID-based mirroring.
Each method has its advantages and disadvantages. One of the big issues with any type of replication is whether it is synchronous or asynchronous and what is the latency. I often quote John Mashey, a famous computer architect, who once said, "Money can buy you bandwidth, but latency is forever."
A number of software products allow you to mirror your data from the host. This method is often easy to configure and manage, but the downside is that this uses host resources such as CPU, memory, and I/O bandwidth. Another drawback is that you must have the software on each host that is to be mirrored. One of the areas that concerns me about this method is latency with the mirror.
There are two types of mirroring, just as there are two types of I/O from applications:
- Synchronous — Control is not given back to the application until the data is either on the RAID controller or on the disk, so until a SCSI acknowledgement is received, you cannot issue the next I/O request.
- Asynchronous — Control is returned to the application as soon as the I/O is issued to the operating system.
Applications, and in some cases file systems, can also control the type of I/O issued; see Operating System Calls and I/O and File Systems and Volume Managers: Understanding the Internals.
Even if you turn on asynchronous mirroring, if you are doing a great deal of I/O, you need to ensure that you have the bandwidth to the mirror, and remember that bandwidth is only part of the issue. If you are doing synchronous mirroring, you must calculate the expected latency and predetermine the expected slowdown of your application.
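The expected slowdown from synchronous mirroring can be estimated from distance alone, since light in fiber travels at roughly 2x10^8 m/s and every synchronous write waits for the acknowledgement to return. The sketch below uses assumed values (queue depth of 1, a 0.5 ms local service time) to show the serial IOPS ceiling a remote sync mirror imposes:

```python
# Sketch with assumed numbers: extra per-I/O latency from synchronous
# mirroring. A synchronous write cannot complete until the acknowledgement
# returns, so each I/O pays a full round trip to the mirror site.

def sync_rtt_ms(distance_km: float, fiber_speed_m_s: float = 2.0e8) -> float:
    """Round-trip propagation delay (ms) over the mirror link."""
    return 2 * distance_km * 1000 / fiber_speed_m_s * 1000

def max_sync_iops(distance_km: float, service_time_ms: float = 0.5) -> float:
    """Upper bound on serial (queue-depth-1) IOPS with a remote sync mirror."""
    return 1000 / (service_time_ms + sync_rtt_ms(distance_km))

print(round(sync_rtt_ms(100), 2))  # ~1 ms round trip at 100 km
print(round(max_sync_iops(100)))   # serial IOPS ceiling at that distance
```

This is why "latency is forever": no amount of bandwidth reduces that round trip.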
Even if you are doing synchronous mirroring, there are two potential types of I/O:
- The host receives a reply back when the I/O gets to the other RAID.
- The host receives a reply back when the I/O gets to the disk itself.
The second method is far more conservative; with the first, the potential for data loss is greater. I often find that host-based mirroring is used with lower-end RAID controllers, as most high-end RAID controller vendors provide a replication method of their own, which comes at a cost both for the initial software support and for the remote replication hardware.
The key to success with host-based mirroring is to understand how much data you are going to write from that host and over what period of time it will be written. Taking statistics over a 24-hour period is a good way to understand aggregate bandwidth requirements. The concern is what happens if most of that data gets written when people get to work, just before lunch, and when they leave for home: you will see major slowdowns and performance problems for your applications, because you cannot meet the service objectives. On the other hand, if the load is constant throughout the day, then aggregate numbers are reasonable for the analysis.
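The gap between average and peak load is easy to see with a toy workload. The hourly volumes below are made up, with bursts at the start of the day, before lunch, and at quitting time:

```python
# Toy hourly write volumes in GB (assumed, not measured): bursts at
# 8am, 11am, and 4pm, light load otherwise.
hourly_gb = [2]*8 + [60, 10, 10, 50, 5, 5, 5, 5, 55] + [2]*7
assert len(hourly_gb) == 24

avg_mbps  = sum(hourly_gb) * 8000 / (24 * 3600)  # GB -> Mbit, spread over a day
peak_mbps = max(hourly_gb) * 8000 / 3600         # worst single hour

print(round(avg_mbps), round(peak_mbps))  # sizing to the average misses the bursts
```

Here the busiest hour needs roughly six times the bandwidth the daily average suggests, which is exactly the trap the paragraph above describes.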
Applications from the major RAID hardware vendors (EMC, HDS, IBM, LSI and others) allow you to replicate data at an off-site disaster location. This location could be 10 KM away or 10,000 KM. Of course, you also have to buy the network hardware and bandwidth, which presents some challenges. The two biggest options are dark fiber and Fibre Channel or a TCP/IP-based solution over SONET.
Whatever your choice, the issues with latency and John Mashey's words still apply. All RAID-based mirroring methods that I am aware of have synchronous and asynchronous options that allow the administrator to determine which type of I/O will be used.
In addition to those options, you generally have tunable options for cache usage, such as:
- How much cache is to be used for the mirror?
- How long is the data kept in cache before it is written to disk, in case you have a write and then an immediate rewrite?
- How is the write to the other cache acknowledged?
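The second tunable, holding data in cache in case of an immediate rewrite, can be modeled with a toy coalescing function. This is an assumption-laden sketch, not any vendor's actual cache algorithm:

```python
# Toy model (illustrative only): holding a write in mirror cache for a short
# window lets an immediate rewrite of the same block be absorbed, so only
# one copy crosses the replication link.

def mirror_transfers(writes, hold_window=2):
    """writes: list of (time_s, block_id). Count link transfers after
    coalescing rewrites of the same block within hold_window seconds."""
    last_sent = {}
    transfers = 0
    for t, block in writes:
        if block in last_sent and t - last_sent[block] < hold_window:
            continue  # rewrite absorbed in cache before hitting the link
        last_sent[block] = t
        transfers += 1
    return transfers

workload = [(0, 'A'), (1, 'A'), (1, 'B'), (5, 'A')]
print(mirror_transfers(workload))  # 3: the rewrite of A at t=1 is coalesced
```

A longer hold window saves more link bandwidth but widens the window of data that could be lost in a failure, which is why these parameters deserve careful tuning.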
These tunable parameters can have a large effect on the overall performance of the system and the efficient usage of the network.
Transport Options
Once you've decided how you want to replicate data, you must decide how to transport the data to the other site, either via WAN and IP or dark fiber and a Fibre Channel connection.
Most dark fiber solutions use a Fibre Channel connection from the back end of the RAID controller to a Fibre Channel switch on the host side, then to another Fibre Channel switch on the mirror side, and then to the RAID. This is done because Fibre Channel buffer credits are required to keep the channel operating efficiently, and switches have far more buffer credits than host adapters do. Each buffer credit is basically a command in progress. (See Resolving Finger-Pointing in Storage.)
Given the latency for acknowledgements at the Fibre Channel layer, to ensure that the channel is filled with commands, you need about 120 buffer credits for 50 KM with 2Gb Fibre Channel. These are rules of thumb, and the real numbers depend on the measured latency in your Fibre Channel network. Since most Fibre Channel switch vendors have at most 256 buffer credits per port on their switches, this is a real problem if you want or need to mirror long distances. Some new vendors such as Celion have placed thousands of buffer credits on ports to allow high performance transmissions at huge distances (greater than 3,000 KM).
So dark fiber is an option, but you must ensure that your hardware matches the distance of your architecture. The biggest advantage of dark fiber and SCSI over Fibre Channel is the lack of protocol translation. Since you are effectively running the native protocol that the RAID is running, the latency and overhead for translation is not a concern, as it can be with a TCP/IP-based solution.
TCP/IP has a similar hardware solution. From the back end of the RAID, you use a channel and put that channel into a hardware Fibre Channel to IP converter. This is then connected to a WAN connection, then back to IP to Fibre Channel on the other end.
One of the big advantages here is that most of these converters can implement data compression. The Fibre Channel method cannot. Also, many large WAN routers support encryption, which is required in many environments.
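The compression-versus-translation tradeoff can be sketched numerically. All the ratios below (an OC-3-class WAN, 2:1 compression, 15% translation overhead) are assumptions chosen for illustration, not measured figures:

```python
# Rough comparison sketch (all ratios are assumptions): effective payload
# throughput of an FC-to-IP link with compression versus native FC.

def effective_mb_s(link_mb_s, compression_ratio=1.0, translation_overhead=0.0):
    """Payload MB/s delivered: compression multiplies payload per wire byte;
    translation overhead is the fraction lost in the converters."""
    return link_mb_s * compression_ratio * (1 - translation_overhead)

fc_dark_fiber = effective_mb_s(200)                         # native 2Gb FC
ip_over_sonet = effective_mb_s(155, compression_ratio=2.0,  # assumed WAN rate
                               translation_overhead=0.15)   # assumed overhead
print(fc_dark_fiber, round(ip_over_sonet))
```

With highly compressible data, a slower WAN link can deliver competitive payload rates; with already-compressed data, the translation overhead is pure loss. That is why the compression ratio of your actual data should be measured before choosing a transport.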
The tradeoffs are pretty straightforward:

| | Fibre Channel over dark fiber | TCP/IP over WAN |
| --- | --- | --- |
| Bandwidth | If you have enough buffer credits, 2 Gb | Depends on data compression, but higher protocol translation overhead |
| Latency an issue? | Yes; generally distances over 100 KM will require special hardware, since switches do not have enough buffer credits | Yes, but much of this is handled by TCP/IP |
Moore's Law Meets Network Limits
Replication of data is becoming the rule instead of the exception. Two factors complicate this and will likely just get more difficult:
- Network performance is not increasing at the same rate as CPU performance, and these CPUs are generating more data.
- Network performance is not even increasing at the rate of disk density gains.
Everyone is talking about 10Gb connections, but I have yet to see a 10Gb connection from a host to a network; the talk is about router-to-router communications. Given Moore's Law, we are going to be generating more and more data on larger and larger disk drives, but still moving it at 2Gb Fibre Channel.
I believe the company formerly known as JNI made its 2Gb FC card available in September 2000. Since that time we have gone from 36GB FC drives to an announcement from HDS of 300GB FC drives. As you can see, the numbers are out of whack, and they are not getting any better.
What does all of this mean? Without careful planning, a good understanding of the requirements, and the right hardware and software, you can expect problems, if not outright failure. Any project of this magnitude requires an overall architect who has end-to-end responsibility; you cannot architect remote mirroring in pieces and expect it to work. Next time we will cover replication for HSM-based solutions.