Against a backdrop of ever-growing data storage needs and a continually shrinking window for performing backups, Mike Harwood explores some of the strategies that can be employed in backing up large amounts of data.
It is hardly news that the amount of electronically stored data is growing. Because of this growth in storage requirements and the trend toward data centralization, our backup solutions must scale to meet SAN and NAS demands for speed, reliability, and security. Given the role backups play in the design and implementation of a storage strategy, they must be carefully considered.
As we know, a backup is simply a copy of electronic data which is used as a means of recovery should the data become lost, corrupted, or compromised. Though the definition of a backup may be straightforward, the actual implementation of a storage backup solution can be a difficult task that encompasses many obvious and not so obvious considerations.
In the next two storage basics articles, we'll explore some of the factors to consider when implementing a backup solution for a storage area network, starting with one consideration that dates back to the origin of the Local Area Network — dealing with the backup window.
Dealing with Shrinking Backup Windows
A backup window is the period of time available to complete a given backup without interfering with production use. Whether a backup fits within its window is determined by both the amount of data that must be backed up and the speed of the network infrastructure that handles the data. For some organizations, the backup window doesn't present any real problems. Such organizations can typically complete data backups in the off hours without running into production time.
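To make the relationship between data volume, transfer speed, and backup time concrete, here is a minimal sketch of the arithmetic. The function names and the figures (500 GB, a sustained 30 MB/s, an 8-hour overnight window) are hypothetical values chosen for illustration; real throughput varies with the media, the network, and the mix of files.

```python
def backup_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Estimated hours needed to move data_gb gigabytes at a sustained rate."""
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_gb: float, throughput_mb_per_s: float, window_hours: float) -> bool:
    """True if the estimated backup time fits inside the available window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours


# Hypothetical figures: 500 GB at 30 MB/s with an 8-hour window.
hours = backup_hours(500, 30)
print(f"{hours:.1f} hours needed; fits: {fits_window(500, 30, 8)}")
```

Quadrupling the data to 2,000 GB at the same rate pushes the estimate to roughly 19 hours, which no longer fits an overnight window.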
However, as the amount of data grows, so too does the time it takes to perform the backup, and soon backups will run into production time. Further, many organizations today do not have an off hours period — they require network access 24/7, leaving a very small or even nonexistent backup window.
There are many ways to address the backup window issue, and the one chosen will depend on the needs of the organization, budgets, and of course the amount of data that must be backed up. Some of the methods used for operating within a backup window include using differential and incremental backups, snapshots, hardware and infrastructure upgrades, and potentially modifying the network backup design using server-free and LAN-free backups.
Starting at the beginning, some of the oldest methods of dealing with the backup window are using incremental or differential backups instead of regularly performing full backups. Before designing a backup solution with these methods, it's important to first have a solid understanding of backups in general and what each of the alternatives is designed to do.
A full backup saves all directories and files, and while it might sound ideal to perform a full backup every time we back up our data, the backup window often prevents this. Because of the time and media space they can take, full backups are often restricted to a weekly or monthly schedule, although the increasing speed and capacity of backup media are making nightly full backups a much more realistic proposition, even for those with hundreds of gigabytes of data.
Full backups, if you have the time to perform them, offer the ultimate in data protection. In effect, a single tape, or set of tapes, can completely restore a server to its state at the time of the backup. Full backups are not, however, without their drawbacks, one of which is security-related. Each tape contains an entire copy of the data on a given server. If the tape were stolen, the thief would have an entire copy of the data.
The Incremental Backup
The incremental backup provides a much faster method of backing up data than a full backup. During an incremental backup, only the files that have changed since the last full or incremental backup are included. Because of this, the time it takes to conduct the backup may be a fraction of the time it takes to perform a full backup. To determine whether a file has changed since the last full or incremental backup, the backup software checks a setting known as the ‘archive bit.’
When a file is changed in any way or copied from one area of the disk to another, the archive bit is set to indicate that, at the next scheduled backup, the file needs to be copied or archived. Full backups do not concern themselves with whether or not the archive bit is set before backing a file up, but they do clear the bit after the file has been copied to tape. Any files that then change have the archive bit set, indicating that they need to be backed up again.
Unlike differential backups, which do not clear the archive bit after copying a file, incremental backups clear the bit so that unless the file changes again, it does not get backed up unnecessarily. The archive bit also lets you see at a glance which files need to be backed up.
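The way each backup type treats the archive bit can be sketched with a small simulation. This is an illustrative model only: `FileEntry`, `modify`, and the three backup functions are hypothetical names, and real backup software reads the attribute from the file system rather than from an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    archive_bit: bool = True  # new or modified files start with the bit set


def full_backup(files):
    """Copy every file regardless of the bit, then clear the bit."""
    copied = [f.name for f in files]
    for f in files:
        f.archive_bit = False
    return copied


def incremental_backup(files):
    """Copy files whose bit is set, then clear the bit."""
    copied = [f.name for f in files if f.archive_bit]
    for f in files:
        f.archive_bit = False
    return copied


def differential_backup(files):
    """Copy files whose bit is set, but leave the bit untouched."""
    return [f.name for f in files if f.archive_bit]


def modify(files, name):
    """Simulate an edit: changing a file sets its archive bit."""
    for f in files:
        if f.name == name:
            f.archive_bit = True


files = [FileEntry("a.txt"), FileEntry("b.txt"), FileEntry("c.txt")]
full_backup(files)                 # copies all three, clears every bit
modify(files, "a.txt")
print(differential_backup(files))  # ['a.txt']
modify(files, "b.txt")
print(differential_backup(files))  # ['a.txt', 'b.txt'] (a.txt is copied again)
print(incremental_backup(files))   # ['a.txt', 'b.txt'] (bits are now cleared)
modify(files, "c.txt")
print(incremental_backup(files))   # ['c.txt']
```

The only difference between the incremental and differential functions is whether the bit is cleared afterward, and that single difference accounts for their different backup sizes and restore procedures.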
The convenience of quicker backup times comes with a price: in this case, the restore time. When restoring from an incremental backup, you need the most recent full backup as well as every incremental backup made since then. For example, if you were to do a full backup on Friday and incrementals on Monday, Tuesday, and Wednesday, and the server crashed Thursday morning, you would need four tapes: Friday's full backup and the incremental backups for Monday, Tuesday, and Wednesday.
The Differential Backup
Differential and incremental backups often get confused, but there's a clear distinction between the two. Whereas an incremental backup copies the files modified since the last full or incremental backup, a differential backup offers a middle ground by copying all the files that have changed since the last full backup. Restoring from differential backups is faster than restoring from incrementals because only two tapes are needed: the last full backup and the latest differential.
Differential backups work well in environments that have a reasonably large window in which to conduct backups and the media capacity to do so. They work by looking for files that have the archive bit set and backing up only those files.
As stated above, a differential backup copies any data that has changed since the last full backup, which would have cleared the archive bit, and it does not change the state of the bit itself. The upside of this approach is that only two tapes are needed to effect a complete restore. The downside is that at each differential backup, there's a high probability that some data (that which has changed since the last full backup but not since the last differential) will be backed up more than once.
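The difference in restore effort can be expressed in a few lines. The sketch below assumes a schedule that uses only one kind of backup between full backups; the function name and its parameters are hypothetical.

```python
def tapes_for_restore(full_label, later_labels, strategy):
    """Return the backups needed for a complete restore, oldest first.

    full_label   -- label of the most recent full backup
    later_labels -- labels of the backups taken since then, in order
    strategy     -- "incremental" or "differential"
    """
    if strategy == "incremental":
        # Every incremental since the full backup is required.
        return [full_label, *later_labels]
    if strategy == "differential":
        # Only the latest differential is required, if there is one.
        return [full_label, *later_labels[-1:]]
    raise ValueError(f"unknown strategy: {strategy}")


week = ["Monday", "Tuesday", "Wednesday"]
print(tapes_for_restore("Friday", week, "incremental"))
# ['Friday', 'Monday', 'Tuesday', 'Wednesday']
print(tapes_for_restore("Friday", week, "differential"))
# ['Friday', 'Wednesday']
```

With a Friday full backup and a Thursday morning crash, the incremental schedule needs four tapes while the differential schedule needs two.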
Synthetic Full Backup
One final backup method worth mentioning is the synthetic full backup. Synthetic full backups are used when the backup window is too small for the other options. In a synthetic full backup, data is taken from the last full backup and the subsequent differential or incremental backups and combined to create a new full backup tape. Because the new full backup is built from existing backups rather than from the live servers, it can be created offline, allowing the network to continue to function without any performance degradation or disruption to network users.
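Conceptually, building a synthetic full backup is a merge. The sketch below models each backup as a dictionary mapping a file path to its contents, which is a simplifying assumption; the function name is hypothetical, and the sketch does not handle files deleted between backups, which real products track separately.

```python
def synthesize_full(full, increments):
    """Merge a full backup with later incrementals into a new full image.

    Each backup is modelled as a dict of path -> file contents. When the
    same path appears more than once, the most recent version wins.
    """
    synthetic = dict(full)       # start from a copy of the last full backup
    for inc in increments:       # apply incrementals from oldest to newest
        synthetic.update(inc)
    return synthetic


friday_full = {"/data/a": "a-v1", "/data/b": "b-v1"}
monday_inc = {"/data/a": "a-v2"}
tuesday_inc = {"/data/c": "c-v1", "/data/a": "a-v3"}

print(synthesize_full(friday_full, [monday_inc, tuesday_inc]))
# {'/data/a': 'a-v3', '/data/b': 'b-v1', '/data/c': 'c-v1'}
```

Note that the merge reads only from existing backups, so the production servers are never touched.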
File System Snapshots
As data continues to grow and our backup windows continue to shrink, new technologies are needed to augment what we can do with the previously mentioned backup methods. Recent UNIX versions offer one such technology: file system snapshots.
A file system snapshot is a frozen image or picture of a file system at a given instant of time. Snapshots allow for many important features, including the abilities to provide:
- Backups of the file system at several times during the day without needing large amounts of additional storage media
- A way to perform file system integrity checks on a running and changing file system in an effort to reclaim lost blocks
- Perhaps most importantly, reliable off media backups without the need for long backup windows.
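To see why a snapshot need not consume much additional storage, consider a toy model. The sketch assumes a copy-on-write design, which is one common way to implement snapshots but not necessarily the one a given vendor uses; the `Volume` class and its methods are hypothetical.

```python
class Volume:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Create a snapshot; it starts empty and copies no blocks."""
        snap = {}                 # block index -> original contents
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Save the old block in each snapshot that lacks it, then overwrite."""
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Read a block as it was when the snapshot was taken."""
        return snap.get(index, self.blocks[index])


vol = Volume(["A", "B", "C"])
snap = vol.snapshot()
vol.write(1, "B2")                                     # live volume keeps changing
print(vol.blocks)                                      # ['A', 'B2', 'C']
print([vol.read_snapshot(snap, i) for i in range(3)])  # ['A', 'B', 'C']
print(len(snap))                                       # 1
```

The snapshot stores only the one block that changed after it was taken, yet a backup reading through `read_snapshot` sees the whole volume frozen at that instant.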
File system snapshots offer system administrators the freedom to create reliable backups of their systems without needing to shut down running applications for fear of data on disk changing while the backup is happening. Some vendor implementations of snapshots have the ability to mount this Point In Time Recovery (PITR) image as a read-only file system that you can then easily recover individual files from. According to Darcy Buskermolen, a network administrator from Wavefire Technologies:
“The frozen point in time image is extremely useful when it comes to being able to back up databases, allowing you to make a perfect backup of the database without needing any sort of maintenance window, thereby increasing your product/application accessibility uptime, as well as providing you with the reassurance that you are better protected from both system failure and user mishaps.
“I recently used snapshots to recover from a botched system upgrade that without them would have resulted in 100s of hours of customer downtime, a sleepless night spent restoring files from the full and incremental backups, and the usual barrage of phone calls complaining of downtime,” continues Buskermolen.
Volume Shadow Copy Service in Windows Server 2003
Windows Server 2003 introduced its own snapshot technology, known as the Volume Shadow Copy Service (VSS). The basic function of VSS is threefold: applications can continue to write data to the volume during a backup; files that are open are no longer omitted during a backup; and backups can be performed at any time, without locking out users or having to worry about the backup window.
VSS allows you to create shadow copy backups of volumes: exact point-in-time copies of files, including all open files. For example, databases that are continually held open and files that are open due to operator or system activity are backed up during a volume shadow copy backup. In this way, files that change while the backup is running are still copied correctly.
In this article, we have explored some of the strategies that can be employed to manage the process of backing up large amounts of data. In the second part of the storage basics series on backups, we'll move beyond these general methods of managing the backup window to more specific strategies, including using server-free and LAN-free backups, and we'll take a look at designing the network infrastructure to handle backup requirements. In addition, we will provide an overview of backup security in a SAN environment and revisit the debate of tape versus disk backups.