Back to the Future with Tape Drives
Over the last 30 years tape technology has not changed as much as disk-based storage, but in most environments (especially enterprise environments) tape is still a requirement for reliability. Even if you have an off-site remote mirror, in most cases tapes are still used. Given tape's continued popularity, we'll cover many of the issues surrounding tape hardware.
It may seem a little quaint to discuss tape as we enter 2003. Tape technology has not kept pace with disk technology, and in fact significant improvement in tape performance has only recently been available with the release of the StorageTek T9940B drive. But there still is much to learn from an evaluation of tape technologies, which we'll cover in this article.
Most enterprise environments use tapes that write in a linear format. However, another tape type exists that generally has a higher density: helical tape.
Helical tape has more contact with the heads in the tape drive. With linear tape the data is written lengthwise down the tape. With helical tape the data is written in diagonal stripes across the tape, which is why there is more contact with the heads.
Here are some general comparisons between these two tape types:
- A very small defect on a helical tape can corrupt the data if the error correction buffer is full. Error correction space is often left on the tape and if that space fills up the tape becomes unreadable.
- Helical tape heads wear out long before linear tape heads given that the tape heads make more intimate contact with the tape.
- Reliability is generally higher for linear tapes than for helical tapes, in both media life and head life, because more contact means more wear.
- Because of media wear, high-end linear tapes generally have a longer storage life than high-end helical tapes.
Linear tape vendors/types include IBM 3590B/E, STK 9840/9940, Quantum SuperDLT, older DLT 7000/8000, and LTO.
Helical tape vendors include Sony, which makes AIT-1 and AIT-2 as well as the DTF line of tapes. Other helical types include 8mm Mammoth and Mammoth-2, and 4mm (DAT).
Unlike disks and RAID, almost all tape drives automatically compress the data input stream. This is an important consideration when determining drive types, as different drives have different compression algorithms. Enterprise tape drives from IBM and StorageTek have higher compression ratios than lower-end drives like DLT and Mammoth. Drive vendors often provide estimated compression ratios, but these are averages and your mileage may vary. Compression is important given the cost of the media relative to the cost of the drive. Take the following example:
Drive One
Drive Cost: $35,000
Media Cost: $75
Compression: 5 to 1
Drive Size: 250 GB

Drive Two
Drive Cost: $5,000
Media Cost: $75
Compression: 2 to 1
Drive Size: 250 GB
Let's say you have 400TB of raw data that will need to be backed up, so you're shopping for a new tape system. Counting a terabyte as 1,024 GB, Drive One will require 328 pieces of media at a cost of $24,600, for a total system cost of $59,600. Drive Two will require 820 pieces of media at a cost of $61,500, for a total system cost of $66,500.
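The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up for the example, a terabyte is assumed to be 1,024 GB, and the cartridge count is rounded up to whole tapes, so it can differ by one from a truncated figure.

```python
import math

GB_PER_TB = 1024  # assumption: binary terabytes


def tape_system_cost(raw_tb, drive_cost, media_cost, native_gb, compression):
    """Return (cartridges, media_total, system_total) for one drive plus media."""
    effective_gb = native_gb * compression  # data held per cartridge after compression
    cartridges = math.ceil(raw_tb * GB_PER_TB / effective_gb)  # whole tapes only
    media_total = cartridges * media_cost
    return cartridges, media_total, drive_cost + media_total


# The two example drives: same media price and native capacity,
# different drive price and compression ratio.
for name, drive_cost, ratio in (("Drive One", 35_000, 5), ("Drive Two", 5_000, 2)):
    tapes, media, total = tape_system_cost(400, drive_cost, 75, 250, ratio)
    print(f"{name}: {tapes} cartridges, media ${media:,}, system ${total:,}")
```

Changing the compression argument to the ratio you actually measure on your own data shows how quickly the cheaper drive can become the more expensive system.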
Clearly compression must be a consideration in the total cost of ownership for tape systems -- but your mileage for compression on each drive type with your data will vary. One quick way to see if your data is compressible is to use the gzip program with the -9 option:
# gzip -9 filename
You will have to test each of the tape drives under consideration with a statistically significant sample of your data to determine how your data behaves with the drive.
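To get a rough number from the gzip check rather than eyeballing file sizes, something like the following Python sketch could be used. It relies on the standard library's gzip module at level 9. The function names and the 64 MB sample size are arbitrary choices for the example, and gzip is not the algorithm a tape drive uses, so treat the result only as a hint about compressibility, not as a prediction of what any drive will achieve.

```python
import gzip
import os


def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Return original size divided by gzip-compressed size."""
    if not data:
        raise ValueError("sample is empty")
    return len(data) / len(gzip.compress(data, compresslevel=level))


def sample_ratio(path, sample_bytes=64 * 1024 * 1024):
    """Estimate compressibility from the first sample_bytes of a file."""
    with open(path, "rb") as fh:
        return gzip_ratio(fh.read(sample_bytes))


if __name__ == "__main__":
    # Highly repetitive data compresses very well; random data does not.
    print(f"repetitive: {gzip_ratio(b'backup ' * 20000):.1f} to 1")
    print(f"random:     {gzip_ratio(os.urandom(100_000)):.2f} to 1")
```

Running sample_ratio over a representative set of files gives a first impression of whether the data is closer to the repetitive case or the random one before any drive is tested.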
Understanding your application environment is critical to developing a good architecture. Tape drives and the associated libraries have different characteristics for tape load, tape ready, position and rewind time. For some applications, such as backup, this is not important, because in most cases all you are doing is loading the tape and writing large amounts of data sequentially.
On the other hand, with hierarchical storage management (HSM) applications, tape load, position and rewind time become critical issues, especially for reading data back. HSM applications are becoming more popular given the length of time required for backup with increasing storage densities. In fact, StorageTek developed the T9840A and B drives specifically for HSM applications with small files, as they have a 4-second load time and an average of 8 seconds to first data byte. Other products typically take 6 to 15 times as long as the T9840A and B to reach the first byte.
But if the files are large, load and position time become insignificant compared with transfer time. If you have a 20-gigabyte file and with compression the transfer rate is 30 megabytes/sec, the transfer time is 682 seconds. A 50-second load and position time is then only about 7 percent of the transfer time. A good rule of thumb when setting up a system is to try to keep load and position time to 10 percent of the time needed to write the data.
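The rule of thumb is easy to check for other file sizes and drives. The Python sketch below is an illustration only: the function names are invented for the example, and a gigabyte is assumed to be 1,024 megabytes, which matches the 682-second figure.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def transfer_seconds(file_gb, rate_mb_s):
    """Time to stream a file at a sustained (compressed) transfer rate."""
    return file_gb * MB_PER_GB / rate_mb_s


def overhead_fraction(file_gb, rate_mb_s, load_position_s):
    """Load and position time as a fraction of the transfer time."""
    return load_position_s / transfer_seconds(file_gb, rate_mb_s)


def min_file_gb(rate_mb_s, load_position_s, target=0.10):
    """Smallest file for which load and position stay at or under the target."""
    return load_position_s / target * rate_mb_s / MB_PER_GB


if __name__ == "__main__":
    print(f"transfer: {transfer_seconds(20, 30):.0f} s")
    print(f"overhead: {overhead_fraction(20, 30, 50):.1%}")
    print(f"10% break-even file size: {min_file_gb(30, 50):.1f} GB")
```

For the drive in the example, files smaller than roughly 15 GB push load and position time past the 10 percent guideline, which is why small-file workloads favor drives with fast load times.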
For HSM applications, of course, reading is a different story. Writing is rarely a problem, as most applications can consolidate files to ensure that large amounts of data are written at once. Reading is the issue when using tape for HSM, as it requires an understanding of the recall rate of the files, the size of the files recalled, and -- most important -- the speed requirement for recall. A credit-card company that stores information to provide approval codes has far different requirements than a research site recalling a gene for comparison.
Given all this: is tape dead? A number of the large storage vendors pronounced tape dead three years ago, two years ago, and last year, and likely will again next year. Tape has some significant advantages over disk storage, so it will be some time before tape is dead. Here are some reasons why:
- Tape does not require power. Most modern disk drives need to stay powered on for reliability; the Seagate 120GB ATA drive, for example, uses 13 watts. That can get really expensive if you have 400 terabytes of secondary storage.
- Lower error rates. Bit error rates for ATA drives are about one error in 10 to the 14th bits (FC and SCSI drives are an order of magnitude better), while enterprise tape is about one in 10 to the 18th and other tapes (AIT and DLT) are around one in 10 to the 17th. Tapes are between two and four orders of magnitude more reliable than both ATA and SCSI disk drives.
- Tapes can handle higher shock than disks and still survive. We all have either personally dropped a tape or surely have seen someone drop a tape.
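To put the power and error-rate points in perspective for 400 terabytes, here is a back-of-the-envelope Python sketch. It is deliberately simplistic and rests on several assumptions not stated in the figures above: a terabyte is 1,024 GB, RAID overhead, controllers and cooling are ignored, and each bit error rate is read as one unrecoverable error per that many bits read. The function names are hypothetical.

```python
import math

GB_PER_TB = 1024  # assumption: binary units throughout


def disk_power_watts(capacity_tb, drive_gb=120, watts_per_drive=13):
    """Return (drive_count, total_watts) for raw capacity on identical disks."""
    drives = math.ceil(capacity_tb * GB_PER_TB / drive_gb)
    return drives, drives * watts_per_drive


def expected_bit_errors(capacity_tb, bits_per_error):
    """Expected errors when reading the full capacity once."""
    bits = capacity_tb * 1024**4 * 8  # terabytes -> bytes -> bits
    return bits / bits_per_error


if __name__ == "__main__":
    drives, watts = disk_power_watts(400)
    print(f"{drives} drives drawing about {watts / 1000:.1f} kW continuously")
    for label, rate in (("ATA disk", 1e14), ("AIT/DLT tape", 1e17), ("enterprise tape", 1e18)):
        print(f"{label}: {expected_bit_errors(400, rate):.4f} expected errors per full read")
```

Under these assumptions the disks alone draw tens of kilowatts around the clock, and a single full read of the data would be expected to hit dozens of errors on ATA disk versus a small fraction of one on tape.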
What to Do
For at least the next few years tapes and tape drives will continue to be a critical part of the storage infrastructure. This will continue because tape is far cheaper than disk storage in total cost of ownership, given the issues with power requirements for rotating storage and compression support with tape drives. Almost all of the tapes on the market claim 30 years of shelf life, even lower-end tapes. (Of course, keeping a tape for 30 years might be possible, but how are you going to be able to read it?) Tapes, as with any storage medium, are dependent on outside influences like:
- What is the interface? Try finding a SCSI-1 interface from the early 1990s, much less an IPI-3 interface, 20 years from now.
- Will the tape drive be available to read the tape? A little over 30 years ago 7-track tapes were state of the art, but finding a drive to read one today is next to impossible.
- What is the data format of the tape? Some vendors write in tar format, for example, but will tar or an application like Veritas NetBackup be available in 2033? Tar probably will be, at least.
- What is the data and will any program be able to read it? PDF is a popular format today, and applications can read it today, but try recovering an MS Word 2.0 document from just 10 years ago from tape and reading it into MS Word 2002.
All in all, it's clear that a migration strategy as part of the initial decision process is essential. Nothing lasts forever -- especially your data.
See All Articles by Columnist Henry Newman