Welcome to our howto on implementing Linux software RAID at no expense beyond the hard disks themselves, whether they are inexpensive ordinary PATA (IDE) drives, expensive SCSI drives, or newfangled Serial ATA (SATA) drives.
RAID is no longer the exclusive province of expensive systems with SCSI drives and controllers. In fact, it hasn't been since the 2.0 Linux kernel, released in 1996, which was the first kernel release to support software RAID.
What RAID Is For
A RAID array provides various functions, depending on how it is configured: high speed, high reliability, or both. RAID 0, 1, and 5 are probably the most commonly used levels.

RAID 0, or "striping," spreads data in alternating chunks across two or more disks. It is fast, because reads and writes are spread over multiple drives, but it provides no redundancy: if one disk dies, the whole array is lost.
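RAID 0's striping idea can be sketched in a few lines of Python. This is a toy illustration only: the `stripe` function and the 4-byte chunk size are invented for the example, and real arrays stripe in far larger chunks at the block-device level.

```python
def stripe(data, ndisks, chunk=4):
    # Deal fixed-size chunks round-robin across the disks -- that's all
    # striping is. Note there is no redundant copy of anything.
    disks = [bytearray() for _ in range(ndisks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % ndisks] += data[i:i + chunk]
    return disks

disks = stripe(b"ABCDEFGHIJKLMNOP", 2)
assert disks[0] == b"ABCDIJKL"
assert disks[1] == b"EFGHMNOP"
# Lose either disk and half the chunks are gone; the data is unrecoverable.
```

Because consecutive chunks land on different drives, the drives can work in parallel, which is where RAID 0's speed comes from.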
RAID 1, or "mirroring," duplicates all data across two disks. If your two drives are not the same size, your storage space is limited to the capacity of the smaller one. If one drive fails, the other carries on, allowing you to continue working until it is convenient to replace the disk. RAID 1 is slower than striping, because every write must be done twice, once to each disk.
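The mirroring behavior can be sketched in Python. This is purely illustrative, not how the kernel's md driver is implemented; the `disk_a`/`disk_b` dictionaries and function names are invented for the sketch.

```python
# Two mirrors, modeled as block-number -> data mappings.
disk_a = {}
disk_b = {}

def mirrored_write(block, data):
    # RAID 1: each write is performed twice, once per mirror --
    # this duplication is exactly why mirrored writes are slower.
    disk_a[block] = data
    disk_b[block] = data

def mirrored_read(block, failed_disk=None):
    # If one mirror fails, the surviving disk serves reads transparently.
    return disk_b[block] if failed_disk == "a" else disk_a[block]

mirrored_write(0, b"payroll records")
assert mirrored_read(0) == b"payroll records"
# Simulate losing disk A: the data survives on disk B.
assert mirrored_read(0, failed_disk="a") == b"payroll records"
```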
RAID 5 combines striping with parity checks, so you get both speed and data redundancy. You need a minimum of three disks. If a single disk is lost, your data are still intact; losing two disks means losing everything. Reads are very fast, while writes are a bit slower because the parity must be calculated.
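The parity trick is worth seeing in miniature. Here is a toy sketch in Python, assuming the usual byte-wise XOR parity; the real md driver also rotates parity blocks across all member disks, which this simplification omits.

```python
from functools import reduce

def parity(blocks):
    # RAID 5 parity is the byte-wise XOR of the blocks in a stripe.
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# Three disks minimum: two data blocks plus one parity block per stripe.
d1 = b"block one!"
d2 = b"block two!"
p = parity([d1, d2])

# Lose any single disk, and XOR of the survivors rebuilds it...
assert parity([d2, p]) == d1
assert parity([d1, p]) == d2
# ...but with two disks gone (say, only p remaining), nothing is recoverable.
```

This is also why writes are slower than reads: every write must update the parity block as well as the data block.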
You may use disks of different sizes at any of these levels, though you'll get better performance with disks of the same capacity and geometry. Some admins like to mix different brands of hard disk, on the theory that different brands will have different flaws and are therefore less likely to fail at the same time.
What RAID Is Not
It is not a substitute for a good backup regimen, backup power supplies, surge protectors, and other sensible protections. Nor is Linux software RAID a substitute for true hardware SCSI RAID in high-demand, mission-critical systems. But it is a dandy tool for workstations and low- to medium-duty servers. PATA (IDE) drives are not hot-swappable, but you can set up an array with standby drives that automatically take over in the event of a disk failure. Even if you don't use standby drives, your downtime is limited to the time it takes to replace the failed drive, because the system remains usable while the array rebuilds itself.
Hardware RAID controllers come in a rather bewildering variety. Mainboards come with built-in IDE RAID controllers, and PCI IDE RAID controller cards can be had for as little as $25. Most of these are like horrid Winmodems, in that they require Windows drivers to work and have Windows-only management tools. I wouldn't bother with IDE RAID controllers -- Linux software RAID outperforms them in every way, and costs nothing.
A true hardware RAID controller operates independently of the host operating system. You'll find a lot of choices for SATA and SCSI drives. SATA controllers run from $150 to the sky's the limit, depending on how many drives they support, how much onboard memory they have, and other refinements that take the processing load away from the system CPU.
Good SCSI controllers start around $400 and have an even higher sky. Both SATA and SCSI controllers should support hot-swapping, error handling, caching, and fast data-transfer speeds. A good-quality hardware controller is fast and reliable, but finding one is not so easy. Many an experienced admin has lost sleep and hair over flaky RAID hardware.
Something to keep in mind for the future: as SATA support in Linux matures, and the technology itself improves, it should become a capable SCSI replacement for all but the most demanding uses. (For more information, see the excellent pages posted by Jeff Garzik, the maintainer of the kernel SATA drivers.)