Web 2.0 has become quite a buzzword in the storage industry in the last year, with titans and upstarts alike pledging to develop storage systems suited for fast-growing, collaborative environments. But before we see what these storage systems will look like, let's first take a look at what type of data is suitable for them.
We'll call it Web 2.0 data for lack of a better term. It's different from traditional, transaction-based data in both nature and use. It comes in large files, typically created by a single user, and may be shared over some geographic distance. Much of Web 2.0 data is what you'd expect from the name (images, video and e-mail archives, for example), but the category has also come to include volumes of information from surveillance camera footage, geospatial mining data, genomic sequences and financial analysis scenarios.
File-based Web 2.0 data is just as important as a company's transactional data and requires similar degrees of availability, security and protection from loss. Like traditional corporate data, Web 2.0 data is expanding, only more so.
To cope with the growth of Web 2.0 data, companies are adopting a storage technology developed by Web pioneers like Google (NASDAQ: GOOG) and Yahoo (NASDAQ: YHOO). Borrowing from high-performance grid computing, this approach to storage uses large racked clusters of compute and storage nodes made up of fairly inexpensive industry-standard servers and drives. The data is distributed and duplicated over multiple nodes, often geographically separated. The storage component is CAS or NAS, using SATA or SAS drives.
To lower cost, power consumption and cooling costs, nodes are optimized with only the features required for the application. Less expensive than blades, cluster nodes are denser and lack redundant power supplies and fans. Redundancy is at the node level, and the clustering software handles node failures transparently, providing both resiliency and flexibility. Such clusters are more-or-less self-managing and scale up quickly.
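The idea of distributing and duplicating data across nodes, with redundancy handled at the node level, can be illustrated with a minimal Python sketch. It is not the method used by Google, Yahoo or any vendor named in this article. It swaps in rendezvous hashing, a common generic placement technique, purely for illustration. The node names, the file key and the helper functions place_replicas and surviving_replicas are all hypothetical.

```python
import hashlib


def place_replicas(key, nodes, copies=3):
    """Choose `copies` distinct nodes for `key` using rendezvous hashing.

    Each node gets a score derived from hashing the (node, key) pair;
    the highest-scoring nodes hold the replicas. This is an assumed,
    illustrative scheme, not any vendor's actual placement logic.
    """
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def score(node):
        return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()

    return sorted(nodes, key=score, reverse=True)[:copies]


def surviving_replicas(key, nodes, failed, copies=3):
    """Return the replica holders for `key` that have not failed."""
    return [n for n in place_replicas(key, nodes, copies) if n not in failed]


# Hypothetical eight-node cluster and file name.
nodes = [f"node{i:02d}" for i in range(8)]
key = "video/clip-0001.mpg"

holders = place_replicas(key, nodes)
survivors = surviving_replicas(key, nodes, failed={holders[0]})

print("replica holders:", holders)
print("still readable from:", survivors)
```

In this sketch each file's copies land on three different nodes, so losing any single node still leaves two readable copies. That is the sense in which an individual node can do without its own redundant power supplies and fans.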
Depending on hardware configuration and the software you install, clusters can be compute-intensive for HPC tasks or more storage-oriented, providing the equivalent of a huge NFS cloud with a single name space.
Companies like Google and Yahoo built and still build their own custom infrastructure. Google orders huge quantities of custom motherboards directly from Intel to fit its low cost and power consumption requirements. (If Google were a system manufacturer, it would be in the top five.) However, you don't have to build your own custom Web 2.0 storage infrastructure. Increasingly, mainstream storage companies are developing products and services to do this for you.
Design to Order
Dell (NASDAQ: DELL) was one of the first companies to provide Web 2.0 infrastructure. Its Data Center Solutions Division announced Cloud Computing Solutions in March 2007. Through this program, Dell designs, provides, and even installs racks of servers and storage for clustered service or storage delivery, optimized for your application (and low power consumption). There are even maintenance and rental options.
According to discussions on Dell's In The Clouds blog, this service is for large orders (1500+ nodes) and you must provide your own clustering software. Dell is not providing the off-the-shelf systems that it sells to the public, but has developed systems designed specifically for clustering applications.
Sun Microsystems (NASDAQ: JAVA) and Rackable Systems (NASDAQ: RACK) are also in the Web 2.0 business. In addition to offering racks of compute and storage nodes suitable for clustering, both companies are notable for offering mobile data centers packaged in shipping containers. Sun's Modular Datacenter S20, for example, sits in a 20-foot-long shipping container that needs only single power, network and water hookups.
Water cooling allows these units to be denser and more power-efficient than a similar number of nodes in a typical air-cooled data center. The main attraction is getting massive amounts of storage or computing power going in a short time. Again, you must provide the clustering software to tie it all together, although Sun last year acquired the Lustre clustered file system and is bringing it into its Open Storage project.
Space and power consumption have become big data storage issues, particularly for Web-scale data centers. IBM's (NYSE: IBM) April introduction of the iDataPlex Web 2.0 server system directly addresses these concerns. By rotating a standard 42U rack 90 degrees about its vertical axis (landscape instead of portrait) and fitting in two side-by-side stacks of half-depth nodes (15 inches front-to-back), IBM can shoehorn up to 84 CPU nodes into the space usually occupied by 42, with 16U of lateral space left over for switching hardware. For storage applications, there are 3U units that supply one CPU and 12TB of hard drive storage, for a maximum of 336TB per rack with 28 nodes.
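The storage figures above follow from simple arithmetic, sketched here in Python. The assumption, not stated outright in the article, is that the 3U storage units fill the same 84U of node space (two 42U stacks) that would otherwise hold 84 1U nodes. The variable names are illustrative.

```python
RACK_UNITS = 42            # height of a standard rack, in U
STACKS = 2                 # side-by-side stacks of half-depth nodes
STORAGE_NODE_U = 3         # height of one storage unit, in U
TB_PER_STORAGE_NODE = 12   # hard drive capacity per storage unit

# Total node space across both stacks (assumed to be 1U per CPU node).
node_slots = RACK_UNITS * STACKS

# Storage units that fit if they occupy all of that node space.
storage_nodes = node_slots // STORAGE_NODE_U

# Maximum raw capacity per rack.
rack_capacity_tb = storage_nodes * TB_PER_STORAGE_NODE

print(node_slots)        # 84
print(storage_nodes)     # 28
print(rack_capacity_tb)  # 336
```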
The sideways twist is even more important for reducing power consumption. The distance the fan units must push air to cool the nodes is half what it normally is, and since the relation between cooling distance and fan power is non-linear, the drop in power required is much more than half. More efficiency comes from using fewer, larger fans. Pluggable four-fan units cool eight nodes. According to Gregg McKnight, distinguished engineer and vice president at IBM Modular Systems Development, the fans consume approximately 6 watts per server. For data centers with maxed-out air conditioning systems, iDataPlex can take an optional water-cooled heat exchanger that provides a net cooling effect.
According to McKnight, "Companies buying lots of nodes want them just the way they want them."
Though its systems are not as customizable as Dell's cluster systems, IBM provides 22 different node variations (processor, I/O slots, memory and storage) with several power supply options to better match power to application need. IBM can supply either Linux or Windows to run the Intel-based nodes, and also provides the clustering capability with the Nextra software it acquired when it bought XIV.
As a result, IBM can provide "a compute cluster optimized for space," said McKnight. "The entire solution is pre-built, cabled and tested, allowing the customer to bring it up in minutes."