Rethinking Storage for Microservers

Monday Feb 25th 2013 by Jeff Layton

With microservers, it may be necessary to create two separate layers of networked storage.

Microservers are a huge coming wave that could overtake you if you're not ready.

Microservers are simply small servers. Compared to a dual-socket or quad-socket server, a microserver has reduced capabilities and options. But they can still run virtually the same set or, at the very least, a useful subset of the applications larger servers execute, albeit at a lower level of performance.

Right now, microservers have everything from single-core processors to quad-core processors, with some basic amount of memory, one or two NICs and a single hard drive (most of the time) of some reasonable size and capacity.

The reason microservers are becoming so popular is that they use very little power, perhaps 5-20W for the entire server, versus hundreds of watts in the larger traditional servers. Moreover, the word "micro" also refers to their size. They are very small, and you can put thousands of them in a standard 42U rack.

Given that many traditional enterprise servers run at about 15 percent utilization, organizations have turned to virtualization to improve the server utilization. For example, instead of perhaps running one application per core resulting in maybe sixteen applications being run at the same time, virtualization allows the creation of a virtual machine which looks like a real physical system to the application. Then you can "oversubscribe" the hardware, perhaps running a hundred or more VMs on a single server.

This can raise utilization substantially, saving power and cooling, floor space, money, etc. However, to do this, you must stuff a fairly large amount of memory in the physical server so that each VM has enough memory to run an application. A large amount of local storage may also be needed depending upon the applications. Moreover, since all your proverbial eggs are in one basket, you must make the server as bulletproof as possible.

All of this adds cost to the server.

Microservers offer an alternative to the large single server by using lots of small servers to provide the same level of computational power. If you lose one server, you have lost just one of many. In virtualization, if you lose one server you lose everything.

Microservers can also be inexpensive, so if you lose one you have only lost a small part of your hardware investment. Moreover, since the microservers are independent of one another, you don't have to worry about data or applications that have to be segregated on different servers since everything has its own server.

Because they are independent, the microservers run their own OS, which also means that you don't need virtualization. That simplifies the situation and could possibly make the application(s) run faster due to dedicated hardware.

On the flip side, since each microserver has its own OS, you must manage a larger number of servers. This isn't an insurmountable problem—HPC systems routinely run thousands of servers from one master server—but it is something you need to take into consideration. Since you are not virtualizing the microservers, you cannot move the OS, applications and data to another server as you can with a virtualized server.

Microservers are an alternative to heavily virtualized servers, and like many solutions, they offer some pros and cons. One of the aspects of microservers that needs to be understood and architected is storage.

Layers of Storage

Recall that the description of microservers typically includes one 2.5" drive (usually SATA). This may or may not be enough for various use cases. Consequently, we need to think about how the microservers' storage is architected.

This can be even more important for diskless microserver farms. An example of a possible diskless microserver solution is the Dell Zinc. With a single 2.5" drive for each server, the Dell Zinc can only have 24 servers in a sled or 96 servers in 4U (good density but not the best). On the other hand, if you go diskless, you can get 72 servers per sled or about 288 servers per 4U (much better density). But you need to consider how you provide storage for these diskless nodes (e.g. iSCSI, NAS, etc.).

With the flexibility that microservers offer, considering multiple storage layers is important. Let's start by examining local storage to see what a single server has available.

Local Storage

Let's assume our microserver has a single hard drive attached. These are almost always 2.5" SATA drives. Since price is something of an objective for microservers, I will focus on consumer-type drives. This also gives an idea of the lowest price per GB and the minimum overall price. These numbers can also serve as guideposts for where enterprise storage may be in the not too distant future. For pricing I'll just use a favorite of mine, Newegg.

At one end of the storage spectrum is capacity. The largest capacity 2.5" laptop drive I could find is a 2 TB, 2.5" SATA drive for about $180.00 (as of the writing of this article). It's probably not the fastest drive (it uses Western Digital's "IntelliPower," which appears to mean that it has a fairly low rotational speed, perhaps 5,000 - 5,900 rpm), and it only has an 8MB cache on a 3 Gbps SATA interface. But in this case, we're only after capacity, so I'm not too worried about the drive interface and the cache size. This drive gives a large capacity at $0.09/GB (pretty inexpensive). But remember that it is a consumer drive, so don't think that you can get this price point in an enterprise drive.

At the opposite end of the storage spectrum is performance. The best performance we'll get for a single drive is obviously an SSD. However, we still need a reasonable capacity, so let's look for large capacity SSDs. There is a 1 TB SSD that is a bit expensive at around $2,500. But there is a more reasonable 512GB SSD for about $380.00. Let's focus on the lower capacity drive since it has a better price per GB ($0.74/GB) than the larger drive. It has good performance since it has a SATA III interface (6 Gbps) with about 500 MB/s for sequential reads, 260 MB/s for sequential writes, up to 45,000 random read IOPS, and up to 50,000 random write IOPS.

The large capacity disk and the large capacity SSD bracket the performance and capacity options for a microserver with a single drive. The spinning drive has tons of capacity, but it is likely to have fairly low performance — 100MB/s or less with very low IOPS relative to SSDs. But it has a very attractive price per GB and only costs about $180.00.

The SSD has much greater performance, particularly for random IOPS, but costs a bit more ($380.00) and the price per GB is about 8 times more than the spinning drive.

One key thing to keep in mind is that microservers are fairly inexpensive, with list prices currently about $750 per server without a drive. The spinning drive is about 24 percent of the price of the server, but the SSD is about 50 percent the price. Do you buy a lower-cost, high-capacity drive with poor performance or an SSD with less capacity, much higher performance, and a higher cost?
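The price comparison above is simple arithmetic, and a short Python sketch makes it easy to rerun with current prices. The figures are the ones quoted in this article; the names `price_per_gb` and `share_of_server_price` are made up for illustration.

```python
# Prices and capacities are the early-2013 consumer figures quoted in the article.
SERVER_PRICE = 750.00  # list price of one microserver without a drive (USD)

DRIVES = {
    "2 TB 2.5-inch hard drive": {"price": 180.00, "capacity_gb": 2000},
    "512 GB SSD": {"price": 380.00, "capacity_gb": 512},
}


def price_per_gb(price, capacity_gb):
    """Cost in USD per GB of capacity."""
    return price / capacity_gb


def share_of_server_price(price, server_price=SERVER_PRICE):
    """Drive price as a percentage of the diskless server price."""
    return 100.0 * price / server_price


for name, drive in DRIVES.items():
    per_gb = price_per_gb(drive["price"], drive["capacity_gb"])
    share = share_of_server_price(drive["price"])
    print(f"{name}: ${per_gb:.2f}/GB, {share:.1f}% of the server price")
```

Swapping in different drives or a different server price only requires editing the two constants at the top.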

One option to improve this situation is to perhaps rethink local storage in microservers. At this time, each server basically gets one 2.5" disk or SSD for local storage. However, there is an interface called mSATA or mini-SATA that is becoming popular for small SSDs in netbooks and laptops. An example of an mSATA SSD drive is the Intel 525.

Some of these drives, such as the Intel 525 240GB mSATA drive, can have really amazing performance. The specs on the drive indicate a maximum sequential read performance of 550 MB/s, a maximum sequential write performance of 520 MB/s, a 4KB random read IOPS of up to 80,000, and a 4KB random write IOPS of up to 50,000. mSATA SSD drives can also be up to 480GB in capacity. At this time, the pricing on these consumer drives is a little more than $1.00/GB at this scale.

Imagine being able to put two to four of these mSATA drives on each microserver using either a simple RAID controller on the SoC or software RAID. Then you might be able to get well over 1 GB/s in performance with lots of IOPS. However, the capacity won't be that large and the price per server will be higher than you might want.
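As a rough upper bound on what two to four striped mSATA drives could deliver, the sketch below simply multiplies the per-drive specs quoted above. It assumes ideal linear scaling, which a real RAID controller or software RAID will not fully reach. It also takes $1.00/GB as a price floor, since the article only says the price is a little more than that. The function name `stripe_estimate` is hypothetical.

```python
# Per-drive figures quoted above for the Intel 525 240GB mSATA drive.
READ_MB_S = 550      # maximum sequential read
WRITE_MB_S = 520     # maximum sequential write
CAPACITY_GB = 240
PRICE_PER_GB = 1.00  # assumption: treated as a floor ("a little more than $1.00/GB")


def stripe_estimate(n_drives):
    """Best-case totals for n striped drives, assuming perfect linear scaling."""
    return {
        "read_mb_s": n_drives * READ_MB_S,
        "write_mb_s": n_drives * WRITE_MB_S,
        "capacity_gb": n_drives * CAPACITY_GB,
        "min_cost_usd": n_drives * CAPACITY_GB * PRICE_PER_GB,
    }


for n in (2, 3, 4):
    print(n, "drives:", stripe_estimate(n))
```

Even in this best case, four drives give less than 1 TB of capacity, which is the trade-off described above.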

While an interesting idea, mSATA SSD drives don't get us out of the pickle of needing larger capacity storage and better price/performance. My opinion is that we'll need network storage for this situation, and we'll definitely need it for diskless servers.

Network Storage

The obvious solution for network storage is a centralized NAS for either NFS access or CIFS access or even both. This would be a centralized storage server with a fair amount of capacity and/or a fair amount of performance. There can be a large number of drives, perhaps an accelerator module or two, and some other "magic" to improve performance since microservers can be packed so densely.

On the client side, let's assume that each microserver has a Gigabit Ethernet (GigE) link to a network to which the NAS server connects. Let's also assume the server doesn't use the local disk for storage or that the node is basically diskless.

In the case of a highly dense solution, we could have up to about 2,880 servers in a single rack. That results in the possibility of 2,880 GigE clients using the NAS server at the same time (worst case). That is the same as 288 10GigE links, 72 40GigE links, or about 51 FDR InfiniBand links. That's quite a bit of data traffic, if you ask me. This would be a very large centralized NAS storage solution for every single rack.

Even if we play "best case" and assume maybe one out of ten servers is using the NAS storage, we still need 28.8 10GigE links, or 7.2 40GigE links, or about 5.1 FDR IB links. This is still quite a bit of NAS performance.
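The link-equivalence numbers for a full rack can be reproduced with a few lines of Python. This is only a sketch: it compares nominal line rates, ignores protocol overhead, and assumes 56 Gbit/s for an FDR InfiniBand link. The function name `equivalent_links` is invented for the example.

```python
# Nominal link speeds in Gbit/s. Using 56 Gbit/s for FDR InfiniBand is an
# assumption about which rate to compare against; overhead is ignored.
LINK_GBPS = {"10GigE": 10.0, "40GigE": 40.0, "FDR IB": 56.0}


def equivalent_links(n_clients, active_fraction=1.0, client_gbps=1.0):
    """Number of links of each type that match the clients' aggregate demand."""
    demand_gbps = n_clients * client_gbps * active_fraction
    return {name: demand_gbps / speed for name, speed in LINK_GBPS.items()}


for label, fraction in (("all clients active", 1.0), ("1 in 10 active", 0.1)):
    links = equivalent_links(2880, fraction)
    summary = ", ".join(f"{count:.1f} x {name}" for name, count in links.items())
    print(f"{label}: {summary}")
```

Changing `active_fraction` is a quick way to see how sensitive the NAS sizing is to the guess about how many clients are busy at once.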

Because of the extreme densities that microservers offer, you could need an uber-powerful NAS storage solution for every rack. Obviously this is an untenable situation. Or is it?

NAS In The Rack Option (NITRO)

What is needed for microservers is something in-between local storage and a big, bad, centralized NAS (BBC-NAS). I call it NITRO (NAS In The Rack Option). The idea is to push a reasonably sized NAS into the rack with a subset of the microservers rather than have a BBC-NAS for everything. This storage is intended to be a fast-scratch or working space storage solution that is not necessarily backed up on a regular schedule.

This solution allows a certain number of microservers to have their "own" NAS box to use for network storage. Then, periodically, the data is copied or "rsync-ed" to the BBC-NAS (Big, Bad, Central NAS).

I would like to lay claim to this idea, but all I can lay claim to is the acronym. In my day job, I have seen a few customers implement this idea with great success. These customers have shown that this is definitely doable — if you carefully plan the NITRO solutions, how you manage them, and how data moves to and from them.

NITRO effectively creates two layers of network storage. The first "NITRO" layer, the one highest up the storage pyramid, provides NAS storage that is close to the servers. The second layer of NAS, the BBC-NAS, is further down the storage pyramid and doesn't have to have the performance that would be necessary if it were catering to the entire rack. Sounds pretty reasonable, so let's go a step further and examine what a NITRO would look like.

Recall that there are microservers that can fit up to 288 servers into a single 4U chassis (probably diskless). Again assuming that each server has a single GigE, then you have 288 GigE lines coming from the servers. In the worst case of all servers in a single 4U chassis communicating to a single NITRO at the same time, this is the same as 28.8 10GigE lines, or 7.2 40GigE lines, or 5.14 FDR IB lines. If we go back to assuming that only 1/10 of those servers are communicating at one time, the NITRO only needs 28.8 GigE lines, or 2.88 10GigE lines, or 0.72 40GigE lines, or 0.51 FDR IB lines. This configuration sounds very reasonable for a NAS solution.

In the case where each server has some local storage, then there are perhaps 96 servers in 4U. If each server has a single GigE line, then in the worst case with all servers communicating at the same time, you have 96 Gbps going to the NAS gateway or about 10x 10GigE lines (2.5x 40GigE). If we use the 10:1 ratio, this changes to one 10GigE and 40GigE becomes overkill. Again, a very reasonable configuration for a NAS solution.
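Fractional links are fine for sizing, but a NITRO head node has to be ordered with whole ports. The sketch below rounds the demand up to whole 10GigE or 40GigE uplinks for both chassis configurations. It assumes one GigE link per server and nominal line rates; `ports_needed` is a hypothetical helper.

```python
import math

# Nominal uplink speeds in Gbit/s.
LINK_GBPS = {"10GigE": 10, "40GigE": 40}


def ports_needed(n_servers, one_in=1, client_gbps=1):
    """Whole uplink ports needed when one in `one_in` servers is active at once."""
    demand_gbps = n_servers * client_gbps / one_in
    return {name: math.ceil(demand_gbps / speed) for name, speed in LINK_GBPS.items()}


for servers in (288, 96):  # diskless chassis vs. chassis with local drives
    for one_in in (1, 10):  # worst case vs. the 10:1 assumption
        print(f"{servers} servers, 1 in {one_in} active:", ports_needed(servers, one_in))
```

Rounding up shows why a single 40GigE link, or a small number of 10GigE links, is enough under the 10:1 assumption.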

High density storage units that can fit 60 or more drives into a 4U space are prevalent. Using 4 TB drives you can get 240TB of raw space in these units. Or you can use a mix of SSDs and hard drives to perhaps get better performance.

There are other tricks you can use to improve performance depending upon your requirements and tolerance of data loss. These storage units can be coupled with a 1U server with multiple 10GigE links or a single 40GigE link creating a 5U NAS storage solution that can provide a great deal of storage capacity and performance.

Using a NITRO for every 4U set of microservers does dilute the density, going from 288 servers in 4U (72 per U) to 288 servers and storage in 9U (32 per U). But in a 42U rack you can get 4 of the microserver/NITRO combinations, resulting in 1,152 servers and 960 TB of network storage in a single rack. This is still a pretty amazingly dense solution with a great deal of IO to the servers.

When the servers have local storage we get the same number of microserver/NITRO combinations in a 42U rack (4). That results in 384 servers and 960 TB of network storage. That is still very good and gives you a great deal of flexibility in terms of IO.
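The rack-level totals follow from a few integer operations, sketched here in Python. The unit sizes come from the text: a 4U microserver chassis, plus a 5U NITRO made of a 4U storage unit with 60 drives of 4 TB and a 1U head server. The function `rack_plan` is a made-up name for illustration.

```python
RACK_U = 42
CHASSIS_U = 4             # one microserver chassis
NITRO_U = 4 + 1           # 4U storage unit plus 1U head server
DRIVES_PER_NITRO = 60
DRIVE_TB = 4


def rack_plan(servers_per_chassis):
    """Totals for a rack filled with chassis/NITRO pairs."""
    pair_u = CHASSIS_U + NITRO_U
    pairs = RACK_U // pair_u  # whole chassis/NITRO pairs that fit
    return {
        "pairs": pairs,
        "servers": pairs * servers_per_chassis,
        "raw_tb": pairs * DRIVES_PER_NITRO * DRIVE_TB,
        "servers_per_u": servers_per_chassis / pair_u,
        "spare_u": RACK_U - pairs * pair_u,
    }


print("diskless:", rack_plan(288))
print("local drives:", rack_plan(96))
```

The `spare_u` value is the rack space left over after the four pairs are installed.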

At this point there are a number of people reading this article who are positively seething because they now have to worry about five NAS units per rack: four NITRO units and one BBC-NAS, instead of one NAS server for a number of racks. I understand your agitation and agree that this can be a pain for administration. Been there, done that.

However, let me tell you about a customer who has over 20,000 total cores and 46 individual NAS units all managed by no more than six admins. And those admins also monitor and admin the local network and an array of Web servers and handle over a thousand users. While they may seem superhuman, and I like to think that they are, they actually get to sleep at night. The reason is planning and automation.

Using tools such as Puppet, or HPC tools such as Bright Cluster Manager or Warewulf, one could develop a standard image for each NITRO. In this image, you would need to be sure to include monitoring tools such as Ganglia, Nagios, and others. You also need to think about how the data from the NAS rack units will be rsynced to the higher-level NAS (BBC-NAS) and how a NITRO can be restored from the BBC-NAS. It's not a difficult thing to do, but you do need to test it.

Beyond images, you would also need to plan for user management. Which users will be assigned to which set of nodes and, consequently, which first-level NAS (NITRO)? What happens if that user leaves — how do you migrate that data to a different user? You also need to consider, plan, and test how the data is moved from NITRO to BBC-NAS from a user's perspective, particularly permissions, dates and extended attributes (xattr). All of this is doable and has been done. But it takes careful planning.
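One way to start testing the NITRO to BBC-NAS copy is to script the rsync call and run it in dry-run mode first. The sketch below builds an rsync command that preserves permissions and timestamps (`-a`), hard links (`-H`), ACLs (`-A`), and extended attributes (`-X`). The paths and host name are purely hypothetical, and the sites mentioned in this article may do it quite differently.

```python
import shlex
import subprocess


def build_rsync_command(src_dir, dest, dry_run=True):
    """Build an rsync command line that keeps permissions, dates, ACLs and xattrs."""
    cmd = ["rsync", "-a", "-H", "-A", "-X", "--numeric-ids"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied without copying
    # A trailing slash on the source copies the directory's contents.
    cmd += [src_dir.rstrip("/") + "/", dest]
    return cmd


def sync(src_dir, dest, dry_run=True):
    """Run the rsync command and raise an error if rsync exits non-zero."""
    return subprocess.run(build_rsync_command(src_dir, dest, dry_run), check=True)


if __name__ == "__main__":
    # Hypothetical source path on a NITRO and destination on the BBC-NAS.
    command = build_rsync_command("/export/scratch", "bbc-nas:/export/nitro01/")
    print(shlex.join(command))
```

Calling `sync(..., dry_run=False)` from a scheduler would perform the real copy. Whether ACLs and extended attributes survive also depends on the file systems at both ends, so that is worth checking as part of the test.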


As with all things in life, there is a cost/benefit. The current model of storage for microservers, each node having a single disk and using network storage, has some fairly significant issues that impact cost and scalability.

I put forward the idea of something I labeled NITRO (NAS In The Rack Option). The idea is to actually have more NAS boxes, but to integrate them with a number of microservers to provide semi-permanent storage. Then these units are synced to a larger centralized network storage solution, resulting in two levels of NAS storage.

NITRO introduces some additional complexities because you have more NAS units to administer. There are data centers that routinely administer exactly this architecture, with more than 40 NAS servers and one large centralized NAS server. It has proven to be very effective, and there is no reason why this approach can't work with microservers.
