I have been saying this for a long time, and the trend is clear: whatever your application, be it cloud, Hadoop or file system, appliances are in your future. If you have a storage problem, at least one vendor has a solution for it that plugs in and works.
Data center consolidation, either within a corporation or to a public cloud, is very much a part of today's IT landscape. So what should you be doing to ensure that you have a job in the future, with your current employer or a future one?
My advice: Get on the appliance bandwagon and get ahead of the curve.
When companies outsource all or part of their IT infrastructure, it is because someone else can make a profit doing it. The margins I have seen and heard of on IT outsourcing are as high as 25 percent. Ask yourself how another company or a cloud provider can buy all the hardware and software needed, run it for less than the company's internal IT department, and still make a profit. From what I see, part of the answer is internal politics, which often prevent efficiencies in the data center: each department wants things done its own way.
But the appliance model is going to change the way that people think about IT, and it will change how organizations are structured.
Since this is a self-help article, I want to cover a number of different appliances that you should be studying up on so that you are ready for the future. If your IT infrastructure is stovepiped, without integrated divisions for storage, virtualization and computation, the environment is going to need to change quickly over the next few years. Otherwise, you might be looking for a new job, because someone is going to come in and modernize your environment, either by outsourcing it to a vendor or IT contractor or by moving it to a cloud provider.
My view is that you need to get with the plan and you need to prepare, as the light at the end of the tunnel is a train coming at you. Let's talk about some of the various appliances that you will need to become familiar with.
Hadoop appliances are divided into three camps today:
- Standard Hadoop
- Shared file system Hadoop
- Fast storage appliance Hadoop
Standard Hadoop
With a standard appliance, you buy nodes that come preloaded and configured with Hadoop, on hardware optimized for it.
You can buy this type of hardware and software from many vendors. In some cases you are buying just the software for your own cluster; in other cases you are buying the hardware and software from a single integrator that has optimized both. Either way, this is standard Hadoop with three-way replication, on hardware and software configured to run Hadoop and not much else.
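To make "three-way replication" concrete, here is a simplified Python sketch of the rack-aware placement HDFS uses by default when the replication factor is 3: one replica on the writer's node, and the other two on different nodes of a single remote rack. The `Node` class, the `place_three_replicas` function and the nine-node, three-rack layout are hypothetical illustrations; real HDFS also weighs free space, load and node health, which this sketch ignores.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str


def place_three_replicas(nodes, writer, rng=random):
    """Simplified sketch of HDFS-style default placement for replication factor 3.

    Replica 1: the writer's own node (assumes the writer is itself a DataNode).
    Replica 2: a node on a different (remote) rack.
    Replica 3: a different node on that same remote rack.
    Hypothetical helper; ignores capacity, load and node health.
    """
    first = writer
    remote = [n for n in nodes if n.rack != first.rack]
    # Only racks with at least two nodes can hold both remote replicas.
    remote_racks = sorted(
        {n.rack for n in remote if sum(m.rack == n.rack for m in remote) >= 2}
    )
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    rack = rng.choice(remote_racks)
    on_rack = [n for n in remote if n.rack == rack]
    second, third = rng.sample(on_rack, 2)  # two distinct nodes on the remote rack
    return [first, second, third]


# Hypothetical cluster: nine nodes spread over three racks.
nodes = [Node(f"n{i}", f"rack{i // 3}") for i in range(9)]
print(place_three_replicas(nodes, nodes[0], random.Random(42)))
```

Spreading the copies over two racks means a single node or rack failure still leaves a readable copy, at the price of storing every block three times.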
Shared file system Hadoop
A shared file system appliance generally uses either the Lustre or the GPFS file system, which speeds up the shuffle phase in Hadoop. This works because the data can be read globally from every node and does not have to be read and redistributed across the network. All of the nodes are attached to the shared file system and can read the data directly from storage, instead of the data having to travel from server to network to server to storage.
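A back-of-envelope model shows why the shuffle is network-heavy in standard Hadoop. Assuming map output is partitioned evenly across reducers running on all N nodes (an idealized assumption: uniform keys, no skew), each reducer must fetch (N-1)/N of its input from other servers. The function names below are hypothetical, and this is a sketch of the standard case only, not a vendor's performance model for the shared file system path.

```python
def cross_node_shuffle_fraction(num_nodes: int) -> float:
    """Share of shuffle data a reducer fetches from other nodes,
    assuming map output is spread evenly over num_nodes nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    return (num_nodes - 1) / num_nodes


def shuffle_network_bytes(map_output_bytes: int, num_nodes: int) -> float:
    """Bytes that cross the network during the shuffle under the same assumption."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    return map_output_bytes * (num_nodes - 1) / num_nodes


# Hypothetical job: 1 TB of map output on a 20-node cluster.
print(shuffle_network_bytes(10**12, 20))  # 950000000000.0
```

As the cluster grows, the fraction approaches 1, so nearly all map output moves server to server; that is the traffic a shared file system appliance aims to avoid.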
This approach has been shown to be significantly faster for some problems than the standard Hadoop configuration. In addition, you get the reliability of RAID and failover (if they are designed into the architecture). Vendors have reliability studies showing that triple replication is not needed if the storage is RAIDed.
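One reason the RAID argument matters is capacity. This Python sketch compares the raw disk needed to hold the same usable data with triple replication versus a parity-protected layout. The 8+2 (RAID 6 style) geometry, the 100 TB figure and the function names are illustrative assumptions, not numbers from any vendor study, and the arithmetic says nothing about whether the two layouts are equally reliable.

```python
def raw_capacity_replicated(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    return usable_tb * replicas


def raw_capacity_parity(usable_tb: float, data_disks: int, parity_disks: int) -> float:
    """Raw capacity needed for a data+parity stripe, e.g. RAID 6 as 8+2."""
    return usable_tb * (data_disks + parity_disks) / data_disks


usable = 100.0  # TB of data to store (hypothetical)
print(raw_capacity_replicated(usable))    # 300.0
print(raw_capacity_parity(usable, 8, 2))  # 125.0
```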
Fast storage appliances
A number of vendors already have, or are developing, SSD appliances for Hadoop, and more are on the way. These appliances are optimized for Hadoop and are easy to manage.
Which is the best?
Of course, the answer depends on the amount and type of data, how much is coming in and how many queries are going on. This is an area where you can help yourself by understanding the issues and asking the right questions.