The 'Dumbing Down' of Data Storage

Thursday Mar 14th 2013 by Henry Newman

As appliances take over storage, enterprises don't need as many highly skilled storage administrators.

These days, people are using the word "dumb" regularly to describe everything from sequestration to movies. I figured I should jump into the mix with a bold statement: I think OEM storage vendors are being forced to "dumb down" storage because we do not have the storage talent to manage the complexity we have in our industry.

The question is, is dumb storage good or bad for most of us?

It is certainly good for the customers. It was not that long ago that we had SAN file systems using the VERITAS volume manager and file system (VxVM and VxFS) for many commercial sites and a wide variety of applications. Today, the world is completely different and much simpler. In my opinion, it all started with NFS and NAS storage.

So is the current change to more simplified storage part of a cycle? Or is this the way things will be for the long term?

The History of Storage Simplification

If you are a regular reader, you know my old saying: there are no new engineering problems in IT—just new engineers solving old problems. The current storage trend is a movement to appliances. I suspect this is happening because there is a lack of storage administrators and architects. Things need to be simpler in order to sell.

The trend could also be due to other market factors, such as a lack of standards. We have the IETF for Internet standards, but for storage on the server side we have little to no leadership. We have The Open Group and SNIA, neither of which has been very successful in developing a broad set of management standards. (SNIA did produce the Storage Management Initiative Specification, SMI-S, but I think it was too little, too late.) There is an agreed-upon common framework for network management, but nothing comparable spans the various file systems, from local ones like XFS, ext4 and NTFS all the way to the biggest parallel file systems like GPFS and Lustre.

Honestly, in my opinion, it is a shame that the vendors did not get together in the 1990s when they had the opportunity. But that lack of cooperation spurred innovation, which is why I think that NAS took off in the early 2000s. It was easy to use and easy to configure, manage and upgrade.

I remember the late 1990s and early 2000s, when SAN administrators and architects were in extremely high demand and could command large salaries. Even after the dot-com bust, SAN administrators and architects were still getting higher-than-average salaries compared to others in the IT industry because there were just not enough good people.

Companies like EMC, HP, IBM, Sun, Veritas and many others tried to build more SAN talent by promoting certification and education programs. But certification cost the customers time and money, and classes were often required with each new release or each year. Worst of all, a Sun certification did not help much with EMC gear; the only common ground might be the Fibre Channel switch. So if a customer wanted or needed a mixed environment, they had to have people spending a lot of time in training.

In the 2000s, the SAN vendors started to get a clue—likely because of pressure from customers. The consolidation of SAN companies also began during this period, which reduced the number of training classes needed, and the vendors tried to develop a common SAN management framework such as SMI-S.

Too Little Too Late

During this same period, the NAS market was growing fast. Management was easy. Provisioning was easy. Upgrades were easy. Training was simple. The interface was NFS.

But there were two things lacking:

  1. The performance could not come close to SAN for streaming I/O. However, many found that most I/O was not streaming but IOPS-bound, which the NAS vendors addressed by adding read caches.
  2. Scaling NAS beyond a single box was an issue because performance did not scale, which limited the file system size to a single NAS frame. That covered a good percentage of the market, but not the upper end.

With a few exceptions, the large SAN file system vendors lost significant market share to the NAS vendors. Today the SAN file system market is quickly disappearing and being replaced. When you want a multi-petabyte namespace, you have only a few choices in the market with a POSIX file system, but a number of choices with REST/SOAP-based interfaces. However, becoming an expert in today's file systems still requires significant training, given the complexity of the hosts, networks and storage devices, and the need to map all of that to the hundreds of file system tunable parameters.

What Will Our Future Look Like?

We have gotten to a point where I think the amount of storage complexity exceeds the supply of storage talent. Combine this with the fact that we still do not have a common management framework, and that we have new applications, appliances and methods, and I think we are seeing a rise of appliances that do not require high-end storage administrators, except at extreme scale.

For example, most of the parallel file system community has moved to storage appliances which have few knobs and switches. Most of the purchasing community for HPC environments has quickly embraced this technology, given the high cost and long training time needed to administer these file systems.

This, of course, is just one side of the coin. The same thing is happening with storage devices and software that present object storage behind REST/SOAP interfaces. They are becoming simpler to use and require high-cost administrators only at extreme scale.

So if you are a skilled, highly talented administrator, what should be your plan to ensure that your salary does not take a nose dive?

I think the answer is appliances for data analysis. (I am likely not talking about Hadoop, as many of the architectural designs for products in this area are completed.) Data analysis appliances are in their infancy today and will require significant care and feeding. The types of data analysis are going to be very complex. For example, you might de-pixelize an image and create a database of geolocations, normalizing for the resolution of the image, which might change over time based on improvements in technology. Then you might correlate the pixels to look for weather, climate or some other change like deforestation. This will be far different than taking business data and trying to correlate prices to sales to maximize profits.
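The kind of analysis sketched above—normalizing imagery of differing resolutions and then correlating pixels over time to detect change—can be illustrated in miniature. The following is a hypothetical sketch, not any vendor's product: the data, the block-averaging normalization and the change threshold are all invented for illustration.

```python
# Hedged sketch: normalize two "images" of the same region taken at
# different sensor resolutions to a common grid, then flag cells that
# changed (e.g., possible deforestation). All values are made up.

def downsample(grid, factor):
    """Block-average a square grid by an integer factor to a coarser grid."""
    n = len(grid) // factor
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            block = [grid[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def changed_cells(old, new, threshold=0.2):
    """Return (row, col) pairs whose value moved by more than threshold."""
    return [(i, j)
            for i in range(len(old))
            for j in range(len(old))
            if abs(new[i][j] - old[i][j]) > threshold]

# Older image: 2x2 grid, vegetation index per cell (1.0 = fully forested)
img_2010 = [[1.0, 1.0],
            [1.0, 1.0]]

# Newer image of the same region from a higher-resolution sensor: 4x4 grid
img_2013 = [[1.0, 1.0, 0.9, 1.0],
            [1.0, 1.0, 1.0, 0.9],
            [0.1, 0.2, 1.0, 1.0],
            [0.2, 0.1, 1.0, 1.0]]

# Normalize the newer image down to the older resolution before comparing
normalized = downsample(img_2013, 2)
print(changed_cells(img_2010, normalized))  # → [(1, 0)]: bottom-left block cleared
```

The point of the sketch is the normalization step: because sensor resolution improves over time, the two datasets cannot be correlated cell-for-cell until they are mapped onto a common grid, which is exactly the kind of data architecture work these appliances will demand.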

Things that used to be difficult are going to be easier. But I suspect there will be a new class of more complex appliances that will try to address a wide variety of problems, and they are going to require significant tuning and configuration. The information collected and processed will have to be architected so that access is efficient when trying to correlate the information, process it and provide results to decision makers.

Final Thoughts

The storage complexity problem for file systems has been mostly solved. There are still a few hard problems out there, but not as many as there used to be.

However, there is a new and even more complex set of problems right in front of us. These will require a deep understanding of what the users need to do with the storage and how they plan to access the data to create information on which actionable decisions can be made. These jobs are going to be high paying and require a broad set of skills. But the skills will be different than the current skills required for SAN and NAS and even the other types of appliances that are out there. Those involved are going to have to work directly with the application developers and users.

Come to think of it, this sounds a great deal like 1996 and 1997 when SAN file systems started to come out. Those of us involved then had to talk with everyone up and down the stack to get things going quickly and efficiently. I believe the same approach is needed today.
