As storage analyst Greg Schulz puts it, “Big data is a great, big catch-all for things.”
That said, a number of standout storage tools are designed to help storage administrators tackle a growing mountain of big data. Not surprisingly, many of them center on Hadoop.
SGI InfiniteStorage
SGI InfiniteStorage enables storage to be virtualized into a single fabric that spans everything from high-performance flash to low-cost tape. Data stays online at all times, and the process is said to be transparent to users.
“The SGI InfiniteStorage hardware and software ecosystem is how SGI has been addressing big data problems for two decades, and is in production in hundreds of the most demanding data management environments around the world ranging from weather forecasting, life sciences, manufacturing, media and education,” said Floyd Christofferson, director of storage product marketing at SGI.
Red Hat Storage Server 2.0
According to a recent report by the Linux Foundation, the majority of big data implementations run on Linux. It makes sense, therefore, that Red Hat is a major player in the big data storage space. Red Hat Storage Server 2.0 allows data to be stored and managed in one place and made accessible to many enterprise workloads, said Ranga Rangachari, vice president and general manager of Red Hat's storage business unit.
“Given the size and growth of data today, enterprises can't afford to build dedicated storage silos,” said Rangachari. “The ideal approach is having the data reside in a general enterprise repository and making the data accessible to many enterprise workloads.”
Accordingly, Red Hat has teamed up with Intel to create better open source big data applications. As a first step, Red Hat is taking advantage of the recently released Intel Distribution for Apache Hadoop software, integrating it with Red Hat Storage Server 2.0 and the Red Hat Enterprise Linux operating system. In addition, a Red Hat Storage Apache Hadoop plug-in is about to be released to the open source community as a storage option for enterprise Hadoop deployments.
“Red Hat is uniquely positioned to excel in enterprise big data solutions, a market that IDC expects to grow from $6 billion in 2011 to $23.8 billion in 2016,” said Ashish Nadkarni, an analyst at IDC. “Red Hat is one of the very few infrastructure providers that can deliver a comprehensive big data solution because of the breadth of its infrastructure solutions and application platforms for on-premises or cloud delivery models.”
EMC Pivotal HD
Speaking of new Hadoop distributions, EMC’s version is called Pivotal HD, and it features integration with EMC’s Greenplum massively parallel processing (MPP) database. A technology called HAWQ brings SQL processing to Hadoop and is touted as delivering a more than 100X performance improvement for queries and workloads.
“Hadoop is a big deal and the key to unlock big data’s transformational potential, and we are marrying it with Greenplum technology to help catapult Hadoop into wide-scale adoption,” said Scott Yara, senior vice president of products, Greenplum, a division of EMC.
DataDirect Hadoop Apache Hive Driver
Part of the allure of Hadoop is that processing unstructured data into meaningful forms can yield intelligence that complements traditional analytics. The challenge is connecting existing business intelligence and data analytics tools to the data stored in Hadoop. According to Michael Benedict, vice president of data connectivity at Progress DataDirect, the DataDirect driver for Apache Hive is the only fully compliant driver that supports multiple Hadoop distributions out of the box.
“Without the DataDirect Hive driver it would be difficult to access and analyze data, as Hadoop can store so much that it can become quite difficult to access it — especially if you need something quickly,” stated Benedict. “The DataDirect Hadoop Driver helps access information from the Hive Data Warehouse in real-time, making data analysis much easier.”