Gartner lists Big Data (BD) as one of its “Top 10 Strategic Technologies.”
"Big data is a topic of growing interest for many business and IT leaders, and there is little doubt that it creates business value by enabling organizations to uncover previously unseen patterns and develop sharper insights about their businesses and environments," says David Newman, research vice president at Gartner.
According to Deloitte, “more than 90 percent of the Fortune 500 will likely have at least some BD initiatives under way” by the end of this year, at a cost of $1.3 billion to $1.5 billion. But those figures just raise the billion-dollar question: what exactly is “Big Data”?
“If you think the term ‘Big Data’ is wishy-washy waste, then you are not alone,” says Forrester analyst Mike Gualtieri. “Many struggle to find a definition of Big Data that is anything more than awe-inspiring hugeness.”
He says that Big Data can be considered in terms of the volume, velocity and variety of data, but that “there is no specific volume, velocity, or variety of data that constitutes big.”
“One organization’s Big Data is another organization’s peanut,” says Gualtieri. “It all comes down to how well you can handle these three Big Data activities”: the storage, processing (cleansing, enriching, calculating, analyzing) and querying of the data.
Effectively managing Big Data to obtain the desired business results, therefore, requires a rethinking of every aspect of the way that data is stored and used. “Big data disrupts traditional information architectures — from a focus on data warehousing (data storage and compression) toward data pooling (flows, links, and information shareability),” says Newman.
Here are some of the approaches storage vendors are taking to create Network Attached Storage (NAS) products that meet the needs of Big Data.
While most NAS vendors use disks for storage, Crossroads Systems, Inc. of Austin, TX, also uses tape as an online storage mechanism in its StrongBox systems.
“Crossroads StrongBox is an online all-the-time, fully portable data vault for long-term data retention. It leverages Linear Tape File System (LTFS) technology, providing the first-ever enterprise-level, file-based tape archive solution with built-in data protection and self-monitoring for optimized performance at a significantly reduced cost,” says Senior Product Manager Debasmita Roychowdhury. “It also incorporates disk for fast file storage and retrieval, and physical tape for cost-effective, long-term, reliable capacity storage.”
Crossroads has two rackmount versions. The 1U T1 box supports a file transfer rate of 160 MB/s and the 3U T3 has a file transfer rate of 600 MB/s. Both models use a mix of SATA disks and LTO5 tape drives. But unlike tape archive systems that require IT involvement to restore files to disk when users need access, the StrongBox system includes the tape files as part of the same file systems as those on disk, and end users can access them directly. The files on tape just take a bit longer to access than those stored on disk.
“Tape is transformed into an online all-the-time, easily accessible file system,” says Roychowdhury. “Multiple access points can engage StrongBox simultaneously and will be presented with a unified, persistent view of the data vault. This minimizes IT dependency and allows users to experience real-time, online access.”
This system uses policies and self-monitoring to manage the storage tiering and disk caching to speed file transfer to and from tape. By utilizing a mix of tape and disk, it can achieve up to an 80 percent cost reduction over a similar capacity disk-only NAS. As storage needs grow, additional StrongBoxes can be installed, auto-discovered and added to the file system. “Data grows exponentially – budgets don’t,” says Roychowdhury. “Organizations can now invest in a cost-effective solution that seamlessly scales as their archive grows from 500,000 files to 5 billion.”
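The article does not describe StrongBox's actual tiering policies, so the following Python snippet is only a minimal sketch of what a policy-driven disk-to-tape tier might look like. The FileRecord type, the apply_tiering_policy function and the 30-day idle threshold are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical catalog entry for one file in the vault."""
    path: str
    days_since_access: int
    tier: str = "disk"  # either "disk" or "tape"


def apply_tiering_policy(files, max_idle_days=30):
    """Mark disk files idle longer than max_idle_days as tape-resident.

    Returns the paths that were moved. The threshold is an assumed
    example value, not a vendor default.
    """
    moved = []
    for record in files:
        if record.tier == "disk" and record.days_since_access > max_idle_days:
            record.tier = "tape"
            moved.append(record.path)
    return moved


catalog = [
    FileRecord("/vault/q1-report.pdf", days_since_access=5),
    FileRecord("/vault/2009-scans.tar", days_since_access=400),
]
moved_paths = apply_tiering_policy(catalog)
```

In a system like the one described, both files would stay visible in the same file system after the move; only the access latency of the tape-resident file would change.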
In November 2010, EMC spent $2.25 billion to acquire scale-out NAS vendor Isilon Systems of Seattle, WA. Isilon’s storage clusters use OneFS, a fully symmetric file system that has no single point of failure and allows from 18 TB to more than 15 PB of data and up to a trillion files to be managed in a single namespace.
“OneFS allows a storage system to grow symmetrically or independently as more space or processing power is required—providing a grow-as-you-go approach and the ability to scale-out as your business needs dictate,” says Brian Cox, Sr. Director Product Marketing. “Nodes can be added to the file system and be ready to use in minutes—versus a traditional file system which can take hours to install, configure and provision.”
Cox says that while Big Data was initially limited to specific industries such as life sciences, media and entertainment, or Web 2.0, it is now finding broader application in traditional business computing.
“Today, the clear delineations that have existed between Big Data vertical industry requirements and enterprise IT requirements have now blurred to the point that they are no longer distinguishable,” he says. “The simple fact is that these two worlds are rapidly converging, creating a need for a fundamentally different way to meet the storage needs that enterprises will have going forward.”
But that convergence doesn’t mean that the exact same systems would be used everywhere. Cox explains that it is critical to match the storage system to the business and storage needs. For example, if an organization’s file and unstructured data needs are growing slowly and will stay under 150 TB for the foreseeable future, he recommends going with a scale-up design such as EMC’s VNX unified storage. But if its file and unstructured data needs are over 50 TB and growing fast, then it should choose a scale-out architecture such as EMC’s Isilon storage.
“Customers need to select the right tool for the right job and thus need to understand the profile of their workload over time,” says Cox.
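Cox's rule of thumb can be written down as a small decision function. This is only a sketch of the guidance as quoted above, not an EMC sizing tool: the function name, parameter names and the "undetermined" fallback are assumptions, added because the two stated cases do not cover every combination of capacity and growth rate.

```python
def recommend_architecture(projected_tb, fast_growth):
    """Apply the scale-up vs. scale-out guidance quoted in the article.

    projected_tb: expected file and unstructured data capacity, in TB.
    fast_growth:  True if that capacity is growing quickly.
    """
    if not fast_growth and projected_tb < 150:
        return "scale-up"      # e.g. a unified storage array
    if fast_growth and projected_tb > 50:
        return "scale-out"     # e.g. a clustered NAS
    # Cases the quoted guidance does not address explicitly.
    return "undetermined"


small_stable = recommend_architecture(projected_tb=80, fast_growth=False)
large_growing = recommend_architecture(projected_tb=300, fast_growth=True)
```

Workloads that fall outside both cases, such as a small but fast-growing data set, are exactly where the workload profiling Cox mentions would have to settle the choice.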
HP’s IBRIX X9000 Storage product family uses a “pay-as-you-grow” modular architecture that allows customers to gradually purchase and centrally manage storage — up to 16 PB in a single namespace — as their needs grow.
“This highly scalable and economical file storage infrastructure serves as an effective archive for the HP ecosystem of Big Data solutions that includes Apache Hadoop for batch analytics on unstructured data, HP Vertica for structured real time analytics and HP Autonomy for meaning-based computing,” says Stephen Bacon, Senior Manager of NAS Product Management in Fort Collins, Colorado. “It is complemented by HP's Information Management and Analytics consulting practice plus technology implementation and support services.”
When selecting Big Data file storage, Bacon says that HP recommends customers look at (1) the requirements of their workloads, (2) the roadmap and pace of innovation for each vendor's offering, (3) the economics of each vendor's offering including whether they are modular with all-inclusive features and (4) the ecosystem of solutions and services that each vendor enables.
"'Pay-as-you-grow' modular architecture enables customers to avoid storage over-provisioning and manage costs," he says. "All inclusive features for data protection, data retention, and data mobility including tiering ensure no hidden expensive add-ons."