Enterprise SSDs give us lots of freedom in storage design, but is this a blessing or a curse?
Solid State Drives (SSDs) have been around a lot longer than you think. Their current popularity started a few years ago with small-capacity, expensive drives. They had much better performance than spinning drives, and this proved important in a number of areas. They also boasted low power draw and insensitivity to shock (i.e., you could shake them while running and they would not crash or lose data).
However, they were far from perfect. The initial capacities were small, the price per GB was much higher than for spinning drives, and they had some peculiarities relative to spinning disks that people had to come to understand. These included the following:
- Endurance limitations (limited number of writes to cells)
- Updating a bit involves an entire block
- Write amplification (more than one physical write is needed to store a piece of data)
- Read disturb (reading the data could lead to data corruption)
In addition, all of the drives were SATA based. Remember that the SATA protocol has a much higher data error rate than SAS (whether on consumer SATA or SATA/SAS nearline enterprise devices). If your file system could not detect data corruption, you faced some difficulties with these SSDs.
Earlier SSDs also had some general difficulties. For example, it was difficult for them to sustain a consistent level of performance (write or read). Because of the write endurance limitation, manufacturers offered fairly short warranties.
Over time, the manufacturers developed new techniques and technologies to address these limitations, but there was still some apprehension around the drives, particularly in the enterprise world.
Addressing these observations led manufacturers to produce a better SSD that they call the "enterprise SSD." At a high level, the typical benefits of an enterprise SSD include the following:
- Higher performance
- More consistent performance
- Protection of DRAM-stored data in the event of a power loss
- Stronger error correction code (ECC)
- Consistent and persistent quality of service
- Lengthier warranty
- Greater level of endurance
- Greater level of over provisioning
Of course, enterprise SSDs cost more than consumer-grade SSDs, but that is the trade-off for the features you get in an enterprise SSD. Moreover, enterprise SSDs can come in a variety of interfaces, including SAS. Note that the definition of an enterprise SSD is not based on the drive interface.
One of the key features of the enterprise SSDs is endurance.
Enterprise SSD Endurance
Two industry standards bodies have defined what endurance means for enterprise SSDs: (1) the Joint Electron Device Engineering Council (JEDEC) and (2) the Storage Networking Industry Association (SNIA). JEDEC has published specifications for endurance, and SNIA for performance, to distinguish between consumer SSDs and enterprise SSDs.
An important consideration in these standards is the data usage models for consumer and enterprise SSDs. Consumer SSDs are tested with consumer applications. This also means they are not tested in a 24/7 scenario, since that is very uncommon in the consumer world. In contrast, enterprise SSDs are tested with enterprise applications and in the 24/7 environment that one would encounter in a data center.
The JEDEC standards JESD218 and JESD219 define consumer and enterprise data usage models. For enterprise SSDs, JESD218A defines the data usage model as 24 hours per day at 55°C with three months of retention at 40°C. In contrast, JESD218A defines consumer data usage as eight hours per day at 40°C with one year of retention at 30°C. These data usage models are very different from one another.
Manufacturers have improved the endurance of enterprise SSDs over time using various techniques, including wear-leveling algorithms, over-provisioning and self-healing. Over-provisioning is a very common technique in SSDs that can ultimately help wear leveling and improve endurance. Enterprise SSDs typically reserve a greater percentage of the NAND flash than consumer drives. In turn, this allows enterprise SSDs to use lower-endurance NAND flash options, including multilevel cell (MLC), triple-level cell (TLC) and 3D NAND. These are lower-cost options that help keep the price down despite the increase in over-provisioning.
Typically you will see SSD endurance described with one of two terms. The first is full drive writes per day (DWPD) for a certain warranty period. If you have a 100 GB SSD with a DWPD of one (one full drive write per day), then it can handle 100 GB of data being written to it every day for the warranty period. If the DWPD were 10, then it could handle 1 TB of data being written to the drive every day for the warranty period.
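The DWPD arithmetic can be sketched in a few lines of Python. This is only an illustration of the definition; the function name `daily_write_budget_gb` is hypothetical, and the inputs are the 100 GB drive from the example.

```python
def daily_write_budget_gb(capacity_gb: float, dwpd: float) -> float:
    """Return the GB per day a drive is rated to absorb during its warranty."""
    if capacity_gb < 0 or dwpd < 0:
        raise ValueError("capacity and DWPD must be non-negative")
    # One "drive write" is one full capacity's worth of data.
    return capacity_gb * dwpd


# The 100 GB drive from the text, at 1 DWPD and at 10 DWPD.
print(daily_write_budget_gb(100, 1))
print(daily_write_budget_gb(100, 10))
```

The second call gives 1,000 GB, which is the 1 TB per day quoted for a DWPD of 10.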
The second term that describes SSD endurance is terabytes written (TBW). This describes how much data can be written to the drive over the life of the drive. A larger TBW number indicates better endurance.
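The two measures can be related with a commonly used approximation: TBW is roughly DWPD times capacity times the number of days in the warranty period. The sketch below assumes decimal units (1 TB = 1,000 GB), 365 days per year, and a warranty length that you supply; the function names are hypothetical, and a manufacturer may derive its published numbers differently.

```python
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def dwpd_to_tbw(dwpd: float, capacity_gb: float, warranty_years: float) -> float:
    """Approximate total terabytes written implied by a DWPD rating."""
    capacity_tb = capacity_gb / 1000  # assumption: decimal terabytes
    return dwpd * capacity_tb * DAYS_PER_YEAR * warranty_years


def tbw_to_dwpd(tbw: float, capacity_gb: float, warranty_years: float) -> float:
    """Approximate drive writes per day implied by a TBW rating."""
    capacity_tb = capacity_gb / 1000
    return tbw / (capacity_tb * DAYS_PER_YEAR * warranty_years)


# A 100 GB drive rated at 1 DWPD with an assumed five-year warranty.
tbw = dwpd_to_tbw(1, 100, 5)
print(f"{tbw:.1f} TBW")
print(f"{tbw_to_dwpd(tbw, 100, 5):.2f} DWPD")
```

Under these assumptions the example drive works out to about 182.5 TBW. Converting both ratings to the same unit makes it easier to compare drives whose data sheets quote only one of them.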
Importantly, how the data was written to the drive is not specified for either measure. For example, the testing could be done using streaming data, which is a bit easier to handle than random 4K IOPS. With random write IOPS you are likely to also get reads, some wear leveling and garbage collection IO functions used as part of the testing. With streaming writes, you are less likely to get these additional IO functions. These differences impact the endurance.
Enterprise SSD Variation
Enterprise SSDs come in various types as well. You should pay very close attention to the specifications of the drives. SSD manufacturers will sometimes offer different endurance options with enterprise SSDs. Pay attention to the DWPD and/or TBW numbers as well as the length of the warranty.
Moreover, manufacturers also offer enterprise SSDs that are more read-oriented or more write-oriented. For example, a more read-oriented drive might appear in the specs as a drive with a low DWPD relative to other drives, perhaps a DWPD of 1, and perhaps with a large capacity. Write-intensive drives will have a much higher DWPD and perhaps a lower capacity.
To better understand the drive variations, let's look at some example drives. The first drive family is the Intel DC S3x10 drives. The DC S3710 series of drives is designed for up to 10 DWPD and up to 24.3 petabytes written (PBW). In the same family is the DC S3610 series of drives, which is listed with a DWPD of 3 and up to 10.7 petabytes written (PBW). The third drive in the family is the DC S3510 series of drives, which is listed with a DWPD of 0.3 and up to 880 TBW. This illustrates the wide range of specs within a single family of drives.
Another example is the Toshiba PX02SS enterprise SSDs. These drives have a DWPD rating of 30. Toshiba also has a read-intensive series of drives that is listed with a DWPD of 0.5 or 1.
In addition to various drive interfaces, we now have enterprise SSDs with various endurance, performance and capacity characteristics. In the world of optimization, this is said to "increase the dimensionality of the design space" or "increase the number of degrees of freedom." In other words, you have many more options when designing or architecting a storage solution.
Storage Design with Enterprise SSDs
The storage solution design space, if you are starting from scratch, has a tremendous number of options. There are network options, drive options (HD or SSD), options for drive performance and capacity, options for drive endurance (SSDs), drive interface options, array options, file system options, and on and on. There are decision points within this matrix of options but it is beyond the scope of this article to work through them all. For this article, I want to restrict the options just to enterprise SSDs regardless of interface. This leaves us with capacity and endurance (DWPD and/or TBW) as primary features.
Recall that there can be large variations in enterprise SSDs. These variations are a function of DWPD, capacity and performance. This gives us great flexibility but also means we will need to make decisions during the design phase.
The first decision point is whether the applications are write-intensive, read-intensive, a combination of both, WORM (write-once, read-many), WORN (write-once, read-never) or some combination of these. With this information, you can start to focus on drives that have a large DWPD (large TBW), a smaller value or something in between.
The second decision point is the total storage capacity needed. Depending upon some other factors, this could push you toward certain drive capacities. In turn, this could also drive certain RAID levels. But be careful: introducing RAID could force you to rethink DWPD (TBW) to accommodate the extra parity writes that accompany RAID.
The third decision point (and this one is actually part of the other two) is the volatility of the data. Are the enterprise SSDs being used for volatile data like swap data or for less volatile data such as logs or application output? If they are used for volatile data, then perhaps enterprise SSDs are overkill and consumer SSDs could be used, saving money. There are cases, however, where you might want to use enterprise SSDs even for volatile data because of the importance of having much better reliability in the device.
A related decision is whether the data is being redundantly stored somewhere. If you have a copy of the data somewhere else, then perhaps the storage solution would look different. For example, if the data is stored somewhere else, then users could access the data from that storage, allowing the focus of the enterprise storage to be on write-intensive drives with a medium or large value of DWPD (TBW). Or, you could even switch to consumer SSDs since you have a copy of the data elsewhere. This could also allow you to select smaller capacity drives because the applications are just writing to the storage and reads happen from other storage.
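To see how RAID can change the DWPD a drive needs, here is a rough sketch. It rests on loudly simplified assumptions: small random writes, writes spread evenly across all drives, two-way mirrors for RAID 10, and no allowance for the SSD's own internal write amplification. The multipliers and the function name `required_dwpd` are illustrative, not vendor data.

```python
# Device-level writes generated per host write for small random writes.
# These multipliers are a simplifying assumption, not measured values.
WRITE_MULTIPLIER = {
    "raid0": 1,   # striping only, no redundancy
    "raid10": 2,  # each write lands on both sides of a two-way mirror
    "raid5": 2,   # one data block plus one parity block
    "raid6": 3,   # one data block plus two parity blocks
}


def required_dwpd(host_gb_per_day: float, n_drives: int,
                  drive_capacity_gb: float, raid_level: str) -> float:
    """Estimate the per-drive DWPD needed to absorb a daily host write load."""
    if n_drives <= 0 or drive_capacity_gb <= 0:
        raise ValueError("drive count and capacity must be positive")
    device_gb_per_day = host_gb_per_day * WRITE_MULTIPLIER[raid_level]
    per_drive_gb = device_gb_per_day / n_drives  # assumes even distribution
    return per_drive_gb / drive_capacity_gb


# Hypothetical array: eight 800 GB drives absorbing 2,000 GB of host writes per day.
for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, round(required_dwpd(2000, 8, 800, level), 4))
```

In this hypothetical array, the same host workload needs roughly twice the per-drive DWPD under RAID 5 as under plain striping, and three times as much under RAID 6. Large sequential writes that fill whole stripes would be cheaper than this worst-case model suggests.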
Enterprise SSDs and Storage Design: A Blessing and a Curse
It is clear that enterprise SSDs are much better than consumer drives for enterprise workloads. They have better endurance, more consistent performance, more consistent quality of service, longer warranties and better data protection than consumer SSDs. Drive manufacturers have created a wide range of enterprise drives with varying characteristics. This gives us tremendous flexibility, which is both a blessing and a curse. It's a blessing because we now have more freedom and can tailor the storage system to the needs of the applications. It's a curse because it's going to take more work to design the storage solution.
Regardless of whether it's a blessing or a curse, one thing is obvious: you will need to know your applications very well to properly design the storage solution. This knowledge includes knowing the importance of IO to the applications, the IO patterns and how much data is written and read during normal operations. Without this information, storage design with enterprise SSDs, or any storage devices, reduces to random selection. If you know your IO patterns to some degree, then enterprise SSDs are definitely a blessing, allowing you to tailor the storage design to the applications. But if you don't really know anything about the IO patterns of the applications, then enterprise SSDs can be a curse.
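One way to start learning how much data an application writes is to sample the operating system's block-device counters before and after a representative period. The sketch below assumes a Linux host, where `/proc/diskstats` reports sectors written in the tenth column, counted in 512-byte units. The helper names and the sample snapshot strings are hypothetical; on a real system you would read the file twice instead.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_text: str, device: str) -> int:
    """Return the cumulative sectors-written counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Columns: major, minor, name, then the I/O counters.
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(f"device {device!r} not found")


def gb_written_between(before: str, after: str, device: str) -> float:
    """Return decimal GB written to a device between two snapshots."""
    delta = sectors_written(after, device) - sectors_written(before, device)
    return delta * SECTOR_BYTES / 1e9


# Made-up snapshots standing in for two reads of /proc/diskstats.
before = "   8       0 sda 1000 0 20000 50 2000 0 4000000 70 0 100 120"
after = "   8       0 sda 1500 0 30000 80 9000 0 5953125 95 0 160 175"
print(gb_written_between(before, after, "sda"))
```

Dividing the result by the length of the sampling window, and scaling to a day, gives a host write rate that can be compared against a drive's DWPD or TBW rating. The same approach with the sectors-read column gives the read side of the picture.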
This article represents my own viewpoints and not those of my employer, Amazon Web Services.