Benchmarking Storage Systems, Part 2
This is the second in a three-part series on benchmarking. Part 1 examined each of the components that might be included in a typical benchmark. Today we’ll look at developing representations of your workload as well as the pros and cons of using your applications and real data in the benchmark as opposed to developing emulations of both.
The most important part in the development of a set of storage benchmarks is ensuring that the benchmarks represent your current workload and how you run that workload on the system(s). With an understanding of your current workload, you can begin to predict how the current workload relates to the future workload.
There are many ways to create workloads that represent your real work. Some are more difficult for you to create, some are much harder for the storage vendors to run, and of course some fall in between (I know, it sounds a bit like Goldilocks and the Three Bears). Whatever you choose, you need to completely understand the tradeoffs and ensure that your organization understands the advantages and disadvantages of the decisions to be made.
Going into the development of a benchmark, you have some hard choices to make. The first is whether the benchmark will use your real applications and real data, or whether an emulation of the workload(s) will need to be developed.
Each of these two paths has pros and cons for both you and the vendor. You might wonder why I seem to be so concerned for the vendors. Besides once being a vendor and a benchmarker, I have come to realize that there is no free lunch when doing benchmarks. In other words, if you develop an expensive benchmark, the cost of running it will more than likely be rolled into the bid price of the hardware, depending on the size of your procurement and how much the vendor(s) want your business.
Here’s how I see the tradeoffs:
Use actual applications code
- Pros: This is the best measure of your real work if you structure the benchmark correctly. For database benchmarks, using the actual data may have security issues for your company, but using actual data will result in the most realistic benchmark.
- Cons: These types of benchmark generally have more setup time and are more difficult to run for the vendors, as a great deal of application tuning, file system tuning, system tuning, and RAID tuning may be required, all of which can affect final pricing.
Develop a set of representative storage benchmarks
- Pros: Far easier to run and to scale to larger workloads. Running this type of workload is also far easier for the storage vendors.
- Cons: Sometimes difficult to develop characterization of the workload given system tunables, file systems, volume managers, and the actual storage hardware.
Using Your Workload
If you are going to use your actual workload in a benchmark, the first step is end-to-end hardware and software characterization. You need to document and understand:
- What applications are being run
- The number, location, and sizes of the data sets being used
- The server(s) hardware configuration, including CPUs, memory, NICs, and HBAs
- The server(s) software configuration
- Application requirements, such as redo logs for databases
- File system and volume manager settings
- HBA tunables
- Storage configuration, including LUN sizes, RAID type, and RAID cache sizes and settings
All of this may seem obvious, but if you’re going to give a storage vendor your benchmark, the more documentation you provide, the more likely the results will meet your requirements and the fewer questions you will have to answer. And if you’ve gone so far as to document the above, then creating the operational procedures for things such as remote mirroring, tape backup, and other operational requirements will not be difficult.
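One way to keep that documentation complete is to hold it in a single structured record that can be handed to every vendor. The following is only a sketch: the class name `BenchmarkEnvironment`, its field names, and the helper `detect_host` are my own illustrative choices that mirror the checklist above, not part of any standard tool.

```python
import json
import os
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class BenchmarkEnvironment:
    """Hypothetical record with one field per item in the checklist."""
    applications: list = field(default_factory=list)
    data_sets: list = field(default_factory=list)       # number, location, sizes
    server_hardware: dict = field(default_factory=dict)  # CPUs, memory, NICs, HBAs
    server_software: dict = field(default_factory=dict)
    application_requirements: list = field(default_factory=list)
    fs_and_volume_manager: dict = field(default_factory=dict)
    hba_tunables: dict = field(default_factory=dict)
    storage: dict = field(default_factory=dict)          # LUNs, RAID type, cache

    def missing_sections(self):
        """Names of the sections that have not been filled in yet."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialize the record so the same file can go to every vendor."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def detect_host():
    """Collect the few facts the standard library can report by itself."""
    return {
        "os": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
    }
```

Calling `missing_sections()` before the package goes out shows which parts of the characterization are still undocumented. Most of the fields, such as HBA tunables and RAID cache settings, still have to be filled in by hand from the vendor tools for your platform.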
When using your own workload in a benchmark, there are several additional areas that need to be clearly understood and documented for the storage vendors, including:
- Server memory size and tunable settings – Many file systems use memory for the file system cache or the cache for the database based on system tunables or auto-configuration. If a vendor does not have the same amount of memory and use exactly the settings that you are using or the other vendors are using, that vendor's results could be skewed
- File system and volume manager settings – These settings will have a significant impact on the performance of your system, and because different settings could have a significant effect (positive or negative) on performance, they should be set the same for all the vendors
Emulating Your Workload
If you have a staff that can program in C, then writing the code to emulate your workload will not be that difficult. I believe that if you have done a good job with the emulation, then you’ll have a great deal more control of the benchmark in terms of scaling, and you’ll have a far better understanding of what your workload does to the actual hardware and software.
It also allows you to test the storage vendors’ hardware without the file system, as you can write/read directly to the raw devices. This allows a better understanding of the hardware that might otherwise be masked by the file system's effect on I/O performance.
The steps for developing an emulation are relatively simple:
1. Use the system tools to get a system call listing of the application(s) doing the I/O. These tools are available from most OS vendors. For example, on Solaris it’s called truss, and on Linux it’s strace
2. After collecting this data, you’ll need to develop a statistical analysis of:
   - Read and write ratio
   - Read and write sizes
   - File sizes
   - Seeks and seek distance
   - The amount of concurrent I/O
   - Number of open files
   - System call type (asynchronous or synchronous I/O)
3. Develop a program that reads and writes according to the information developed in #2, writing and reading to/from the raw devices
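The statistical analysis step can be sketched in a few lines. The example below is in Python rather than C only to keep it short, and it rests on assumptions: the input is plain strace-style text with one completed call per line and no PID or timestamp prefixes, and the function name `summarize_trace` is hypothetical. It covers only the read/write ratio, the request sizes, and counts of seeks and opens.

```python
import re
from collections import Counter

# Matches a completed call such as:  read(3, "..."..., 4096) = 4096
CALL_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)")

READS = {"read", "pread64", "readv"}
WRITES = {"write", "pwrite64", "writev"}
SEEKS = {"lseek"}
OPENS = {"open", "openat"}


def summarize_trace(lines):
    """Reduce strace-style lines to basic I/O statistics."""
    read_sizes, write_sizes = Counter(), Counter()
    seeks = opens = 0
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match is None:
            continue  # signals, unfinished calls, exit messages
        name, ret = match["name"], int(match["ret"])
        if ret < 0:
            continue  # the call failed, so no I/O took place
        if name in READS:
            if ret > 0:  # a return of 0 is end of file, not a transfer
                read_sizes[ret] += 1
        elif name in WRITES:
            if ret > 0:
                write_sizes[ret] += 1
        elif name in SEEKS:
            seeks += 1
        elif name in OPENS:
            opens += 1
    reads, writes = sum(read_sizes.values()), sum(write_sizes.values())
    total = reads + writes
    return {
        "reads": reads,
        "writes": writes,
        "read_fraction": reads / total if total else 0.0,
        "read_bytes": sum(size * n for size, n in read_sizes.items()),
        "write_bytes": sum(size * n for size, n in write_sizes.items()),
        "read_sizes": dict(read_sizes),    # histogram: size -> call count
        "write_sizes": dict(write_sizes),
        "seeks": seeks,
        "opens": opens,                    # successful opens, not concurrent files
    }
```

Seek distance, concurrency, and the split between synchronous and asynchronous calls are not covered here. They need the offsets, timestamps, and per-process information that a fuller trace provides.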
This seems fairly difficult, and it can be, but once you have completed the process, you’ll be able to easily scale your workload up and down. Another advantage is that the results the vendor returns will be a true benchmark of the actual storage hardware, not of the file system and volume manager tunables.
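To show what scaling up and down can look like, here is a minimal emulator sketch, again in Python for brevity. The profile keys (`read_fraction`, `seek_fraction`, `sizes`) and the function names are assumptions of mine, and the workload scales simply by changing the number of workers or the operations per worker. It uses `os.pread` and `os.pwrite`, so it is limited to Unix-like systems. It makes no attempt to bypass operating system caching, which a real hardware measurement would have to address.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor


def run_worker(path, profile, ops, span_bytes, seed):
    """Issue `ops` reads/writes against `path` following the profile."""
    rng = random.Random(seed)  # seeded so every vendor sees the same pattern
    sizes, weights = zip(*profile["sizes"].items())
    stats = {"reads": 0, "writes": 0, "bytes": 0}
    offset = 0
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(ops):
            size = rng.choices(sizes, weights)[0]
            if rng.random() < profile["seek_fraction"]:
                # Random seek, aligned to the request size
                offset = rng.randrange(0, span_bytes - size + 1, size)
            elif offset + size > span_bytes:
                offset = 0  # sequential stream wraps around
            if rng.random() < profile["read_fraction"]:
                done = len(os.pread(fd, size, offset))
                stats["reads"] += 1
            else:
                # WARNING: overwrites whatever is stored at the target
                done = os.pwrite(fd, b"\0" * size, offset)
                stats["writes"] += 1
            stats["bytes"] += done
            offset += done
    finally:
        os.close(fd)
    return stats


def run_scaled(path, profile, ops_per_worker, span_bytes, workers=1):
    """Run `workers` concurrent streams and return the combined totals."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_worker, path, profile, ops_per_worker, span_bytes, seed)
            for seed in range(workers)
        ]
        results = [future.result() for future in futures]
    total = {key: sum(r[key] for r in results) for key in results[0]}
    total["seconds"] = time.perf_counter() - start
    return total
```

A profile such as `{"read_fraction": 0.7, "seek_fraction": 0.3, "sizes": {4096: 3, 65536: 1}}` (made-up numbers) would come from the statistics gathered in step 2. `span_bytes` must be at least as large as the largest request size and no larger than the target.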
What About Software?
Even if you’re going to benchmark a file system or shared file system, much of what was recommended for the analysis of the hardware should be done for the software. One obvious difference is that you cannot write to the raw device if you are testing a file system.
Benchmarking a file system is likely to be the most difficult benchmarking task because there are so many variables, and doing it correctly is very time consuming, both for you and for the vendors.
Here are some items that must be characterized as part of the process for benchmarking file systems, building upon the characterizations already done for the hardware:
- File system size – current and future
- Total number of directories – current and future
- Total number of files – current and future
- Largest number of files per directory – current and future
For shared file systems, add:
- Amount of I/O from each client and the master machine
- Amount of metadata I/O from each client and the master machine
- Number of clients
- Types of clients
Along with this you have the hardware topology, including HBAs, switches, TCP/IP network for metadata, and possibly tapes, as most shared file systems have an HSM (Hierarchical Storage Management) system built into them.
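The current values for the single-system items in the first list above (file system size, directory count, file count, and largest directory) can be gathered with a simple tree walk. This is a sketch under assumptions: the function name `census` is hypothetical, and sizes are the apparent file sizes reported by `lstat`, not the blocks actually allocated.

```python
import os


def census(root):
    """Count directories and files under root and find the fullest directory."""
    dirs = files = total_bytes = 0
    largest_dir, largest_count = root, 0
    # os.walk does not follow symbolic links to directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += 1  # includes root itself
        files += len(filenames)
        if len(filenames) > largest_count:
            largest_dir, largest_count = dirpath, len(filenames)
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # the file vanished or is unreadable; skip it
    return {
        "directories": dirs,
        "files": files,
        "total_bytes": total_bytes,
        "largest_dir": largest_dir,
        "largest_dir_files": largest_count,
    }
```

Running this periodically and keeping the results gives a growth trend, which is one input to the "future" half of each item. The projection itself still has to come from your own planning.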
Developing the scripts, codes, and methodology to do this type of benchmarking is hard work. And while it is hard on your end, for the storage vendor it will be virtually impossible, as most have limited relationships with shared file system vendors, limited server resources, and limited staff who know shared file systems.
Often what this type of benchmark becomes is really a benchmarking of the benchmarker, not a benchmarking of the software and hardware. The vendor who wins is often not the one with the best hardware and software, but the one with the best benchmarkers. Therefore, it’s important to give the storage vendors as much information and guidance as possible.
The most important part of a file system benchmark, and one that is often forgotten, is creating fragmentation as part of the benchmark. Most file system benchmarks create a new file system and run the benchmark tests with tools such as iozone and bonnie. Most of the time this is not really valid, given that on a real file system users' files are created and removed many times, and multiple files are often written at the same time.
Some of the areas to look at are:
- How many applications are doing reads and writes at the same time?
- How many files will be created and deleted within the file system?
- How full will the file system be over time?
Each of these issues will have an impact on the benchmark that you create, regardless of which tool you use.
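One way to approximate this history before the measured runs is to age the file system: write several files at once, delete a random share of them, and repeat. The sketch below is an illustration only. The function name `age_directory` and all of its parameters are hypothetical, and whether the resulting layout is actually fragmented depends on the allocator of the file system under test.

```python
import os
import random


def age_directory(root, target_bytes, rounds, delete_fraction=0.5,
                  sizes=(4096, 65536, 1048576), writers=4, chunk=4096, seed=0):
    """Repeatedly fill root to about target_bytes, then delete random files."""
    rng = random.Random(seed)
    live = {}  # path -> size of every file that currently exists
    counter = created = deleted = 0
    for _ in range(rounds):
        while sum(live.values()) < target_bytes:
            # Open several files and write them in small interleaved chunks,
            # mimicking multiple applications writing at the same time.
            batch = []
            for _ in range(writers):
                path = os.path.join(root, "age_%08d.dat" % counter)
                counter += 1
                batch.append([open(path, "wb", buffering=0), path,
                              rng.choice(sizes)])
            remaining = [entry[2] for entry in batch]
            while any(remaining):
                for i, (handle, _, _) in enumerate(batch):
                    if remaining[i]:
                        n = min(chunk, remaining[i])
                        handle.write(b"\xa5" * n)
                        remaining[i] -= n
            for handle, path, size in batch:
                handle.close()
                live[path] = size
                created += 1
        # Remove a random subset so later writes land in the freed gaps
        victims = rng.sample(sorted(live), int(len(live) * delete_fraction))
        for path in victims:
            os.remove(path)
            del live[path]
            deleted += 1
    return {"created": created, "deleted": deleted,
            "live_files": len(live), "live_bytes": sum(live.values())}
```

The three questions above map onto the parameters: `writers` for simultaneous writers, `rounds` and `delete_fraction` for create and delete churn, and `target_bytes` for how full the file system gets. The values you choose should come from your own workload characterization.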
The key words in developing a benchmark are “mimics your operational environment.” Determining the characteristics of your environment is the first step. Workload characterization, though seemingly a difficult process, is not really that difficult once you separate it into its parts: application I/O, file system configuration, system and file system tunables, and hardware configuration requirements. It’s also important to keep in mind how the new system will be used as compared to how the old system was used.
Next time we will review the process of packaging, rules, analysis, and scoring.
See All Articles by Columnist Henry Newman