Choosing a File System or Volume Manager
Although file systems have changed only in a methodical, evolutionary way over the last 35 years, while server systems and software have changed far more radically, you still have a number of choices to examine before you can pick the best file system for your application environment. Here are ten areas to consider before picking and implementing a file system.
- File system size requirements
- The underlying RAID/disk topology
- The number of files
- The distribution of file sizes
- The bandwidth and/or IOPS requirements
- The application's requirements
- Shared file system with homogeneous access
- Shared file system with heterogeneous access
- Recovery requirements
- Plans for backup/HSM
By examining each of these points, you will be able to narrow the file system choices for each of the server systems under consideration. In some cases, the answer to a single question for a single system leaves you with only one choice.
File System Size Requirements
Today many file systems and volume managers have an internal limit of 2 terabytes (TB) per file system. A number of file systems and volume managers are removing this limit, but it still exists. In many cases a vendor first raises the limit above 2TB, but file system and volume manager performance then suffers, because they often still use the allocation and space-representation techniques designed for the sub-2TB version. The problem is that these techniques do not always scale.
The 2TB limit is also imposed by the SCSI command set: the 10-byte read and write commands carry a 32-bit logical block address, so with 512-byte blocks the current addressing limit for a single LUN is 2TB. This will likely change over the next few years.
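The 2TB figure falls straight out of the address arithmetic. The short sketch below computes it, assuming the traditional 512-byte logical block; the function name `max_lun_bytes` is purely illustrative.

```python
# Why a 32-bit logical block address (as in SCSI READ(10)/WRITE(10)) caps a LUN at 2TB.
LBA_BITS = 32      # width of the LBA field in the 10-byte commands
BLOCK_SIZE = 512   # bytes per logical block (assumed traditional sector size)

def max_lun_bytes(lba_bits=LBA_BITS, block_size=BLOCK_SIZE):
    """Largest capacity addressable with the given LBA width and block size."""
    return (2 ** lba_bits) * block_size

limit = max_lun_bytes()
print(limit)            # 2199023255552
print(limit / 2 ** 40)  # 2.0 (binary terabytes)
```

The later 16-byte commands use a 64-bit LBA, so `max_lun_bytes(64)` shows how far that pushes the ceiling once operating systems, HBAs and arrays all support them.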
The Underlying RAID/Disk Topology
Knowing the physical layout of the storage is very important in choosing a file system. Some file systems perform well with large caches, depending on the application load. Some file systems allow file system metadata (inodes and superblocks) and logs to be placed on devices separate from the data, and this separation can significantly improve performance. Knowing what types of devices you have, and how to use them given the file system's features and your requirements, is part of the planning process. You could have the best file system available for your requirements, but you still need the underlying hardware to take advantage of its features.
The Number of Files
In some cases, the number of files, and the number of files per directory, becomes the overriding factor in choosing a file system. I have some clients that want to have 100,000 files per directory (I do not get good answers as to why, but they want it anyway). Many file systems have extreme performance difficulties with far fewer files per directory (even as few as 10,000). Taking it further, I have another client that wants to have 100,000,000 files in a file system. File system features such as the separation of data and metadata become critical with these types of requirements. Personally, right now, I don't believe that any file system can work efficiently with 100,000,000 files, but who knows; I don't think anyone has tested to such limits.
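Before comparing file systems, it helps to measure what you actually have. The sketch below walks an existing tree and reports the total file count, the most crowded directory, and every directory at or above a threshold. The function names are hypothetical, and the 10,000 default simply echoes the figure above; it is not a limit of any particular file system.

```python
import os

def files_per_directory(root):
    """Map every directory under root to the number of non-directory entries in it."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts

def summarize(root, threshold=10_000):
    """Return (total files, (busiest directory, its count), directories at or over threshold)."""
    counts = files_per_directory(root)
    total = sum(counts.values())
    busiest = max(counts.items(), key=lambda item: item[1], default=(root, 0))
    crowded = sorted(path for path, n in counts.items() if n >= threshold)
    return total, busiest, crowded
```

Running something like `summarize("/data")` (a placeholder path) against a copy of production data gives concrete numbers to hold up against each candidate file system's directory and file-count behavior.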
The Distribution of File Sizes
File size distribution is important because of the underlying file system allocation algorithms. Many file systems cannot allocate large amounts of contiguous space. In some file systems the internal allocation unit cannot be larger than 8KB, so if your environment has many large files this could be a problem; if the environment has mostly small files, such a file system will work just fine. For example, in some databases the files are 2GB each, and being able to allocate in 2GB chunks reduces the overhead within the file system.
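The 8KB-versus-2GB comparison is easy to quantify. The sketch below counts the allocation units a file needs and buckets a list of file sizes by power of two to characterize the distribution. Both helpers are illustrative; unit counts are only a rough proxy for allocation overhead, since extent-based file systems can describe many contiguous units with a single entry.

```python
KIB = 1024
GIB = 1024 ** 3

def allocation_units(file_size, unit_size):
    """Allocation units needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // unit_size)

def size_histogram(sizes):
    """Count files per power-of-two bucket: bucket k holds sizes in [2**k, 2**(k+1)).

    Zero-byte files are counted in bucket 0 along with one-byte files.
    """
    hist = {}
    for size in sizes:
        k = max(size, 1).bit_length() - 1
        hist[k] = hist.get(k, 0) + 1
    return dict(sorted(hist.items()))

# A 2GB database file: 8KB units versus a single 2GB allocation.
print(allocation_units(2 * GIB, 8 * KIB))  # 262144
print(allocation_units(2 * GIB, 2 * GIB))  # 1
```

Feeding `size_histogram` the sizes from a directory walk shows whether your data is dominated by small files, large files, or both, which is the question that decides whether a small maximum allocation unit matters.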
The Bandwidth and/or IOPS Requirements
Some applications, such as video, require high-bandwidth I/O, while others, such as OLTP, require high IOPS. Depending on which of these you need, different file systems have advantages and disadvantages. Some file systems support automatic direct I/O (I/O that moves from user space to the device without going through any system cache). This allows high-performance I/O because the data is not copied twice in memory (once from the user buffer to the system cache and then to the device), which dramatically reduces the CPU required to do the I/O. Other file systems provide tunables such as read and write cache sizes for databases and readahead values.
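The automatic direct I/O described above is decided by the file system. An application on Linux can also request it explicitly with the `O_DIRECT` open flag, and the sketch below shows that route. It assumes a 4096-byte alignment requirement for buffer, offset and length (the real requirement depends on the device and file system) and falls back to ordinary buffered I/O where `O_DIRECT` is missing or rejected. The helper names are hypothetical.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement depends on device and file system

def _write_all(fd, buf):
    """Loop until every byte of buf has been written to fd."""
    view = memoryview(buf)
    while len(view):
        written = os.write(fd, view)
        view = view[written:]

def write_maybe_direct(path, data):
    """Write data to path, bypassing the page cache with O_DIRECT when possible.

    Returns True if the direct path was used, False if it fell back to buffered I/O.
    """
    # O_DIRECT needs an aligned buffer and length: pad the length up to a
    # multiple of ALIGN and use an anonymous mmap, which is page-aligned.
    padded = max(ALIGN, -(-len(data) // ALIGN) * ALIGN)
    buf = mmap.mmap(-1, padded)
    buf.write(data)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    o_direct = getattr(os, "O_DIRECT", 0)  # Linux-specific; absent on e.g. macOS
    if o_direct:
        try:
            fd = os.open(path, flags | o_direct, 0o644)
            try:
                _write_all(fd, buf)
                os.ftruncate(fd, len(data))  # trim the padding back off
                return True
            finally:
                os.close(fd)
        except OSError:
            pass  # file system or device rejected O_DIRECT; fall back below
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, memoryview(buf)[: len(data)])
    finally:
        os.close(fd)
    return False
```

The padding and the page-aligned buffer are the price of skipping the cache copy, which is one reason file systems that switch to direct I/O automatically for large, well-aligned requests are attractive.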
The Requirements of the Application
Internal file system features, such as the ability to preallocate space, are important both for databases and for real-time streaming applications such as video. Preallocation allows an application to ask the file system in advance for contiguous space.
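On POSIX systems the portable way to ask for space up front is `posix_fallocate`, exposed in Python as `os.posix_fallocate`. The minimal sketch below uses it and falls back to `ftruncate` (which only sets the length and may leave a sparse file) where it is unavailable or rejected. Note that the call reserves space; whether that space is contiguous is still up to the file system. The `preallocate` helper is hypothetical.

```python
import os

def preallocate(path, nbytes):
    """Ask the file system to reserve nbytes for path before any data is written.

    Returns True if real allocation was requested via posix_fallocate, False if
    only the file length could be set (which may leave a sparse file).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(fd, 0, nbytes)
                return True
            except OSError:
                pass  # e.g. not supported here; fall through to ftruncate
        os.ftruncate(fd, nbytes)
        return False
    finally:
        os.close(fd)
```

A streaming writer would call this once with the expected recording size and then write into the reserved range, instead of growing the file one small allocation at a time.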
Another emerging requirement for database and streaming applications is multithreaded writes. Some file systems allow multiple writes to be outstanding to the same open file descriptor, issued either from a threaded application or with POSIX asynchronous I/O. Oracle is a good example of an application that wants multiple outstanding writes. Many file systems, however, prevent one or more applications from having multiple outstanding writes to the same file. If an application such as Oracle knows what it is doing, this is not a problem, but some file systems prevent it, mainly because the activity is dangerous. With some file systems and/or volume managers you need to buy a special version that allows this functionality.
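The threaded variant can be sketched with positional writes: each thread uses `os.pwrite` on one shared descriptor with its own non-overlapping offset, so no thread depends on a shared file position. This is a Unix-only illustration with a hypothetical helper name. Whether the writes actually proceed in parallel inside the file system, or are serialized by a per-file lock, is exactly the file system behavior described above.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_write(path, chunks, chunk_size, workers=4):
    """Write equal-sized chunks to one shared file descriptor from several threads.

    Chunk i lands at offset i * chunk_size. os.pwrite takes an explicit offset,
    so the threads never race on a shared file position, and because the ranges
    do not overlap the writes may be outstanding at the same time.
    """
    if any(len(chunk) != chunk_size for chunk in chunks):
        raise ValueError("every chunk must be exactly chunk_size bytes")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def write_chunk(i):
            # Assumes pwrite writes the whole chunk, the normal case for regular files.
            return os.pwrite(fd, chunks[i], i * chunk_size)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            written = list(pool.map(write_chunk, range(len(chunks))))
    finally:
        os.close(fd)
    return sum(written)
```

POSIX asynchronous I/O (`aio_write`) reaches the same goal from a single thread; the threaded form is shown here only because it is easy to run.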
Shared File System with Homogeneous Access
A number of file systems, generally from server vendors, support shared data access. In these file systems one system is designated as the master of the file system metadata, and the other systems are clients of the master. In general, file system metadata moves over a TCP/IP network and file system data moves over Fibre Channel. As a result, small-block writes usually take significantly longer on the clients than on the server. A number of new features and tunables are available in this area, but they are best used by an expert.
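A back-of-the-envelope model shows why small writes suffer most on the clients. The sketch assumes each write needs one metadata step (for example, an allocation), done locally on the master but as a TCP/IP round trip from a client, while the data itself moves over Fibre Channel at the same rate for both. All three parameters are hypothetical, chosen only to illustrate the shape of the effect, and are not measurements of any product.

```python
# Hypothetical parameters, chosen only to illustrate the shape of the effect.
SERVER_META_US = 50     # metadata step handled locally on the metadata master
CLIENT_META_US = 500    # same step as a request/response over TCP/IP
DATA_MB_PER_S = 100     # data path over Fibre Channel, same for master and client

def write_time_us(size_bytes, meta_us, data_mb_per_s=DATA_MB_PER_S):
    """Modelled time for one write: one metadata step plus the data transfer.

    With decimal units, 1 MB/s is exactly 1 byte per microsecond, so
    size_bytes / data_mb_per_s is the transfer time in microseconds.
    """
    return meta_us + size_bytes / data_mb_per_s

def client_slowdown(size_bytes):
    """How many times longer a client write takes than the same write on the master."""
    return write_time_us(size_bytes, CLIENT_META_US) / write_time_us(size_bytes, SERVER_META_US)

for size in (8_192, 1_048_576, 64_000_000):
    print(size, round(client_slowdown(size), 2))
```

Under these assumptions the fixed metadata cost dominates an 8KB write but is noise for a 64MB write, which matches the observation that small-block writes are the ones that slow down on clients.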
Shared File System with Heterogeneous Access
Shared heterogeneous access is just plain hard work today. Few companies support this type of product because of its complexity, but it is the holy grail that everyone wants and says they needed yesterday. A number of issues exist, such as data endianness. What if one application writes data from a big-endian machine and another application on a little-endian machine tries to read it? NFS does not solve this either: its XDR encoding covers only the protocol fields, not the file contents, so the application must know which byte order the data was written in and swap the bytes itself. Some vendors write applications this way, while others, especially home-grown applications, are generally not written to accommodate this.
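The endian problem fits in a few lines. Below, the same 32-bit value is encoded in both byte orders with Python's `struct` module, and decoding with the wrong order silently yields a different number. The `read_u32` helper is hypothetical; a real application would record the writer's byte order somewhere, such as a file header or the format specification.

```python
import struct

value = 0x0A0B0C0D

big = struct.pack(">I", value)      # bytes as written by a big-endian machine
little = struct.pack("<I", value)   # bytes as written by a little-endian machine
print(big.hex(), little.hex())      # 0a0b0c0d 0d0c0b0a

def read_u32(raw, written_big_endian):
    """Decode a 32-bit unsigned integer using the byte order it was written in."""
    return struct.unpack(">I" if written_big_endian else "<I", raw)[0]

# A reader that assumes the wrong byte order gets garbage without any error.
print(hex(read_u32(big, written_big_endian=False)))  # 0xd0c0b0a
```

Nothing in the shared file system flags the mistake, which is why the application itself has to carry and honor the byte-order information.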
Recovery Requirements
Understanding your recovery-time requirements is very important when considering a file system. Fast recovery is often achieved with a file system that supports logging (journaling): after a crash, only the log needs to be checked and replayed, rather than the entire file system. Logging is one method of recovering quickly after a crash and has become, in some cases, a requirement.
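Why a log makes recovery fast can be shown with a toy redo log. Metadata updates are appended together with a commit record per transaction, and recovery replays only committed transactions, so it reads the log instead of scanning every inode. The `Journal` class and `replay` function are illustrative only; real file systems differ in what they journal (metadata only or data too), how they checkpoint, and how they order records on disk.

```python
from dataclasses import dataclass, field

@dataclass
class Journal:
    """Toy redo log for metadata updates (illustrative, not any real file system)."""
    records: list = field(default_factory=list)

    def log(self, txid, key, value):
        self.records.append(("update", txid, key, value))

    def commit(self, txid):
        self.records.append(("commit", txid))

def replay(journal, metadata):
    """Apply, in log order, only the updates whose transaction reached a commit record.

    Updates from a transaction that never committed (a crash mid-transaction)
    are discarded, leaving the metadata consistent.
    """
    committed = {rec[1] for rec in journal.records if rec[0] == "commit"}
    for rec in journal.records:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, value = rec
            metadata[key] = value
    return metadata

j = Journal()
j.log(1, "inode7.size", 4096)
j.commit(1)
j.log(2, "inode8.size", 8192)  # crash before transaction 2 commits
print(replay(j, {"inode7.size": 0}))  # {'inode7.size': 4096}
```

Recovery time here grows with the length of the log, not with the size of the file system, which is the property that makes logging attractive for large file systems.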
Equally important is recovery after a major disaster. I have worked at sites that suffered a power failure during a storm and then had the UPS hit by lightning, which resulted in the devices holding critical file system metadata getting 'fried.' With some work they were back up and running within 6 hours, but only because the file system they used supported metadata backup.
Plans for backup/HSM
Last but not least, you need to consider your plans for backup and/or HSM. Some file systems have internal features that support backing up just the metadata, and others have their own internal backup for both data and metadata. If you want a 30TB file system, think about how long it will take to back up. Even with 10 LTO drives each running at a sustained 18 MB/sec, and not counting load and position time, a complete backup would take about 46 hours. This is a good example of why I believe HSM will replace backup for large file systems, as very few sites have a 46-hour backup window. HSM keeps copies of the data on secondary media (tape, disk, or in some cases multiple copies on tape, disk and off-site). Knowing what your file system supports for both backup and HSM, given your operations, is critically important.
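The 46-hour figure can be checked directly. The sketch below uses decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and, like the estimate in the text, ignores tape load and position time; `backup_hours` is an illustrative helper name.

```python
def backup_hours(capacity_tb, drives, mb_per_s_per_drive):
    """Hours to stream capacity_tb through `drives` tape drives running in parallel.

    Decimal units; assumes every drive sustains its full rate with no load or
    position time.
    """
    total_bytes = capacity_tb * 10**12
    aggregate_bytes_per_s = drives * mb_per_s_per_drive * 10**6
    return total_bytes / aggregate_bytes_per_s / 3600

print(round(backup_hours(30, 10, 18), 1))  # 46.3
```

Because the time scales linearly with capacity and inversely with aggregate drive bandwidth, doubling the file system size doubles the window unless you also double the drives, which is the scaling problem that pushes large sites toward HSM.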
As you can see, choosing a file system is not easy, and I think it will become harder over the next few years. Working with clients, I am seeing a growing trend of sites moving from a server-centric data center to a storage- or data-centric data center, with the file system controlling the data. This is happening because storage performance has not kept up with server performance, storage density is growing faster than storage performance, and Fibre Channel allows storage to be farther away than the old SCSI systems allowed.
The file system is at the center of our storage universe. To make informed decisions, you need to know what is available, and what to use.
See All Articles by Columnist Henry Newman