DrivePool BETA build 3874 is out:
This build starts to feature some of the performance improvements that will be part of M4.
A bit of history
When M3 was released, I said that M4 would be treated differently as far as the development cycle is concerned. Previously, the code was branched at each major beta release (M2, M3), and two different versions of DrivePool were worked on at the same time: the public build, for bug fixes, and an internal build for the next milestone. M4 was not going to be treated the same way. What you'll see with M4 is most of the features being rolled into the M3 code and released piece by piece. Once everything is in, we'll call it M4.
The first of these builds arrived today (3874). There was a significant amount of work done in the kernel to lay the foundation for M4 and now it's ready to be tested.
Visually, what you'll see is a new DrivePool Settings window, accessible from the Dashboard tab. In it are options to enable the various performance-enhancing aspects of the new kernel code. All the options take effect immediately after clicking OK, except the first one: for the threading option, you'll need to restart the Server.
- Direct file I/O
This is basically a made-up term for the new layer of code written in the kernel to facilitate extremely low overhead I/O. When direct file I/O is used, the I/O request is managed entirely in the OS kernel. In the case of reading and writing, DrivePool communicates with the NTFS file system directly to complete the requested operation as quickly as possible.
Direct I/O will eventually cover various operations, but today DrivePool supports it for reading and writing files.
- Direct file I/O - Writing
When writing duplicated files, DrivePool writes directly to the 2 NTFS files, asynchronously. In other words, for every write it issues 2 write requests to NTFS, in parallel. When both requests complete, the original request is completed.
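To make the write path a bit more concrete, here is a minimal user-mode sketch of the same idea in Python. It is not DrivePool's kernel code: the names `write_copy` and `duplicated_write` are made up for this example, and a thread pool stands in for the asynchronous requests sent to NTFS. The point is only that the caller's write completes once every copy has been written.

```python
from concurrent.futures import ThreadPoolExecutor


def write_copy(path, offset, data):
    """Write data at offset into one copy of the file (hypothetical helper)."""
    # "r+b" updates the file in place without truncating it,
    # so the file must already exist.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return len(data)


def duplicated_write(paths, offset, data):
    """Issue one write per copy in parallel and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(write_copy, p, offset, data) for p in paths]
        # result() blocks until that write is done and re-raises its error,
        # so we only return after every copy has been written.
        return [f.result() for f in futures]
```

Calling `duplicated_write([copy_on_disk1, copy_on_disk2], 0, b"hello")` returns only when both copies hold the new bytes, which mirrors "when both requests complete, the original request is completed".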
- Direct file I/O - Reading
Direct I/O reading can work in 2 ways, striped and non-striped.
With striped reads, DrivePool intelligently balances the reads for a duplicated file across the 2 disks that hold it, in order to maximize performance. Theoretically, this can double your read speed, but the true impact still needs to be benchmarked.
Non-striped reading will force DrivePool to read from the first disk that it finds. This is the way it has worked in the past.
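The two read modes can be sketched in Python as follows. This is an illustration under assumptions, not the driver's actual balancing logic: the post only says the reads are balanced "intelligently", so the simple round-robin chunking, the `CHUNK` size, and the function names below are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # hypothetical stripe size, chosen only for the example


def read_range(path, offset, length):
    """Read length bytes at offset from one copy of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def striped_read(copies, offset, length, chunk=CHUNK):
    """Split the request into chunks and alternate them across the copies."""
    jobs = []
    pos, end, i = offset, offset + length, 0
    while pos < end:
        size = min(chunk, end - pos)
        # Round-robin: chunk 0 from the first copy, chunk 1 from the second, ...
        jobs.append((copies[i % len(copies)], pos, size))
        pos += size
        i += 1
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        # map() yields results in submission order, so the chunks
        # can simply be concatenated.
        return b"".join(pool.map(lambda job: read_range(*job), jobs))


def non_striped_read(copies, offset, length):
    """Read everything from the first copy, as in the non-striped mode."""
    return read_range(copies[0], offset, length)
```

Both functions return the same bytes for the same request, since the copies are identical. The only difference is how many disks are kept busy while serving it.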
- Fast I/O
Fast I/O is a technique that kernel drivers use to speed up I/O even beyond what Direct I/O can do. With Fast I/O, the I/O request takes a super fast code path, so that it can be fulfilled immediately. Typically this means reading from the cache, or writing to the cache for later delivery to the disk.
DrivePool 3874 supports both reading and writing with Fast I/O. Fast I/O writing can be turned off individually.
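As a rough mental model of the fast path, here is a toy Python sketch. It is only an analogy: the class `CachedFile`, the `PAGE` size, and the in-memory `disk` buffer are invented for this example and say nothing about how the driver or the Windows cache is actually implemented. It shows a read served straight from the cache when possible, and a write that lands in the cache and reaches the "disk" later.

```python
PAGE = 4096  # hypothetical cache page size for the example


class CachedFile:
    def __init__(self, disk):
        self.disk = disk      # bytearray standing in for the on-disk file
        self.pages = {}       # page number -> cached bytearray
        self.dirty = set()    # cached pages not yet written back

    def fast_read(self, page_no):
        """Fast path: return the cached page, or None if it is not cached."""
        return self.pages.get(page_no)

    def slow_read(self, page_no):
        """Slow path: fetch the page from disk and cache it."""
        start = page_no * PAGE
        page = bytearray(self.disk[start:start + PAGE])
        self.pages[page_no] = page
        return page

    def read(self, page_no):
        page = self.fast_read(page_no)
        if page is None:              # fast path declined, fall back
            page = self.slow_read(page_no)
        return bytes(page)

    def fast_write(self, page_no, data):
        """Fast path: update the cached page only; False if not cached."""
        if page_no not in self.pages or len(data) > PAGE:
            return False
        self.pages[page_no][:len(data)] = data
        self.dirty.add(page_no)
        return True

    def flush(self):
        """Later delivery to disk: write back every dirty page."""
        for page_no in sorted(self.dirty):
            start = page_no * PAGE
            page = self.pages[page_no]
            self.disk[start:start + len(page)] = page
        self.dirty.clear()
```

Note that after a successful `fast_write`, the data lives only in the cache until `flush` runs, which is the "writing to the cache for later delivery to the disk" part.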
More to come
The directory listing optimization didn't make it into this build. It turns out it's more complex than originally thought, but it's doable and you'll see it soon enough. The directory listing optimization should, in theory, let you list many thousands of files very quickly. It will use Direct I/O to query all the NTFS volumes in parallel and combine the results for the caller.
Theoretically, it should be faster than a standard NTFS directory listing since we will be querying all the disks in parallel.
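Since this feature isn't in the build yet, the following is only a user-mode sketch of the idea in Python, not the planned kernel implementation. The function names are made up, and merging by exact file name is an assumption for the example. It lists the same folder on every pooled disk in parallel and combines the results, so a duplicated file that lives on 2 disks shows up once.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_one(path):
    """List one disk's copy of the folder; a missing folder counts as empty."""
    try:
        with os.scandir(path) as entries:
            return [entry.name for entry in entries]
    except FileNotFoundError:
        return []


def pooled_listing(disk_dirs):
    """Query every disk in parallel and merge the names into one listing."""
    with ThreadPoolExecutor(max_workers=max(1, len(disk_dirs))) as pool:
        per_disk = list(pool.map(list_one, disk_dirs))
    names = set()
    for listing in per_disk:
        # Duplicated files appear on more than one disk; the set keeps one.
        names.update(listing)
    return sorted(names)
```

The total time is then roughly that of the slowest disk rather than the sum of all disks, which is where the expected speedup would come from.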
And more can be done in other areas in the future.
How to test
3874 comes with the new file system driver, so you'll have to reboot after the upgrade. However, all of the optimizations discussed above are off by default. This was done so as not to cause pain for people who are not expecting the potential instability introduced by the all-new kernel code.
You can turn each option on or off in real time, even during a transfer, to see the effect. I expect that Direct I/O will make the most difference, while read striping is mostly a trial run to see how it goes. Fast I/O, while it sounds cool, might not be as important in today's world of quad cores and i7s.
One more thing... Logging is enabled by default in the About box. If anyone wants to conduct a benchmark, make sure to uncheck it, as it writes extensive detail on every single I/O request (way beyond what you can see in the text log files). We're talking about 50 MB / min worth of rotating binary logs.