DrivePool File Distribution Algorithm
Here's an interesting question.
What type of file distribution algorithm do you guys think DrivePool should use across the pooled drives?
Remember, this is not a RAID-like solution with stripes; it's actually writing real files to NTFS.
Right now, it's very dumb and just picks the drive with the most free space to put your file on (or more than one drive in the case of duplication). This is really easy to code and very fast, but is it really the best thing to do?
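For illustration, a minimal sketch of that "most free space" rule might look like this (Drive and pick_targets are made-up names for this sketch, not DrivePool's actual code):

```python
# Hypothetical sketch of a "most free space" placement policy -- not DrivePool's code.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    free_bytes: int

def pick_targets(drives, copies=1):
    """Return the drive(s) a new file would land on: the 'copies' drives
    with the most free space (copies=2 when folder duplication is on)."""
    ranked = sorted(drives, key=lambda d: d.free_bytes, reverse=True)
    return ranked[:copies]

pool = [Drive("D:", 900 * 10**9), Drive("E:", 400 * 10**9), Drive("F:", 1200 * 10**9)]
print([d.name for d in pick_targets(pool, copies=2)])  # ['F:', 'D:']
```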
I can think of a few alternatives, what do you think?
Comments
1. A Landing Area "Master Drive" with a service to move / create symbolic links
2. File System Driver / File System Filter Driver / MiniFilter Driver
2 has the benefit of ensuring (as long as the pool drives are not altered directly) that all operations go through your code, whereas with the service / symbolic link method (even if it uses some sort of FileSystemWatcher), if the service crashes, Windows will gladly keep sharing and allowing updates to the data behind the symbolic links. Using a driver also removes the need for a Master Drive (although a virtual disk may be needed to instantiate the file system driver) and will probably allow for higher performance on parallel writes.
As for the question above: keep track of which drives are currently being written to and move them to the back of the queue, so the pool tries to write to any idle drives first and only returns to a busy drive if no other drive can take the file (full, etc.).
This of course assumes that you can redirect writes as they are happening, which I have not found a way to do without a driver (although I admittedly didn't look for very long).
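A rough sketch of that busy-drive queue idea, assuming the pool can count how many writes are in flight on each drive (BusyAwarePicker and Drive are hypothetical names, not anything that exists in DrivePool):

```python
# Hypothetical sketch: prefer idle drives, fall back to busy ones only when needed.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    free_bytes: int
    active_writes: int = 0  # writes currently in flight on this drive

class BusyAwarePicker:
    def __init__(self, drives):
        self.drives = list(drives)

    def pick(self, file_size):
        # Only consider drives that can actually hold the file.
        candidates = [d for d in self.drives if d.free_bytes >= file_size]
        if not candidates:
            return None
        # Idle drives first; among equally busy drives, prefer the most free space.
        candidates.sort(key=lambda d: (d.active_writes, -d.free_bytes))
        return candidates[0]

pool = [Drive("D:", 800 * 10**9, active_writes=2), Drive("E:", 600 * 10**9)]
print(BusyAwarePicker(pool).pick(4 * 10**9).name)  # 'E:' -- the idle drive wins
```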
Unfortunately, a side effect of this algorithm is that it tends to "use up" one drive at a time until that drive is full, then move on to the next. If you are storing data other than multimedia, this is fine as long as folder duplication is enabled.
My ideal would be for files stored to a specific directory to be saved to the same drive as long as that drive's "minimum free space threshold" has not been reached, but for any new directory to be created on the drive with the most free space available. This would tend to equalize drive usage while retaining the advantage of keeping a directory's contents together on one drive.
As an alternative, you could use a file migrator to "reassemble" a directory's contents after the fact. What I wouldn't want to see is any master "landing area"; the disadvantages of that approach have already been shown to be undesirable.
-MWS-
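A rough sketch of that per-directory placement idea, assuming a simple in-memory map from directory to drive; DirectoryAffinityPlacer and Drive are made-up names, and the free-space bookkeeping after each write is left out:

```python
# Hypothetical sketch: keep a directory's files on one drive until it hits the
# minimum-free-space threshold; start new directories on the emptiest drive.
import os
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    free_bytes: int

class DirectoryAffinityPlacer:
    def __init__(self, drives, min_free_bytes=20 * 1024**3):
        self.drives = drives
        self.min_free_bytes = min_free_bytes
        self.dir_to_drive = {}  # directory path -> drive its files live on

    def place(self, file_path, file_size):
        directory = os.path.dirname(file_path)
        drive = self.dir_to_drive.get(directory)
        # Stay on the directory's drive while it remains above the threshold.
        if drive and drive.free_bytes - file_size >= self.min_free_bytes:
            return drive
        # New directory, or its drive is too full: pick the drive with the most free space.
        drive = max(self.drives, key=lambda d: d.free_bytes)
        self.dir_to_drive[directory] = drive
        return drive
```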
You have to be careful about how you define "most free space" when designing a fast, simple pooling algorithm. For instance, if you define it in absolute terms (xxx bytes free), consider the following drive pool: 2 TB, 1.5 TB, 500 GB, 320 GB.
If folder duplication is enabled for all folders, the first terabyte of data will be written to the 2 TB drive, with the files duplicated on the 1.5 TB drive. This leaves the 2 TB with 1 TB free and the 1.5 TB with 500 GB free.
The next 360 GB will be written to the 2 TB drive, with the dup files split between the 1.5 TB and the 500 GB drives. This leaves the 2 TB with 640 GB free and both the 1.5 TB and 500 GB drives with 320 GB free.
At this point, it takes another 480 GB before all of the drives have an equal amount of free space left (160 GB each), and all of that data will still have been written to the 2 TB drive (with folder dup enabled globally).
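A quick simulation, assuming 1 GB chunks and that each chunk's duplicate goes to the next-most-free drive, roughly reproduces those numbers (this is just an illustration of the scenario above, not DrivePool's code):

```python
# Simulate "write to the drive with the most free bytes, duplicate to the next" on
# the pool described above (2 TB, 1.5 TB, 500 GB, 320 GB), in 1 GB chunks.
def simulate(capacities_gb, chunks_gb):
    free = list(capacities_gb)
    written = [0] * len(free)
    for _ in range(chunks_gb):
        # Rank drive indexes by free space, most free first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        for i in order[:2]:  # primary copy and its duplicate
            free[i] -= 1
            written[i] += 1
    return free, written

free, written = simulate([2000, 1500, 500, 320], 1840)
print(free)     # about [160, 160, 160, 160] -- equal free space at last
print(written)  # about [1840, 1340, 340, 160] -- every file has a copy on the 2 TB drive
```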
See the problem with unequal drive sizes in the pool? With folder duplication, the vast majority of your data will be written only to the largest drive. Since the two main features of the drive pooling implementation are folder duplication and the ability to use any size drive, using "most free space in bytes" isn't all that desirable. You have to compute free space as some sort of relative ratio (a percentage of each drive's capacity) in order to distribute the data across all of the drives.
The only time "most free space in bytes" works well is if all of the pool drives are near equal in size, and even then you end up with most of the initial data being written to one or two drives.
-MWS-
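One way to read that "relative ratio" suggestion is to rank drives by the fraction of capacity that is still free rather than by absolute free bytes. A minimal sketch, with pick_by_ratio and Drive as illustrative names only:

```python
# Hypothetical sketch: choose target drives by percentage of capacity free.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    capacity_gb: int
    free_gb: int

def pick_by_ratio(drives, copies=2):
    """Return the 'copies' drives with the highest free/capacity ratio."""
    ranked = sorted(drives, key=lambda d: d.free_gb / d.capacity_gb, reverse=True)
    return ranked[:copies]

pool = [Drive("2TB", 2000, 1000), Drive("1.5TB", 1500, 500),
        Drive("500GB", 500, 400), Drive("320GB", 320, 300)]
print([d.name for d in pick_by_ratio(pool)])  # ['320GB', '500GB'] -- proportionally emptiest
```

Under this rule the drives fill at roughly the same percentage, so data ends up spread in proportion to each drive's size instead of landing almost entirely on the largest drive.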
First and foremost, if this is indeed to be a HOME Server solution, you must make it failure-proof, to the point that the thing almost runs itself. Of course, you can't second-guess Microsoft, but at the very least you should be able to control your own product.
I have friends who aren't even able to run WHS v1, much less WHS 2011 RC.
Maybe when storage is getting pretty full, a message pops up telling them to add another drive to their server. Of course, they'll call me, but if that message gives them explicit instructions on what to do next, they may be able to follow through on their own.
<Go out and get a Western Digital 1 TB drive, bring it home, unpack it, turn off and unplug your computer, open your computer case (most will balk here), blah, blah, etc. Or, ... get an external 1 TB drive, turn off your computer, plug in the drive, blah, blah, etc.>
The main thing is simplicity. Purely and simply simple! That's the only way WHS is going to fly, Orville.