I've seen a number of questions here on the forum and in my support tickets about how file duplication works in DrivePool.
I'd like to use this post to describe the inner workings of file duplication as it is implemented in the latest versions of DrivePool (1.3, 2.0). The implementation has evolved somewhat since the original.
For those of you who have been with DrivePool from the beginning, you may recall that the original technique for determining whether a folder was duplicated or not was the dot suffix. It was literally as simple as appending a .1 or a .2 to the end of a folder name. This was very simple and reliable, and it worked great.
The problem was that Windows Home Server 2011 enforces strict naming requirements for any shared folder, so appending extra digits to folder names was not very compatible with it.
Eventually DrivePool moved to using alternate streams in order to determine the duplication count for folders. You can think of alternate streams as hidden files that are attached to folders (or other files).
Today, we actually write the duplication count into a special stream that we call a "directory tag". You can see these special "tags" by issuing a dir /r command on a duplicated folder.
In particular, the stream that contains the duplication count is called DuplicationCount.Tag.CoveFs.
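For anyone who wants to peek at the stream from a script rather than with dir /r, here is a minimal Python sketch. It relies only on the standard NTFS convention of addressing a named stream as path:streamname. The helper names (stream_path, read_tag_bytes) are made up for illustration, the example drive letter is hypothetical, and I'm assuming the stream can be opened like an ordinary file on the folder you point it at. It returns raw bytes and makes no assumption about how the contents are encoded.

```python
# Stream name taken from the post; everything else here is illustrative.
TAG_STREAM = "DuplicationCount.Tag.CoveFs"


def stream_path(folder_path: str, stream_name: str = TAG_STREAM) -> str:
    """Build the NTFS 'path:stream' name for a named stream on a folder.

    folder_path should not end with a backslash.
    """
    return f"{folder_path}:{stream_name}"


def read_tag_bytes(folder_path: str) -> bytes:
    """Return the raw contents of the tag stream (Windows/NTFS only).

    Raises OSError if the folder has no such stream.
    """
    with open(stream_path(folder_path), "rb") as f:
        return f.read()
```

For example, stream_path(r"D:\ServerFolders\Videos") gives the name you would pass to any tool that understands alternate streams.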
So what does this stream contain?
Well, at first, all it had was a number, either 1 or 2. That simply said how many copies of files should be maintained under this directory (much as the original system did).
But as we moved forward, we needed a system that would allow for arbitrary duplication counts on any sub-folder. It would also have to work efficiently in memory, without querying the disk.
This is the system that is implemented today in DrivePool 1.X and 2.X.
The system works following these rules:
- Any folder can have an explicit duplication count, or specify that it is inheriting the duplication count of the parent folder. Explicit duplication counts are specified using a number >= 1, and inheritance is indicated using an I.
- In addition, each folder that specifies a duplication count can indicate whether its sub-folders specify additional duplication properties or simply inherit the current folder's duplication count automatically (whether that count is inherited from the parent or explicitly specified). We call this a Multiple flag, meaning that the sub-folders of this folder exhibit multiple duplication levels.
- The root folder of a pool implicitly inherits the duplication count of 1, from an imaginary parent folder.
That's it. DrivePool just uses these rules to determine the duplication level of every file on the pool.
The rules allow us to build an efficient caching system that determines the duplication count of every file on the pool without reading the duplication count from disk on every file open.
Now I know the rules are confusing the first time that you read them, so here's an example:
\ - MI (Multiple, Inherit)
(this folder inherits the duplication count from the parent (1) and specifies that its sub-folders have additional duplication counts)
\ServerFolders\ - MI (Multiple, Inherit)
\ServerFolders\Documents - 1 (one copy of every file under this folder)
\ServerFolders\Videos - 2 (two copies of every file under this folder)
\ServerFolders\... - (other folders would simply inherit the duplication count from the parent)
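
To make the rules concrete, here is a rough Python sketch of how a lookup could walk the example tree above. This is not DrivePool's actual code: the Folder class, its field names, and the duplication_count function are all hypothetical, and the in-memory tree simply stands in for whatever cache the real driver keeps. Only tagged folders are stored; anything missing from the tree is treated as plain inheritance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Folder:
    """Hypothetical in-memory tag for one folder."""
    count: Optional[int] = None      # explicit count (>= 1), or None for "I" (inherit)
    multiple: bool = False           # the "M" flag: sub-folders may carry their own tags
    children: Dict[str, "Folder"] = field(default_factory=dict)


def duplication_count(root: Folder, path: str) -> int:
    """Resolve the duplication count for a file or folder path on the pool."""
    effective = 1                    # the root inherits 1 from an imaginary parent
    if root.count is not None:
        effective = root.count
    folder = root
    for name in (p for p in path.split("\\") if p):
        if not folder.multiple:
            break                    # no Multiple flag: everything below uses this count
        child = folder.children.get(name)
        if child is None:
            break                    # untagged folder (or a file): inherit from parent
        folder = child
        if folder.count is not None:
            effective = folder.count
    return effective


# The example tree from the post.
pool = Folder(multiple=True, children={
    "ServerFolders": Folder(multiple=True, children={
        "Documents": Folder(count=1),
        "Videos": Folder(count=2),
    }),
})
```

With this tree, a file under \ServerFolders\Videos resolves to 2, a file under \ServerFolders\Documents resolves to 1, and anything in an untagged folder falls back to the inherited count of 1. Note how the walk stops as soon as it reaches a folder without the Multiple flag, which is what keeps the lookup cheap.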