> > I was wondering, is there a way to change / parameter to pass to
> > clusters DHT to change the distribution algorithm to only take into
> > account filename and not the preceding filesystem path?
> > i.e when a file is at: /mount/gluster/directory/filename.ext
> > To only hash on “filename.ext” ?
>
> Currently DHT hashes the file name and not the entire path.
>
> See, callers of dht_hash_compute in source (pretty much
> dht_layout_search) to which loc->name is passed, which is the file name
> and not the entire path.

While that is true, there are a couple of caveats.

First, the hash is based on the file name (last path component) but the
*distribution* for each directory (what we call a layout) is modified
based on the directory GFID. This prevents the same file name in
different directories always hashing to the same brick. (Personally I
would have done this by "mixing in" the parent GFID to the hash
calculation, but that alternative was ignored.)

Secondly, there is a way to modify the hashing. If you set the
"cluster.extra-hash-regex" option on a volume, that regular expression
will be used to "pick apart" the file name into a part that's used for
hashing and a part that's ignored. Consider the case of rsync, which for
a file XXX will create a temporary file .XXX.123456 and rely on the
semantics of rename(2) to move it into place only after it's fully
written. The "rsync-hash-regex" is already set up to remove the leading
"." and trailing ".123456" so that "XXX" is again the effective name for
hashing/distribution purposes. This allows the later rename to be done
on one brick every time, which improves performance significantly.

With "extra-hash-regex" you can do the same thing for a second app,
without affecting the rsync behavior.
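
To make the regex idea concrete, here is a minimal Python sketch of how a hash regex could pick the effective name out of a temporary file name. It is not GlusterFS code: effective_name, pick_brick and the ".part" pattern are names I made up for illustration, the rsync-style pattern is only my assumption of what such a regex could look like, and CRC32 plus a modulo stand in for the real DHT hash and per-directory layout.

```python
import re
import zlib

# Assumed pattern in the spirit of the rsync case: drop a leading "."
# and a trailing ".<suffix>" so ".XXX.123456" hashes like "XXX".
RSYNC_LIKE_REGEX = re.compile(r"^\.(.+)\.[^.]+$")

# Hypothetical pattern for a second application that writes "name.part"
# before renaming to "name".
EXTRA_REGEX = re.compile(r"^(.+)\.part$")


def effective_name(name, patterns):
    """Return the part of `name` that is used for hashing.

    The first pattern that matches wins and its first capture group is
    the effective name. If no pattern matches, the whole name is used.
    """
    for pattern in patterns:
        match = pattern.match(name)
        if match:
            return match.group(1)
    return name


def pick_brick(name, brick_count, patterns=(RSYNC_LIKE_REGEX, EXTRA_REGEX)):
    """Map a file name to a brick index.

    CRC32 and the modulo are stand-ins, not the real DHT hash or layout.
    """
    key = effective_name(name, patterns).encode("utf-8")
    return zlib.crc32(key) % brick_count


if __name__ == "__main__":
    for candidate in (".XXX.123456", "XXX", "video.mkv.part", "video.mkv"):
        print(candidate, "->", effective_name(candidate, (RSYNC_LIKE_REGEX, EXTRA_REGEX)),
              "brick", pick_brick(candidate, 4))
```

Because the temporary name and the final name reduce to the same effective name, both land on the same brick in this toy model, which is the property that lets the later rename stay on one brick.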