Dear experts,

we have started to test Gluster (3.3.0) on a small cluster that has grown over time (~200 cores and ~100 TB on heterogeneous hardware). Our data are typically organised in a tree of directories, each containing a set of files with the same filenames, e.g.

    dir1
    |- file1 (100 MB)
    |- file2 (1 MB)
    dir2
    |- file1 (100 MB)
    |- file2 (1 MB)
    dir3
    |- file1 (100 MB)
    |- file2 (1 MB)
    ...

To gain some experience with rebalancing, we set up a volume with a single brick, copied some data organised as described above, added a second brick and started the rebalance. The result was that all files with a given name ended up on the same brick. In our case this leads to a very inhomogeneous distribution of data volume, since the different types of files have very different sizes.

Looking at the implementation in the dht translator and checking the calculated hashes, it seems that only the basename is used to compute the hash of a given file. Since all directories carry the same mapping of hash ranges to bricks, this would explain our observation if only that file hash is used for placement. However, I also see hashes being calculated for directories, and it is not clear to me what they are used for.

Am I missing something here? Is this behaviour intended? Is there a (supported) way to still distribute the files homogeneously across all bricks, e.g. by using the full path for the hashing (which is actually what I understood from the manual), or by shuffling the hash ranges per directory?

There must be other people with many directories containing the same set of files. Any recommendations on how to handle this?

Many thanks,
Jochen
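
P.S. To make the suspected behaviour concrete, here is a toy sketch of my understanding (plain Python, not the actual dht code; crc32 just stands in for the real hash function): if every directory uses the same hash-range-to-brick layout and only the basename is hashed, then every "file1" lands on one fixed brick and every "file2" on another, no matter which directory they live in.

    # Toy illustration (not GlusterFS code): identical per-directory layout
    # plus a hash over the basename only means a given filename always maps
    # to the same brick, regardless of its directory.
    import zlib  # crc32 used here as a stand-in for the real DHT hash

    BRICKS = ["brick1", "brick2"]

    def pick_brick(path):
        basename = path.rsplit("/", 1)[-1]      # only the basename is hashed
        h = zlib.crc32(basename.encode()) & 0xffffffff
        # same layout in every directory: split the 32-bit range evenly
        return BRICKS[h * len(BRICKS) // (1 << 32)]

    for d in ("dir1", "dir2", "dir3"):
        for f in ("file1", "file2"):
            print(d + "/" + f, "->", pick_brick(d + "/" + f))
    # output: every file1 goes to one brick, every file2 to another,
    # in all directories -- which matches what we see after rebalancing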