Per-directory brick preference?

jdarcy at redhat.com (Jeff Darcy) · Mon, 20 Jun 2011 14:01:07 -0400

On 06/20/2011 01:26 PM, Philip Poten wrote:
> I operate a distributed replicated (1:2) setup that looks like this:
> 
> server1:bigdisk,server1:smalldisk,server2:bigdisk,server2:smalldisk
> 
> replica sets are bigdisk-bigdisk and smalldisk-smalldisk.

Are you sure? I'm not trying to be snarky here; if you specified the
bricks in that order to "volume create" then the combination of bricks
into replica sets might not be what you expect, because replica sets are
formed from consecutive bricks in the order given. Instead, you'd have
both disks on server1 combined into one replica set and both disks on
server2 combined into another, and that wouldn't protect your data
against server failure.
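
To illustrate the grouping, here is a minimal Python sketch, assuming
bricks are simply paired off in the order they were listed.  The helper
name replica_sets is made up for this example and is not part of any
gluster tool or API.

```python
def replica_sets(bricks, replica=2):
    """Group bricks into replica sets of `replica` consecutive entries.

    Hypothetical helper: it only models the ordering rule, it does not
    talk to gluster.
    """
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [tuple(bricks[i:i + replica]) for i in range(0, len(bricks), replica)]


# Order as written in the original message.
as_listed = ["server1:bigdisk", "server1:smalldisk",
             "server2:bigdisk", "server2:smalldisk"]

# Order that would pair each disk with its twin on the other server.
intended = ["server1:bigdisk", "server2:bigdisk",
            "server1:smalldisk", "server2:smalldisk"]

for label, order in (("as listed", as_listed), ("intended", intended)):
    print(label, replica_sets(order))
```

With the first ordering each replica set lives entirely on one server;
with the second you get the bigdisk-bigdisk and smalldisk-smalldisk
pairs you described.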

> This setup will be extended by another set of four bricks (same
> setup) within the next few days, and I could make those into another
> volume entirely, but I'd prefer not to, leaving me with more disks
> and hosts to distribute the data.
> 
> Now, I should've seen this earlier and used different volumes for
> this, but I'd like to separate two main directories between bigdisk
> and smalldisk replica sets.
> 
> Is there a best practice for doing this? I thought I could manually
> move all the files pertaining to those directories between the disks,
> but since this is a production system, I'd like to know if gluster
> can handle this, and if not, if there's a better way to achieve this
> - without separating the data into two volumes.

There is kind of a way to do this, but it's distinctly non-kosher so I
have to slap a big planet-sized "caveat emptor" on it.  As is explained
in an article I wrote a while ago
(http://cloudfs.org/2011/04/glusterfs-extended-attributes/) the "layout
map" that controls placement of files within a directory is actually
stored in an extended attribute on each copy of that directory (one copy
per brick).  Therefore, by manipulating these extended attributes from
the command line you could affect placement of files in any number of
ways including the way you mention.  In that case, you would set the
xattrs on each server to "claim" hash ranges (see the article) as follows:

	bigdisk/bigdir     - 0x00000000 to 0xfffffffe
	bigdisk/smalldir   - 0xffffffff to 0xffffffff
	smalldisk/bigdir   - 0xffffffff to 0xffffffff
	smalldisk/smalldir - 0x00000000 to 0xfffffffe

This would cause practically all files in bigdir to be placed in
server*:bigdisk/bigdir, and practically all files in smalldir to be
placed in server*:smalldisk/smalldir.  I say "practically all" instead
of "all" because it's actually not possible to assign a zero-length
range to a brick, so there's a one in four billion chance (one hash
value out of 2^32) that a file in either directory will get assigned
to the "wrong" place.  After doing
this, "gluster volume rebalance xxx migrate-data start" should cause
files to be migrated to the "correct" locations, but with two major caveats.

(1) A subsequent "gluster volume rebalance xxx fix-layout start" will
undo your careful xattr-twiddling, so that files will no longer be
placed the way you intended.

(2) These values are not inherited by subdirectories (that would be a
very good subject for an enhancement request) so the careful placement
would only apply to the top-level directories.

For these reasons, I wouldn't necessarily recommend doing things this
way.  It's *possible*, but in the long run you'd probably be better off
creating separate volumes.
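
For what it's worth, the hash-range table above can be modelled with a
few lines of Python.  This is only a sketch of the lookup logic: the
names LAYOUTS, brick_for and share are invented for the example, the
hash values are passed in directly instead of being computed from file
names, and nothing here reads or writes the real xattrs.

```python
HASH_SPACE = 2 ** 32  # hash values run from 0x00000000 to 0xffffffff

# directory -> brick -> (start, stop), inclusive, copied from the table above
LAYOUTS = {
    "bigdir": {
        "bigdisk": (0x00000000, 0xfffffffe),
        "smalldisk": (0xffffffff, 0xffffffff),
    },
    "smalldir": {
        "bigdisk": (0xffffffff, 0xffffffff),
        "smalldisk": (0x00000000, 0xfffffffe),
    },
}


def brick_for(directory, name_hash):
    """Return the brick whose claimed range contains name_hash."""
    for brick, (start, stop) in LAYOUTS[directory].items():
        if start <= name_hash <= stop:
            return brick
    raise ValueError("hash value not covered by any range")


def share(directory, brick):
    """Fraction of the hash space that `brick` claims for `directory`."""
    start, stop = LAYOUTS[directory][brick]
    return (stop - start + 1) / HASH_SPACE


print(brick_for("bigdir", 0x12345678))   # lands on bigdisk
print(brick_for("bigdir", 0xffffffff))   # the single "wrong" value
print(share("bigdir", "smalldisk"))      # 1 / 2**32
```

The one-value range is where the "one in four billion" figure comes
from: exactly one hash value out of 2^32 maps to the brick you didn't
want.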