Advanced CRUSH map rules

greg@xxxxxxxxxxx (Gregory Farnum) · Wed, 14 May 2014 11:12:22 -0700

On Wed, May 14, 2014 at 10:52 AM, Pavel V. Kaygorodov <pasha at inasan.ru> wrote:
> Hi!
>
>> CRUSH can do this. You'd have two choose <bucket>...emit sequences;
>> the first of which would descend down to a host and then choose n-1
>> devices within the host; the second would descend once. I think
>> something like this should work:
>>
>> step take default
>> step choose firstn 1 type datacenter
>> step chooseleaf firstn -1 type room
>> step emit
>> step chooseleaf firstn 1 type datacenter
>> step emit
>>
>
> Maybe I'm wrong, but this will not guarantee that different datacenters are chosen for the n-1 replicas and the remaining replica.
> I have experimented with rules like this, trying to put one replica on a "main host" and the other replicas on some other hosts.
> Some OSDs were referenced twice in some of the generated PGs.

Argh, I forgot about this, but you're right. :( So you can construct
these sorts of systems manually (by having different "step take...step
emit" blocks), but CRUSH won't do it for you in a generic way.
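
As an illustration of the manual approach, a rule body hard-wired to one
specific pair of datacenters might look like the following. This is an
untested sketch: the bucket names dc1 and dc2 are made up, and the rule
only works for that pair, which is exactly why it isn't generic.

step take dc1
step chooseleaf firstn -1 type room
step emit
step take dc2
step chooseleaf firstn 1 type room
step emit

Because the two "take" steps start from disjoint subtrees, the same OSD
can't be picked by both blocks.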

However, for *most* situations that people are interested in, you can
pull various tricks to accomplish what you're actually after. (I
haven't done this one myself, but I'm told others have.) For instance,
if you just want 1 copy segregated from the others, you can do this:

step take default
step choose firstn 2 type datacenter
step chooseleaf firstn -1 type room
step emit

That will generate an ordered list of 2(n-1) OSDs, but since you only
want n, you'll take n-1 from the first datacenter and only 1 from the
second. :) You can extend this to n-2, etc.
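
To make the arithmetic concrete, here is how I'd expect that to play out
for a pool of size 3 (n = 3), written out as a complete rule. Treat it as
an untested sketch: the rule name, the ruleset number, the min_size and
max_size values, and the datacenter/OSD names in the comments are all
invented for illustration, and it assumes every datacenter has at least
two rooms with usable OSDs.

rule one-copy-remote {
        ruleset 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step choose firstn 2 type datacenter    # picks, say, dcA then dcB
        step chooseleaf firstn -1 type room     # n-1 = 2 OSDs under each
        step emit                               # A1 A2 B1 B2, cut to A1 A2 B1
}

It's worth checking the mappings before trusting it, for example with
something like "crushtool -i <compiled map> --test --rule 1 --num-rep 3
--show-mappings".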

If your pools are each associated with a particular datacenter, you can
set up rules which place a certain number of copies in the primary
datacenter, and then use parallel CRUSH hierarchies to choose one of the
other datacenters for the remaining copies. (That is, you can have
multiple root buckets; one for each datacenter that includes
everybody BUT the datacenter it is associated with.)
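
Roughly, and again untested, that could look like the fragment below for
a pool whose primary datacenter is dc1. Every name, id and weight here is
hypothetical (dc1, dc2, dc3, not-dc1, the bucket id, the ruleset number),
and it assumes the datacenter buckets are already defined earlier in the
map and can be listed under more than one root.

root not-dc1 {
        id -21          # any unused negative id
        alg straw
        hash 0          # rjenkins1
        item dc2 weight 1.000
        item dc3 weight 1.000
}

rule dc1-primary {
        ruleset 2
        type replicated
        min_size 2
        max_size 10
        step take dc1
        step chooseleaf firstn -1 type room
        step emit
        step take not-dc1
        step chooseleaf firstn 1 type room
        step emit
}

You'd repeat the root and the rule for each datacenter (not-dc2 and
dc2-primary, and so on) and point each pool at the rule for its own
datacenter.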
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com