Advanced CRUSH map rules


Ok, thanks for the suggestions, I will try to achieve this in the next
days and I will share my experience with you.

Cheers,
Fabrizio

On 14 May 2014 20:12, Gregory Farnum <greg at inktank.com> wrote:
> On Wed, May 14, 2014 at 10:52 AM, Pavel V. Kaygorodov <pasha at inasan.ru> wrote:
>> Hi!
>>
>>> CRUSH can do this. You'd have two choose <bucket>...emit sequences;
>>> the first of which would descend down to a host and then choose n-1
>>> devices within the host; the second would descend once. I think
>>> something like this should work:
>>>
>>> step take default
>>> step choose firstn 1 datacenter
>>> step chooseleaf firstn -1 room
>>> step emit
>>> step chooseleaf firstn 1 datacenter
>>> step emit
>>>
>>
>> Maybe I'm wrong, but this will not guarantee that the n-1 replicas and the remaining replica land in different datacenters.
>> I have experimented with rules like this, trying to put one replica on a "main host" and the other replicas on some other hosts.
>> Some OSDs were referenced twice in some of the generated PGs.
>
> Argh, I forgot about this, but you're right. :( So you can construct
> these sorts of systems manually (by having different "step take...step
> emit" blocks), but CRUSH won't do it for you in a generic way.
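
For illustration, a manual rule with two explicit take...emit blocks might look like the following. The bucket names dc1/dc2 are made up for this sketch; any real rule would use the bucket names from your own CRUSH map:

```
# Hypothetical sketch: pin n-1 replicas under dc1 and the final
# replica under dc2 by descending from each datacenter explicitly.
step take dc1
step chooseleaf firstn -1 host
step emit
step take dc2
step chooseleaf firstn 1 host
step emit
```

The cost of this approach is that the datacenters are hardcoded, so every pool using the rule gets the same primary datacenter.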
>
> However, for *most* situations that people are interested in, you can
> pull various tricks to accomplish what you're actually after. (I
> haven't done this one myself, but I'm told others have.) For instance,
> if you just want 1 copy segregated from the others, you can do this:
>
> step take default
> step choose firstn 2 datacenter
> step chooseleaf firstn -1 room
> step emit
>
> That will generate an ordered list of 2(n-1) OSDs, but since you only
> want n, you'll take n-1 from the first datacenter and only 1 from the
> second. :) You can extend this to n-2, etc.
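
The truncation behavior described above can be sketched in a few lines of Python. This is not Ceph code, just an illustration of why the tail of the candidate list is dropped; the OSD names are hypothetical:

```python
# Sketch (not Ceph code): the rule "choose firstn 2 datacenter /
# chooseleaf firstn -1 room" emits an ordered list of 2*(n-1) OSD
# candidates, but a pool of size n only keeps the first n of them,
# so n-1 copies come from the first datacenter and 1 from the second.

def placed_osds(candidates_per_dc, pool_size):
    """candidates_per_dc: ordered candidate OSD lists, one per chosen
    datacenter (each of length pool_size - 1, as firstn -1 produces).
    Returns the OSDs actually used for the placement group."""
    ordered = [osd for dc in candidates_per_dc for osd in dc]
    return ordered[:pool_size]

# Hypothetical example: pool size n = 3, two datacenters chosen,
# each contributing n-1 = 2 candidates.
dc1 = ["osd.0", "osd.1"]
dc2 = ["osd.7", "osd.8"]
print(placed_osds([dc1, dc2], 3))  # → ['osd.0', 'osd.1', 'osd.7']
```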
>
> If you have the pools associated with particular datacenters, you can
> set up rules which place a certain number of copies in the primary
> datacenter, and then use parallel crush maps to choose one of the
> other datacenters for a given number of replica copies. (That is, you
> can have multiple root buckets; one for each datacenter that includes
> everybody BUT the datacenter it is associated with.)
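
A sketch of what such a parallel-root map might contain, using the CRUSH map syntax of this era. All names, ids, and weights here are invented for illustration:

```
# Hypothetical extra root for pools homed in dc1: it contains every
# datacenter EXCEPT dc1, so the trailing replica lands elsewhere.
root default-minus-dc1 {
    id -10
    alg straw
    hash 0
    item dc2 weight 10.0
    item dc3 weight 10.0
}

# Hypothetical rule for dc1-homed pools: most copies in dc1, the
# last copy chosen from the "everyone but dc1" root.
rule dc1-primary {
    ruleset 10
    type replicated
    min_size 2
    max_size 10
    step take dc1
    step chooseleaf firstn -1 host
    step emit
    step take default-minus-dc1
    step chooseleaf firstn 1 host
    step emit
}
```

Each datacenter would get its own such root and rule, and pools would be assigned the rule matching their home datacenter.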
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


