On Wed, May 14, 2014 at 10:52 AM, Pavel V. Kaygorodov <pasha at inasan.ru> wrote:
> Hi!
>
>> CRUSH can do this. You'd have two choose <bucket>...emit sequences;
>> the first of which would descend down to a host and then choose n-1
>> devices within the host; the second would descend once. I think
>> something like this should work:
>>
>> step take default
>> step choose firstn 1 datacenter
>> step chooseleaf firstn -1 room
>> step emit
>> step chooseleaf firstn 1 datacenter
>> step emit
>>
>
> May be I'm wrong, but this will not guarantee choice of different datacenters for n-1 and remaining replica.
> I have experimented with rules like this, trying to put one replica to "main host" and other replicas to some other hosts.
> Some OSDs was referenced two times in some of generated pg's.

Argh, I forgot about this, but you're right. :( So you can construct
these sorts of systems manually (by having different "step take...step
emit" blocks), but CRUSH won't do it for you in a generic way.

However, for *most* situations that people are interested in, you can
pull various tricks to accomplish what you're actually after. (I
haven't done this one myself, but I'm told others have.) For instance,
if you just want 1 copy segregated from the others, you can do this:

step take default
step choose firstn 2 datacenter
step chooseleaf firstn -1 room
step emit

That will generate an ordered list of 2(n-1) OSDs, but since you only
want n, you'll take n-1 from the first datacenter and only 1 from the
second. :) You can extend this to n-2, etc.

If you have the pools associated with particular datacenters, you can
set up rules which place a certain number of copies in the primary
datacenter, and then use parallel crush maps to choose one of the
other datacenters for a given number of replica copies. (That is, you
can have multiple root buckets; one for each datacenter that includes
everybody BUT the datacenter it is associated with.)
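
Written out as a full rule in decompiled-crushmap syntax, the
single-rule trick might look roughly like the sketch below. This is
untested: the rule name, ruleset number and min_size/max_size values
are placeholders made up for illustration, and the full syntax needs
the "type" keyword before the bucket type, which the shorthand steps
above leave out.

```
# Hypothetical rule: n-1 copies in one datacenter, 1 copy in another.
rule one_copy_elsewhere {
        ruleset 1                 # placeholder number
        type replicated
        min_size 2                # placeholder bounds
        max_size 10
        step take default
        # pick 2 distinct datacenters
        step choose firstn 2 type datacenter
        # in each, pick (pool size - 1) rooms and one OSD under each room
        step chooseleaf firstn -1 type room
        step emit
}
```

With a pool size of 3 this yields up to 4 OSDs (2 per datacenter), of
which only the first 3 are used: 2 from the first datacenter, 1 from
the second.

The multiple-root idea could be sketched the same way. Again, this is
an assumption about how you might lay it out, not a tested map: the
bucket names dc1, dc2, dc3 and not-dc1, the bucket id, the weights and
the use of "host" as the failure domain are all invented for the
example.

```
# Hypothetical extra root holding every datacenter EXCEPT dc1.
root not-dc1 {
        id -20                    # placeholder, must be unique and negative
        alg straw
        hash 0                    # rjenkins1
        item dc2 weight 1.000
        item dc3 weight 1.000
}

# Hypothetical rule for pools whose primary datacenter is dc1.
rule dc1_primary {
        ruleset 2                 # placeholder number
        type replicated
        min_size 2
        max_size 10
        # n-1 copies on distinct hosts inside dc1
        step take dc1
        step chooseleaf firstn -1 type host
        step emit
        # the remaining copy on a host somewhere outside dc1
        step take not-dc1
        step chooseleaf firstn 1 type host
        step emit
}
```

Because the two "step take" blocks start from disjoint subtrees, the
second block cannot hand back an OSD the first one already chose. You
would need one such root and rule per primary datacenter.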
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com