On Mon, 20 Jan 2014, Arnulf Heimsbakk wrote:
> Hi,
>
> I'm trying to understand the CRUSH algorithm and how it distributes data.
> Let's say I simplify a small datacenter setup and map it up
> hierarchically in the crush map as shown below.
>
>               root             datacenter
>              /    \
>             /      \
>            /        \
>           a          b         room
>         / | \      / | \
>      a1  a2  a3  b1  b2  b3    rack
>      |   |   |   |   |   |
>      h1  h2  h3  h4  h5  h6    host
>
> I want 4 copies of all data in my pool, configured on pool level. 2
> copies in each room. And I want to be sure no 2 copies reside in the
> same rack when there are no HW failures.
>
> Will the chooseleaf rule below ensure this placement?
>
> step take root
> step chooseleaf firstn 0 type room
> step emit

This won't ensure the 2 copies in each room are in different racks.

> Or do I have to specify this more, like
>
> step take root
> step choose firstn 2 type room
> step chooseleaf firstn 2 type rack
> step emit

I think this is what you want.  The thing it won't do is decide to put 4
replicas in room b when room a goes down completely... but at that scale,
that is generally not what you want anyway.

> Or even more, like?
>
> step take a
> step choose firstn 2 type rack
> step chooseleaf firstn 1 type host
> step emit
> step take b
> step choose firstn 2 type rack
> step chooseleaf firstn 1 type host
> step emit
>
> Is there a difference in failure behaviour in the different configurations?

This would work too, but assumes you only have 2 rooms, and that you
always want the primary copy to be in room a (which means the reads go
there).  The previous rule will spread the primary responsibility across
both rooms.

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
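
For reference, a minimal sketch of what the recommended rule (choose 2 rooms,
then chooseleaf 2 racks in each) could look like as a complete crushmap rule,
and how a pool might be pointed at it.  The rule name "replicated_2rooms", the
ruleset id 1, and the pool name "mypool" are illustrative placeholders, not
taken from the thread:

    # crushmap rule sketch: 4 replicas, 2 per room, no 2 in the same rack
    rule replicated_2rooms {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take root
            step choose firstn 2 type room
            step chooseleaf firstn 2 type rack
            step emit
    }

    # set the replica count on the pool and point it at the rule
    ceph osd pool set mypool size 4
    ceph osd pool set mypool crush_ruleset 1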