Re: Question about CRUSH object placement

Hi Sage,

I have a similar question: I need 2 replicas (one in each rack), and I would like to know whether the following rule always places the primary in rack1.
rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take rack1
        step chooseleaf firstn 1 type host
        step emit
        step take rack2
        step chooseleaf firstn 1 type host
        step emit
}
If so, I was wondering if you could tell me whether the following rule would also place one copy in each rack, but spread the primary role across rack1 and rack2?
rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take row1
        step choose firstn 2 type rack
        step chooseleaf firstn 1 type host
        step emit
}
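
For what it's worth, here is how I have been trying to check this myself
(just a sketch: crush.txt and crush.map stand for whatever your
decompiled and compiled map files are called, and --rule 0 assumes the
rule keeps ruleset 0). crushtool's test mode prints the OSDs each rule
selects, and as far as I understand the first OSD in each mapping acts
as the primary:

        # compile the edited map, then simulate placements for 2 replicas
        crushtool -c crush.txt -o crush.map
        crushtool -i crush.map --test --rule 0 --num-rep 2 --show-mappings

If the first OSD on every line belongs to a host in rack1, the primary
is pinned there; if it alternates between the racks, the primary role is
being spread.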

Thanks in advance
Sherry



On Tuesday, January 21, 2014 7:00 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
On Mon, 20 Jan 2014, Arnulf Heimsbakk wrote:
> Hi,
>
> I'm trying to understand the CRUSH algorithm and how it distributes
> data. Let's say I simplify a small datacenter setup and map it
> hierarchically in the crush map as shown below.
>
>              root           datacenter
>             /    \
>            /      \
>           /        \
>          a          b       room
>       / | \      / | \
>      a1 a2 a3   b1 b2 b3   rack
>      |  |  |    |  |  |
>      h1 h2 h3   h4 h5 h6   host
>
> I want 4 copies of all data in my pool, configured at the pool level:
> 2 copies in each room. And I want to be sure that no 2 copies reside
> in the same rack when there are no HW failures.
>
> Will the chooseleaf rule below ensure this placement?
>
>     step take root
>     step chooseleaf firstn 0 type room
>     step emit

This won't ensure the 2 copies in each room are in different racks.

> Or do I have to be more specific, like
>
>     step take root
>     step choose firstn 2 type room
>     step chooseleaf firstn 2 type rack
>     step emit

I think this is what you want.  The thing it won't do is decide to put 4
replicas in room b when room a goes down completely... but at that scale,
that is generally not what you want anyway.
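
Spelled out as a complete rule, that would look something like the
sketch below (the rule name and ruleset number are placeholders, and
min_size/max_size just mirror the 4-copy pool size):

rule replicated_room_rack {
        ruleset 1
        type replicated
        min_size 4
        max_size 4
        step take root
        step choose firstn 2 type room
        step chooseleaf firstn 2 type rack
        step emit
}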

> Or even more, like?
>
>     step take a
>     step choose firstn 2 type rack
>     step chooseleaf firstn 1 type host
>     step emit
>     step take b
>     step choose firstn 2 type rack
>     step chooseleaf firstn 1 type host
>     step emit
>
> Is there a difference in failure behaviour between the different configurations?

This would work too, but assumes you only have 2 rooms, and that you
always want the primary copy to be in room a (which means the reads go
there).  The previous rule will spread the primary responsibility across
both rooms.
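
You can also see the difference in failure behaviour with crushtool
(again a sketch: the --rule number and the assumption that osd.0-osd.2
are the devices in room a are specific to your map). Weighting one
room's OSDs to zero simulates that room going down:

        # simulate room a down, flag mappings with fewer than 4 OSDs
        crushtool -i crush.map --test --rule 1 --num-rep 4 \
                --weight 0 0 --weight 1 0 --weight 2 0 \
                --show-mappings --show-bad-mappings

With the room-based rules you should see bad mappings (only the 2 OSDs
in room b get selected) rather than all 4 copies crammed into room b.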

sage

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


