Re: CRUSH rule for 3 replicas across 2 hosts

On 04/21/2015 09:08 AM, Robert LeBlanc wrote:
> Your logic isn't quite right; from what I understand, this is how it works:
> 
> step choose firstn 2 type rack       # Choose two racks from the CRUSH map (my CRUSH only has two, so select both of them)
> step chooseleaf firstn 2 type host  # From the set chosen previously (the two racks), select a leaf (OSD) from 2 hosts in each rack.
> 
> If you have size 3, it will pick two OSDs from one rack and one from the second (remember that the first rack in placement will sometimes be 'A' and sometimes 'B' so the placement won't be totally unbalanced).
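> 
> To make that concrete, here is a sketch of the complete rule those two steps would sit in (the rule name and ruleset number are made up, and "default" assumes your CRUSH root is called that):
> 
> rule replicated_2rack {
>         ruleset 4
>         type replicated
>         min_size 3
>         max_size 3
>         step take default
>         step choose firstn 2 type rack
>         step chooseleaf firstn 2 type host
>         step emit
> }
> 
> With size 3, CRUSH generates up to four candidates (2 racks x 2 hosts each) and the pool takes the first three, which is what produces the 2+1 split.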

OK, that explains why I was thinking you would end up with 4 replicas, 2 in each rack. For some reason I was imagining an extra step before the rack selection, but your first step just takes the default root.

> Where the min_size and max_size comes in could be something like this (this is somewhat exaggerated):
> 
> Let's say that you want the lowest possible latency and highest bandwidth and are OK with losing data (swap partitions or something). You create a pool with size 1 and a rule like this:
> 
> rule replicated_swap {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 1
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
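> 
> As a sketch of wiring that up (the pool name and PG count are placeholders), you would attach the rule at pool creation and then set the replica count:
> 
> ceph osd pool create swap 128 128 replicated replicated_swap
> ceph osd pool set swap size 1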
> 
> Then you have a pool you want to put on some hosts that have RAID5-protected OSDs, so you don't need as many replicas because the RAID layer already protects against disk failures:
> 
> rule replicated_raid5 {
>         ruleset 1
>         type replicated
>         min_size 2
>         max_size 2
>         step take raid5
>         step chooseleaf firstn 0 type host
>         step emit
> }
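> 
> For "step take raid5" to resolve, the CRUSH map needs a bucket with that name. One way to build it (the host names here are placeholders) is roughly:
> 
> ceph osd crush add-bucket raid5 root
> ceph osd crush move raidhost1 root=raid5
> ceph osd crush move raidhost2 root=raid5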
> 
> Then you have a pool that you want "default" protection for 3-4 copies:
> 
> rule replicated_default {
>         ruleset 2
>         type replicated
>         min_size 3
>         max_size 4
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
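> 
> Because this rule accepts sizes 3 through 4, a pool can be bumped between those sizes without switching rules, e.g. (pool name is a placeholder):
> 
> ceph osd pool set rbd size 4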
> 
> Then you have a pool that you absolutely can't lose data on, so you have lots of copies and want it spread throughout the data center:
> 
> rule replicated_paranoid {
>         ruleset 3
>         type replicated
>         min_size 5
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type rack
>         step emit
> }
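> 
> Before trusting any of these rules with data, you can dump the compiled CRUSH map and simulate placements with crushtool (rule 3 here being the paranoid rule above):
> 
> ceph osd getcrushmap -o crushmap.bin
> crushtool -i crushmap.bin --test --rule 3 --num-rep 5 --show-mappings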
> 
> You then specify which rule to use for each pool. Again, min_size and max_size act as a selector for the rule: if the pool's actual size falls outside that range, the rule should not apply (I don't know whether CRUSH actually enforces this or whether it is just a reminder to the human of the sizes the rule was written for).
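> 
> Assigning a rule to an existing pool looks something like this (the pool name is a placeholder; newer releases call the setting crush_rule instead of crush_ruleset):
> 
> ceph osd pool set mypool crush_ruleset 2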

Thank you for posting these example scenarios. This is definitely helpful.
Regards,
Colin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



