I created a new, more complex rule:

rule datacenter_rep2 {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type datacenter
        step chooseleaf firstn -1 type host
        step emit
}

assigned it to the pools, and now the cluster works as I expect.

2014-05-20 11:59 GMT+12:00 Vladislav Gorbunov <vadikgo at gmail.com>:
> Hi!
>
> Can you help me understand why a crushmap with
>     step chooseleaf firstn 0 type host
> can't work when the hosts are placed in datacenters?
>
> If I have this osd tree:
>
> # id    weight  type name       up/down reweight
> -1      0.12    root default
> -3      0.03            host tceph2
> 1       0.03                    osd.1   up      1
> -4      0.03            host tceph3
> 2       0.03                    osd.2   up      1
> -2      0.03            host tceph1
> 0       0.03                    osd.0   up      1
> -5      0.03            host tceph4
> 3       0.03                    osd.3   up      1
> -7      0               datacenter dc1
> -6      0               datacenter dc2
>
> and the default crush map rule
>
> { "rule_id": 0,
>   "rule_name": "data",
>   "ruleset": 0,
>   "type": 1,
>   "min_size": 1,
>   "max_size": 10,
>   "steps": [
>         { "op": "take",
>           "item": -1},
>         { "op": "chooseleaf_firstn",
>           "num": 0,
>           "type": "host"},
>         { "op": "emit"}]},
>
> used by these pools:
>
> pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 1176 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1190 owner 0
> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 64 pgp_num 64 last_change 1182 owner 0
>
> then when one of the OSDs goes down, the cluster successfully rebalances
> back to an OK state:
>
> # id    weight  type name       up/down reweight
> -1      0.12    root default
> -3      0.03            host tceph2
> 1       0.03                    osd.1   down    0
> -4      0.03            host tceph3
> 2       0.03                    osd.2   up      1
> -2      0.03            host tceph1
> 0       0.03                    osd.0   up      1
> -5      0.03            host tceph4
> 3       0.03                    osd.3   up      1
> -7      0               datacenter dc1
> -6      0               datacenter dc2
>
> ceph -s
>   cluster 6bdb23fb-adac-4113-8c75-e6bd245fcfd6
>    health HEALTH_OK
>    monmap e1: 1 mons at {tceph1=10.166.10.95:6789/0}, election epoch 1,
> quorum 0 tceph1
>    osdmap e1207: 4 osds: 3 up, 3 in
>     pgmap v4114539: 480 pgs: 480 active+clean; 2628 MB data, 5840 MB used,
> 89341 MB / 95182 MB avail
>    mdsmap e1: 0/0/1 up
>
> But if the hosts are moved into datacenters, as in this map:
>
> # id    weight  type name       up/down reweight
> -1      0.12    root default
> -7      0.06            datacenter dc1
> -4      0.03                    host tceph3
> 2       0.03                            osd.2   up      1
> -5      0.03                    host tceph4
> 3       0.03                            osd.3   up      1
> -6      0.06            datacenter dc2
> -2      0.03                    host tceph1
> 0       0.03                            osd.0   down    0
> -3      0.03                    host tceph2
> 1       0.03                            osd.1   up      1
>
> the cluster can't reach an OK state while one host is out of the cluster:
>
>   cluster 6bdb23fb-adac-4113-8c75-e6bd245fcfd6
>    health HEALTH_WARN 6 pgs incomplete; 6 pgs stuck inactive; 6 pgs stuck
> unclean
>    monmap e1: 1 mons at {tceph1=10.166.10.95:6789/0}, election epoch 1,
> quorum 0 tceph1
>    osdmap e1256: 4 osds: 3 up, 3 in
>     pgmap v4114707: 480 pgs: 474 active+clean, 6 incomplete; 2516 MB data,
> 5606 MB used, 89575 MB / 95182 MB avail
>    mdsmap e1: 0/0/1 up
>
> If the downed host comes back up and rejoins the cluster, health returns
> to OK. Likewise, if the downed OSD is manually reweighted to 0, cluster
> health is OK.
>
> A crushmap with
>     step chooseleaf firstn 0 type datacenter
> has the same issue:
>
> { "rule_id": 3,
>   "rule_name": "datacenter_rule",
>   "ruleset": 3,
>   "type": 1,
>   "min_size": 1,
>   "max_size": 10,
>   "steps": [
>         { "op": "take",
>           "item": -8},
>         { "op": "chooseleaf_firstn",
>           "num": 0,
>           "type": "datacenter"},
>         { "op": "emit"}]},
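P.S. For anyone finding this thread later: applying such a rule and pointing
the pools at it looks roughly like this (a sketch, not the exact commands
from my session; the file names are placeholders, the pool names are taken
from the listing above):

    # extract and decompile the current crush map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # add the datacenter_rep2 rule to crushmap.txt, then recompile and inject
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin

    # switch the pools to ruleset 2
    ceph osd pool set data crush_ruleset 2
    ceph osd pool set metadata crush_ruleset 2
    ceph osd pool set rbd crush_ruleset 2

The new rule can also be sanity-checked offline before injecting it, for
example:

    crushtool -i crushmap-new.bin --test --rule 2 --num-rep 2 --show-bad-mappings

Empty output should mean that every PG gets the requested number of replicas.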