Hi! Can you help me understand why a crushmap with step chooseleaf firstn 0 type host can't work when the hosts are placed under datacenter buckets?

If I have this osd tree:

# id    weight  type name       up/down reweight
-1      0.12    root default
-3      0.03            host tceph2
1       0.03                    osd.1   up      1
-4      0.03            host tceph3
2       0.03                    osd.2   up      1
-2      0.03            host tceph1
0       0.03                    osd.0   up      1
-5      0.03            host tceph4
3       0.03                    osd.3   up      1
-7      0               datacenter dc1
-6      0               datacenter dc2

and the default crush rule

{ "rule_id": 0,
  "rule_name": "data",
  "ruleset": 0,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
        { "op": "take",
          "item": -1},
        { "op": "chooseleaf_firstn",
          "num": 0,
          "type": "host"},
        { "op": "emit"}]}

used by these pools:

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1176 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1190 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1182 owner 0

then when one of the OSDs goes down, the cluster successfully rebalances back to an OK state:

# id    weight  type name       up/down reweight
-1      0.12    root default
-3      0.03            host tceph2
1       0.03                    osd.1   down    0
-4      0.03            host tceph3
2       0.03                    osd.2   up      1
-2      0.03            host tceph1
0       0.03                    osd.0   up      1
-5      0.03            host tceph4
3       0.03                    osd.3   up      1
-7      0               datacenter dc1
-6      0               datacenter dc2

# ceph -s
cluster 6bdb23fb-adac-4113-8c75-e6bd245fcfd6
 health HEALTH_OK
 monmap e1: 1 mons at {tceph1=10.166.10.95:6789/0}, election epoch 1, quorum 0 tceph1
 osdmap e1207: 4 osds: 3 up, 3 in
 pgmap v4114539: 480 pgs: 480 active+clean; 2628 MB data, 5840 MB used, 89341 MB / 95182 MB avail
 mdsmap e1: 0/0/1 up

But if the hosts are moved into datacenters, as in this map:

# id    weight  type name       up/down reweight
-1      0.12    root default
-7      0.06            datacenter dc1
-4      0.03                    host tceph3
2       0.03                            osd.2   up      1
-5      0.03                    host tceph4
3       0.03                            osd.3   up      1
-6      0.06            datacenter dc2
-2      0.03                    host tceph1
0       0.03                            osd.0   down    0
-3      0.03                    host tceph2
1       0.03                            osd.1   up      1

then the cluster can't reach an OK state while one host is out of the cluster:

cluster 6bdb23fb-adac-4113-8c75-e6bd245fcfd6
 health HEALTH_WARN 6 pgs incomplete; 6 pgs stuck inactive; 6 pgs stuck unclean
 monmap e1: 1 mons at {tceph1=10.166.10.95:6789/0}, election epoch 1, quorum 0 tceph1
 osdmap e1256: 4 osds: 3 up, 3 in
 pgmap v4114707: 480 pgs: 474 active+clean, 6 incomplete; 2516 MB data, 5606 MB used, 89575 MB / 95182 MB avail
 mdsmap e1: 0/0/1 up

If the downed host comes back up and rejoins the cluster, health returns to OK. Likewise, if the downed OSD is manually reweighted to 0, cluster health is OK.

A crushmap with step chooseleaf firstn 0 type datacenter has the same issue:

{ "rule_id": 3,
  "rule_name": "datacenter_rule",
  "ruleset": 3,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
        { "op": "take",
          "item": -8},
        { "op": "chooseleaf_firstn",
          "num": 0,
          "type": "datacenter"},
        { "op": "emit"}]}
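For reference, I believe the data rule above corresponds to this decompiled crushmap syntax (as crushtool -d would print it, with type 1 = replicated and item -1 = the default root):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}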
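For completeness, the datacenter buckets were created and the hosts moved with commands along these lines (standard ceph osd crush commands; bucket and host names as in the tree above):

ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move tceph3 datacenter=dc1
ceph osd crush move tceph4 datacenter=dc1
ceph osd crush move tceph1 datacenter=dc2
ceph osd crush move tceph2 datacenter=dc2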
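The manual reweight workaround mentioned above was something like one of the following, for the OSD shown down in the tree (osd.0):

# drop the override reweight of the down OSD to zero ...
ceph osd reweight 0 0
# ... or zero out its CRUSH weight entirely
ceph osd crush reweight osd.0 0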
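A sketch for reproducing this offline with crushtool, in case it helps (file names are placeholders; --show-bad-mappings prints every input that does not get the full replica count):

# dump and decompile the current crushmap
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# simulate rule 0 with 2 replicas while osd.0 has weight 0;
# each line printed is a PG input that could not be mapped to 2 OSDs
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 \
        --weight 0 0 --show-bad-mappings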