Dear all,

How should ceph react to a host failure when 12 out of a total of 72 OSDs are out? Is it normal that, when remapping the PGs, it does not follow the rule set in the crush map? (According to the rule, the OSDs should be selected from different chassis.)

In the attached file you can find the crush map and the output of:

ceph health detail
ceph osd dump
ceph osd tree
ceph -s

I can send the pg dump in a separate mail on request; its compressed size exceeds the limit accepted by this mailing list.

Thank you for any help/directions.

Kind regards,
Laszlo
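One way to check whether a remapped PG really has two copies under the same host or chassis is to map the PG and look up each OSD's crush location. A minimal sketch, using PG 3.5f6 and the OSD ids from the mail quoted below (substitute your own PG and OSD ids):

  ceph pg map 3.5f6    # shows the current up and acting OSD sets for the PG
  ceph osd find 52     # prints the ip and crush location (host) of osd.52
  ceph osd find 39
  ceph osd find 3      # per the crush map below, osd.39 and osd.3 are both in host tv-c2-al01
  ceph osd tree        # or just read the hierarchy directly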
On 29.05.2017 14:58, Laszlo Budai wrote:

Hello all,

We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis. In our crush map we are distributing the PGs on chassis (complete crush map below):

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type chassis
    step emit
}

We had a host failure, and I can see that ceph is using 2 OSDs from the same chassis for a lot of the remapped PGs. Even worse, I can see that there are cases when a PG is using two OSDs from the same host, like here:

3.5f6 37 0 4 37 0 149446656 3040 3040 active+remapped 2017-05-26 11:29:23.122820 61820'222074 61820:158025 [52,39] 52 [52,39,3] 52 61488'198356 2017-05-23 23:51:56.210597 61488'198356 2017-05-23 23:51:56.210597

I have this in the log:

2017-05-26 11:26:53.244424 osd.52 10.12.193.69:6801/7044 1510 : cluster [INF] 3.5f6 restarting backfill on osd.39 from (0'0,0'0] MAX to 61488'203000

What can be wrong?

Our crush map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
....
device 69 osd.69
device 70 osd.70
device 71 osd.71

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host tv-c1-al01 {
    id -7    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.5 weight 1.820
    item osd.11 weight 1.820
    item osd.17 weight 1.820
    item osd.23 weight 1.820
    item osd.29 weight 1.820
    item osd.35 weight 1.820
    item osd.41 weight 1.820
    item osd.47 weight 1.820
    item osd.53 weight 1.820
    item osd.59 weight 1.820
    item osd.65 weight 1.820
    item osd.71 weight 1.820
}
host tv-c1-al02 {
    id -3    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.1 weight 1.820
    item osd.7 weight 1.820
    item osd.13 weight 1.820
    item osd.19 weight 1.820
    item osd.25 weight 1.820
    item osd.31 weight 1.820
    item osd.37 weight 1.820
    item osd.43 weight 1.820
    item osd.49 weight 1.820
    item osd.55 weight 1.820
    item osd.61 weight 1.820
    item osd.67 weight 1.820
}
chassis tv-c1 {
    id -8    # do not change unnecessarily
    # weight 43.680
    alg straw
    hash 0   # rjenkins1
    item tv-c1-al01 weight 21.840
    item tv-c1-al02 weight 21.840
}
host tv-c2-al01 {
    id -5    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.3 weight 1.820
    item osd.9 weight 1.820
    item osd.15 weight 1.820
    item osd.21 weight 1.820
    item osd.27 weight 1.820
    item osd.33 weight 1.820
    item osd.39 weight 1.820
    item osd.45 weight 1.820
    item osd.51 weight 1.820
    item osd.57 weight 1.820
    item osd.63 weight 1.820
    item osd.70 weight 1.820
}
host tv-c2-al02 {
    id -2    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.0 weight 1.820
    item osd.6 weight 1.820
    item osd.12 weight 1.820
    item osd.18 weight 1.820
    item osd.24 weight 1.820
    item osd.30 weight 1.820
    item osd.36 weight 1.820
    item osd.42 weight 1.820
    item osd.48 weight 1.820
    item osd.54 weight 1.820
    item osd.60 weight 1.820
    item osd.66 weight 1.820
}
chassis tv-c2 {
    id -9    # do not change unnecessarily
    # weight 43.680
    alg straw
    hash 0   # rjenkins1
    item tv-c2-al01 weight 21.840
    item tv-c2-al02 weight 21.840
}
host tv-c1-al03 {
    id -6    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.4 weight 1.820
    item osd.10 weight 1.820
    item osd.16 weight 1.820
    item osd.22 weight 1.820
    item osd.28 weight 1.820
    item osd.34 weight 1.820
    item osd.40 weight 1.820
    item osd.46 weight 1.820
    item osd.52 weight 1.820
    item osd.58 weight 1.820
    item osd.64 weight 1.820
    item osd.69 weight 1.820
}
host tv-c2-al03 {
    id -4    # do not change unnecessarily
    # weight 21.840
    alg straw
    hash 0   # rjenkins1
    item osd.2 weight 1.820
    item osd.8 weight 1.820
    item osd.14 weight 1.820
    item osd.20 weight 1.820
    item osd.26 weight 1.820
    item osd.32 weight 1.820
    item osd.38 weight 1.820
    item osd.44 weight 1.820
    item osd.50 weight 1.820
    item osd.56 weight 1.820
    item osd.62 weight 1.820
    item osd.68 weight 1.820
}
chassis tv-c3 {
    id -10    # do not change unnecessarily
    # weight 43.680
    alg straw
    hash 0   # rjenkins1
    item tv-c1-al03 weight 21.840
    item tv-c2-al03 weight 21.840
}
root default {
    id -1    # do not change unnecessarily
    # weight 131.040
    alg straw
    hash 0   # rjenkins1
    item tv-c1 weight 43.680
    item tv-c2 weight 43.680
    item tv-c3 weight 43.680
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type chassis
    step emit
}
# end crush map

Thank you,
Laszlo
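The rule can also be exercised offline, against the same compiled map, with crushtool's test mode. A minimal sketch; the file names are placeholders, and the single --weight override only marks one OSD (osd.12 here, chosen arbitrarily) out for the simulation, so a whole failed host would need one override per OSD:

  ceph osd getcrushmap -o crushmap.bin        # extract the compiled crush map from the cluster
  crushtool -d crushmap.bin -o crushmap.txt   # decompile it for inspection
  crushtool -i crushmap.bin --test --rule 0 --num-rep 3 \
      --weight 12 0 --show-mappings           # print the OSDs chosen for each simulated PG
  crushtool -i crushmap.bin --test --rule 0 --num-rep 3 \
      --weight 12 0 --show-bad-mappings       # list inputs for which CRUSH returned fewer than 3 OSDs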
Attachment: send.tar.bz2 (application/bzip)
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com