It should also be noted that hammer is pretty close to retirement and is a
poor choice for new clusters.

On Wed, May 31, 2017 at 6:17 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Mon, May 29, 2017 at 4:58 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>
>> Hello all,
>>
>> We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis.
>> In our crush map we are distributing the PGs on chassis (complete crush
>> map below):
>>
>> # rules
>> rule replicated_ruleset {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type chassis
>>         step emit
>> }
>>
>> We had a host failure, and I can see that ceph is using 2 OSDs from the
>> same chassis for a lot of the remapped PGs. Even worse, I can see that
>> there are cases when a PG is using two OSDs from the same host, like here:
>>
>> 3.5f6   37   0   4   37   0   149446656   3040   3040   active+remapped
>> 2017-05-26 11:29:23.122820   61820'222074   61820:158025   [52,39]   52
>> [52,39,3]   52   61488'198356   2017-05-23 23:51:56.210597
>> 61488'198356   2017-05-23 23:51:56.210597
>>
>> I have this in the log:
>> 2017-05-26 11:26:53.244424 osd.52 10.12.193.69:6801/7044 1510 : cluster
>> [INF] 3.5f6 restarting backfill on osd.39 from (0'0,0'0] MAX to 61488'203000
>>
>> What can be wrong?
>
> It's not clear from the output you've provided whether your pools have
> size 2 or 3. From what you've shown, I'm guessing you have size 2, and
> the OSD failure prompted a move of the PG in question away from OSD 3
> to OSD 39. Since 39 doesn't have any of the data yet, OSD 3 is being
> kept in the acting set to maintain redundancy, but it will go away once
> the backfill is done.
>
> In general, it's a failure of CRUSH's design goals if you see moves of
> the replica within buckets which didn't experience failure, but they do
> sometimes happen. There have been a lot of improvements over the years
> to reduce how often that happens, some of which are supported by Hammer
> but not on by default (because it prevents use of older clients), and
> some of which are only in very new code like the Luminous dev releases.
> I suspect you'd find things behave better on your cluster if you
> upgrade to Jewel and set the CRUSH flags it recommends to you.
> -Greg
>
>>
>>
>> Our crush map looks like this:
>>
>> # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable straw_calc_version 1
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>> device 3 osd.3
>> ....
>> device 69 osd.69
>> device 70 osd.70
>> device 71 osd.71
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 chassis
>> type 3 rack
>> type 4 row
>> type 5 pdu
>> type 6 pod
>> type 7 room
>> type 8 datacenter
>> type 9 region
>> type 10 root
>>
>> # buckets
>> host tv-c1-al01 {
>>         id -7           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.5 weight 1.820
>>         item osd.11 weight 1.820
>>         item osd.17 weight 1.820
>>         item osd.23 weight 1.820
>>         item osd.29 weight 1.820
>>         item osd.35 weight 1.820
>>         item osd.41 weight 1.820
>>         item osd.47 weight 1.820
>>         item osd.53 weight 1.820
>>         item osd.59 weight 1.820
>>         item osd.65 weight 1.820
>>         item osd.71 weight 1.820
>> }
>> host tv-c1-al02 {
>>         id -3           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.1 weight 1.820
>>         item osd.7 weight 1.820
>>         item osd.13 weight 1.820
>>         item osd.19 weight 1.820
>>         item osd.25 weight 1.820
>>         item osd.31 weight 1.820
>>         item osd.37 weight 1.820
>>         item osd.43 weight 1.820
>>         item osd.49 weight 1.820
>>         item osd.55 weight 1.820
>>         item osd.61 weight 1.820
>>         item osd.67 weight 1.820
>> }
>> chassis tv-c1 {
>>         id -8           # do not change unnecessarily
>>         # weight 43.680
>>         alg straw
>>         hash 0  # rjenkins1
>>         item tv-c1-al01 weight 21.840
>>         item tv-c1-al02 weight 21.840
>> }
>> host tv-c2-al01 {
>>         id -5           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.3 weight 1.820
>>         item osd.9 weight 1.820
>>         item osd.15 weight 1.820
>>         item osd.21 weight 1.820
>>         item osd.27 weight 1.820
>>         item osd.33 weight 1.820
>>         item osd.39 weight 1.820
>>         item osd.45 weight 1.820
>>         item osd.51 weight 1.820
>>         item osd.57 weight 1.820
>>         item osd.63 weight 1.820
>>         item osd.70 weight 1.820
>> }
>> host tv-c2-al02 {
>>         id -2           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.0 weight 1.820
>>         item osd.6 weight 1.820
>>         item osd.12 weight 1.820
>>         item osd.18 weight 1.820
>>         item osd.24 weight 1.820
>>         item osd.30 weight 1.820
>>         item osd.36 weight 1.820
>>         item osd.42 weight 1.820
>>         item osd.48 weight 1.820
>>         item osd.54 weight 1.820
>>         item osd.60 weight 1.820
>>         item osd.66 weight 1.820
>> }
>> chassis tv-c2 {
>>         id -9           # do not change unnecessarily
>>         # weight 43.680
>>         alg straw
>>         hash 0  # rjenkins1
>>         item tv-c2-al01 weight 21.840
>>         item tv-c2-al02 weight 21.840
>> }
>> host tv-c1-al03 {
>>         id -6           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.4 weight 1.820
>>         item osd.10 weight 1.820
>>         item osd.16 weight 1.820
>>         item osd.22 weight 1.820
>>         item osd.28 weight 1.820
>>         item osd.34 weight 1.820
>>         item osd.40 weight 1.820
>>         item osd.46 weight 1.820
>>         item osd.52 weight 1.820
>>         item osd.58 weight 1.820
>>         item osd.64 weight 1.820
>>         item osd.69 weight 1.820
>> }
>> host tv-c2-al03 {
>>         id -4           # do not change unnecessarily
>>         # weight 21.840
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.2 weight 1.820
>>         item osd.8 weight 1.820
>>         item osd.14 weight 1.820
>>         item osd.20 weight 1.820
>>         item osd.26 weight 1.820
>>         item osd.32 weight 1.820
>>         item osd.38 weight 1.820
>>         item osd.44 weight 1.820
>>         item osd.50 weight 1.820
>>         item osd.56 weight 1.820
>>         item osd.62 weight 1.820
>>         item osd.68 weight 1.820
>> }
>> chassis tv-c3 {
>>         id -10          # do not change unnecessarily
>>         # weight 43.680
>>         alg straw
>>         hash 0  # rjenkins1
>>         item tv-c1-al03 weight 21.840
>>         item tv-c2-al03 weight 21.840
>> }
>> root default {
>>         id -1           # do not change unnecessarily
>>         # weight 131.040
>>         alg straw
>>         hash 0  # rjenkins1
>>         item tv-c1 weight 43.680
>>         item tv-c2 weight 43.680
>>         item tv-c3 weight 43.680
>> }
>>
>> # rules
>> rule replicated_ruleset {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type chassis
>>         step emit
>> }
>>
>> # end crush map
>>
>>
>> Thank you,
>> Laszlo

--
Cheers,
Brad
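
Greg's point about the pool size and the acting set can be checked directly
from the CLI. A minimal sketch, assuming the affected pool is named "rbd"
(substitute the real pool name) and reusing the PG id reported above:

    # Replica count of the pool; Greg is guessing this comes back as size 2
    ceph osd pool get rbd size

    # Print the CRUSH-computed "up" set next to the current "acting" set;
    # osd.3 should drop out of "acting" once the backfill onto osd.39 finishes
    ceph pg map 3.5f6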
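
The CRUSH improvements Greg mentions are governed by the tunables profile,
which can be inspected and raised from the CLI. A sketch, with the caveat
(also noted by Greg) that newer profiles trigger data movement and lock out
older clients:

    # Show the tunables the cluster is currently running with
    ceph osd crush show-tunables

    # Raise the profile; "hammer" is available on a Hammer cluster, "jewel"
    # only after the upgrade Greg suggests. Expect rebalancing afterwards.
    ceph osd crush tunables hammer
    # ceph osd crush tunables jewel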
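
Whether rule 0 can still place three replicas on distinct chassis (and hosts)
with a whole host missing can also be dry-run offline with crushtool. A
sketch; crush.bin and crush.txt are arbitrary local file names, and
osd.4/osd.10/osd.16 stand in for OSDs of whichever host actually failed:

    # Grab and decompile the CRUSH map currently in use
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # Simulate rule 0 with 3 replicas for every PG input and print the chosen
    # OSDs, so mappings that double up on one chassis or host stand out
    crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-mappings

    # Repeat with some of the failed host's OSDs weighted to 0 to mimic the failure
    crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-mappings \
        --weight 4 0 --weight 10 0 --weight 16 0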