Hello Dan,

I increased choose_total_tries to 75 and the number of misplaced objects dropped to 286. A further increase to 100 left 141 misplaced objects, and one more to 125 let the cluster fully recover! I also verified that I can now mark an OSD down + out and the cluster still recovers completely.

My problem is that this setting would never have crossed my mind. Even the docs only say of choose_total_tries that "For extremely large clusters, a larger value might be necessary.", but my cluster, with 16 OSDs and 40T at 13% utilization, can hardly be called extremely large. I also wonder what the right value will be when I apply the tunables to my largest clusters, with over 150 OSDs and hundreds of TB...

I would be grateful if you could point me to some code or documentation (for this tunable and the others as well) that would have let me "see" the problem earlier and plan for the future. I have appended a rough crushtool sketch after the quoted thread below, showing how I now intend to dry-run such changes; corrections welcome.

Kostis

On 26 July 2016 at 12:42, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> Starting from the beginning...
>
> If a 3-replica PG gets stuck with only 2 replicas after changing
> tunables, it's probably a case where choose_total_tries is too low for
> your cluster configuration.
> Try increasing choose_total_tries from 50 to 75.
>
> -- Dan
>
>
> On Fri, Jul 22, 2016 at 4:17 PM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>> Hello,
>> running the latest Hammer, I think I have hit a bug with tunables more
>> recent than the legacy ones.
>>
>> Having been on legacy tunables for a while, I decided to experiment
>> with "better" tunables. First I went from the argonaut profile to
>> bobtail and then towards firefly. However, I decided to change
>> chooseleaf_vary_r incrementally (because the remapping from 0 to 5 was
>> huge), stepping from 5 down to the best value (1). When I reached
>> chooseleaf_vary_r = 2, I ran a simple test before going to
>> chooseleaf_vary_r = 1: stop an OSD (osd.14) and let the cluster
>> recover. But the recovery never completes and one PG remains stuck,
>> reported as undersized+degraded. No OSD is near full and all pools
>> have min_size=1.
>>
>> ceph osd crush show-tunables -f json-pretty
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 2,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 22,
>>     "profile": "unknown",
>>     "optimal_tunables": 0,
>>     "legacy_tunables": 0,
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v2_rules": 0,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0
>> }
>>
>> The really strange thing is that the OSDs of the stuck PG are on
>> nodes other than the one I decided to stop (osd.14).
>>
>> # ceph pg dump_stuck
>> ok
>> pg_stat  state                       up     up_primary  acting  acting_primary
>> 179.38   active+undersized+degraded  [2,8]  2           [2,8]   2
>>
>>
>> ID WEIGHT   TYPE NAME                    UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
>> -1 11.19995 root default
>> -3 11.19995     rack unknownrack
>> -2  0.39999         host staging-rd0-03
>> 14  0.20000             osd.14                up   1.00000           1.00000
>> 15  0.20000             osd.15                up   1.00000           1.00000
>> -8  5.19998         host staging-rd0-01
>>  6  0.59999             osd.6                 up   1.00000           1.00000
>>  7  0.59999             osd.7                 up   1.00000           1.00000
>>  8  1.00000             osd.8                 up   1.00000           1.00000
>>  9  1.00000             osd.9                 up   1.00000           1.00000
>> 10  1.00000             osd.10                up   1.00000           1.00000
>> 11  1.00000             osd.11                up   1.00000           1.00000
>> -7  5.19998         host staging-rd0-00
>>  0  0.59999             osd.0                 up   1.00000           1.00000
>>  1  0.59999             osd.1                 up   1.00000           1.00000
>>  2  1.00000             osd.2                 up   1.00000           1.00000
>>  3  1.00000             osd.3                 up   1.00000           1.00000
>>  4  1.00000             osd.4                 up   1.00000           1.00000
>>  5  1.00000             osd.5                 up   1.00000           1.00000
>> -4  0.39999         host staging-rd0-02
>> 12  0.20000             osd.12                up   1.00000           1.00000
>> 13  0.20000             osd.13                up   1.00000           1.00000
>>
>>
>> Have you experienced something similar?
>>
>> Regards,
>> Kostis
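
P.S. For the record, this is roughly how I now plan to dry-run tunable changes offline with crushtool before touching the bigger clusters. I have only pieced it together from the crushtool man page, so treat it as a sketch rather than a tested recipe -- the rule id (0), replica count (3) and input range below are placeholders for my own pools:

    # Grab the compiled CRUSH map from the cluster:
    ceph osd getcrushmap -o crushmap.bin

    # Map a range of sample inputs through a rule and print only the inputs
    # for which CRUSH could not find the requested number of OSDs:
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 \
        --min-x 0 --max-x 10000 --show-bad-mappings

    # Write a copy of the map with a higher retry budget (the same idea should
    # work for stepping chooseleaf_vary_r, e.g. --set-chooseleaf-vary-r 1)
    # and repeat the test against the adjusted map:
    crushtool -i crushmap.bin --set-choose-total-tries 100 -o crushmap.new
    crushtool -i crushmap.new --test --rule 0 --num-rep 3 \
        --min-x 0 --max-x 10000 --show-bad-mappings

    # Only if the candidate map tests clean, inject it into the cluster:
    ceph osd setcrushmap -i crushmap.new

My understanding is that a run with no bad mappings means CRUSH can always fill the full acting set with those tunables, which should translate to no stuck undersized PGs after the change -- please correct me if that reasoning is off.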