Hi,

Starting from the beginning... If a 3-replica PG gets stuck with only
two replicas after changing tunables, it's probably a case where
choose_total_tries is too low for your cluster configuration. Try
increasing choose_total_tries from 50 to 75.

-- Dan

On Fri, Jul 22, 2016 at 4:17 PM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
> Hello,
> being on latest Hammer, I think I hit a bug with tunables more
> recent than the legacy ones.
>
> Having been on legacy tunables for a while, I decided to experiment
> with "better" tunables. First I went from the argonaut profile to
> bobtail, and then towards firefly. However, I decided to change
> chooseleaf_vary_r incrementally (because the remapping from 0 to 5
> was huge), stepping from 5 down towards the best value (1). When I
> reached chooseleaf_vary_r = 2, I ran a simple test before moving on
> to chooseleaf_vary_r = 1: stop an OSD (osd.14) and let the cluster
> recover. But the recovery never completes and one PG remains stuck,
> reported as undersized+degraded. No OSD is near full and all pools
> have min_size=1.
>
> ceph osd crush show-tunables -f json-pretty
>
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 2,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 22,
>     "profile": "unknown",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "require_feature_tunables3": 1,
>     "has_v2_rules": 0,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0
> }
>
> The really strange thing is that the OSDs of the stuck PG belong to
> nodes other than the one hosting the OSD I stopped (osd.14).
>
> # ceph pg dump_stuck
> ok
> pg_stat  state                       up     up_primary  acting  acting_primary
> 179.38   active+undersized+degraded  [2,8]  2           [2,8]   2
>
> ID WEIGHT   TYPE NAME                    UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 11.19995 root default
> -3 11.19995     rack unknownrack
> -2  0.39999         host staging-rd0-03
> 14  0.20000             osd.14                up  1.00000          1.00000
> 15  0.20000             osd.15                up  1.00000          1.00000
> -8  5.19998         host staging-rd0-01
>  6  0.59999             osd.6                 up  1.00000          1.00000
>  7  0.59999             osd.7                 up  1.00000          1.00000
>  8  1.00000             osd.8                 up  1.00000          1.00000
>  9  1.00000             osd.9                 up  1.00000          1.00000
> 10  1.00000             osd.10                up  1.00000          1.00000
> 11  1.00000             osd.11                up  1.00000          1.00000
> -7  5.19998         host staging-rd0-00
>  0  0.59999             osd.0                 up  1.00000          1.00000
>  1  0.59999             osd.1                 up  1.00000          1.00000
>  2  1.00000             osd.2                 up  1.00000          1.00000
>  3  1.00000             osd.3                 up  1.00000          1.00000
>  4  1.00000             osd.4                 up  1.00000          1.00000
>  5  1.00000             osd.5                 up  1.00000          1.00000
> -4  0.39999         host staging-rd0-02
> 12  0.20000             osd.12                up  1.00000          1.00000
> 13  0.20000             osd.13                up  1.00000          1.00000
>
> Have you experienced something similar?
>
> Regards,
> Kostis

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
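
For anyone who wants to apply Dan's choose_total_tries suggestion (or
step chooseleaf_vary_r down the way Kostis describes), a minimal
sketch, assuming a Hammer-era crushtool; the /tmp paths are
illustrative:

    # Fetch the current CRUSH map from the cluster.
    ceph osd getcrushmap -o /tmp/crushmap
    # Raise choose_total_tries from 50 to 75. chooseleaf_vary_r can be
    # stepped the same way, e.g. with --set-chooseleaf-vary-r 1.
    crushtool -i /tmp/crushmap --set-choose-total-tries 75 -o /tmp/crushmap.new
    # Inject the modified map back into the cluster.
    ceph osd setcrushmap -i /tmp/crushmap.new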
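
A modified map can also be checked offline before injecting it, to see
whether CRUSH produces any short mappings, which is exactly the
undersized symptom above. A sketch, assuming rule 0 is the rule the
affected pool uses (ceph osd pool get <pool> crush_ruleset will
confirm the actual rule number):

    # Simulate 3-replica placements with osd.14 weighted out and
    # report any inputs that map to fewer than 3 OSDs.
    crushtool -i /tmp/crushmap.new --test --rule 0 --num-rep 3 \
        --weight 14 0 --show-bad-mappings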