Hi Brad,

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119047 crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119048 stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119049 stripe_width 0
pool 3 'blocks' replicated size 2 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119050 stripe_width 0
pool 4 'maps' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119051 stripe_width 0
pool 179 'scbench' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 154034 flags hashpspool stripe_width 0

This is the status of 179.38 when the cluster is healthy: http://pastebin.ca/3663600
and this is the status when recovery is stuck: http://pastebin.ca/3663601

It seems that the PG belongs to a pool replicated with size 3, but the cluster
cannot create the third replica for the PGs whose third OSD (osd.14) is down.
As far as I remember, that was not the case with the argonaut tunables.

Regards

On 23 July 2016 at 06:16, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Sat, Jul 23, 2016 at 12:17 AM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>> Hello,
>> being on latest Hammer, I think I have hit a bug with tunables more
>> recent than the legacy ones.
>>
>> Having been on legacy tunables for a while, I decided to experiment with
>> "better" tunables. So first I went from the argonaut profile to bobtail
>> and then to firefly. However, I decided to change chooseleaf_vary_r
>> incrementally (because the remapping from 0 to 5 was huge), stepping
>> from 5 down to the best value (1). When I reached
>> chooseleaf_vary_r = 2, I decided to run a simple test before going to
>> chooseleaf_vary_r = 1: stop an OSD (osd.14) and let the cluster
>> recover. But the recovery never completes and a PG remains stuck,
>> reported as undersized+degraded. No OSD is near full and all pools
>> have min_size=1.
>>
>> ceph osd crush show-tunables -f json-pretty
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 2,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 22,
>>     "profile": "unknown",
>>     "optimal_tunables": 0,
>>     "legacy_tunables": 0,
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v2_rules": 0,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0
>> }
>>
>> The really strange thing is that the OSDs of the stuck PG belong to
>> other nodes than the one I decided to stop (osd.14).
>>
>> # ceph pg dump_stuck
>> ok
>> pg_stat state                      up    up_primary acting acting_primary
>> 179.38  active+undersized+degraded [2,8] 2          [2,8]  2
>
> Can you share a query of this pg?
>
> What size (not min size) is this pool (assuming it's 2)?
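
For reference, a minimal sketch of the commands that produce the information asked
for above, plus an offline check of whether rule 0 can still pick three OSDs under
the current tunables. The pg id 179.38, pool 'scbench', rule 0 and size 3 come from
this thread; crush.bin is just a placeholder file name:

ceph pg 179.38 query               # full peering/recovery state of the stuck PG
ceph pg map 179.38                 # the up/acting set CRUSH computes right now
ceph osd pool get scbench size     # replica count of the pool that owns 179.38

ceph osd getcrushmap -o crush.bin  # dump the CRUSH map currently in use
crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-bad-mappings
                                   # lists inputs for which rule 0 cannot pick 3 OSDs;
                                   # adding "--weight 14 0" should simulate osd.14 being out

If crushtool reports bad mappings with chooseleaf_vary_r = 2, CRUSH itself cannot
find a third OSD for those PGs, which would match the undersized+degraded state above.
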
>
>>
>> ID WEIGHT   TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 11.19995 root default
>> -3 11.19995     rack unknownrack
>> -2  0.39999         host staging-rd0-03
>> 14  0.20000             osd.14               up  1.00000          1.00000
>> 15  0.20000             osd.15               up  1.00000          1.00000
>> -8  5.19998         host staging-rd0-01
>>  6  0.59999             osd.6                up  1.00000          1.00000
>>  7  0.59999             osd.7                up  1.00000          1.00000
>>  8  1.00000             osd.8                up  1.00000          1.00000
>>  9  1.00000             osd.9                up  1.00000          1.00000
>> 10  1.00000             osd.10               up  1.00000          1.00000
>> 11  1.00000             osd.11               up  1.00000          1.00000
>> -7  5.19998         host staging-rd0-00
>>  0  0.59999             osd.0                up  1.00000          1.00000
>>  1  0.59999             osd.1                up  1.00000          1.00000
>>  2  1.00000             osd.2                up  1.00000          1.00000
>>  3  1.00000             osd.3                up  1.00000          1.00000
>>  4  1.00000             osd.4                up  1.00000          1.00000
>>  5  1.00000             osd.5                up  1.00000          1.00000
>> -4  0.39999         host staging-rd0-02
>> 12  0.20000             osd.12               up  1.00000          1.00000
>> 13  0.20000             osd.13               up  1.00000          1.00000
>>
>>
>> Have you experienced something similar?
>>
>> Regards,
>> Kostis
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com