Hi Brad,

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119047 crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119048 stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119049 stripe_width 0
pool 3 'blocks' replicated size 2 min_size 1 crush_ruleset 4 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119050 stripe_width 0
pool 4 'maps' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119051 stripe_width 0
pool 179 'scbench' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 154034 flags hashpspool stripe_width 0

This is the status of 179.38 when the cluster is healthy: http://pastebin.ca/3663600
and this is the status when recovery is stuck: http://pastebin.ca/3663601

It seems that the PG belongs to a pool replicated with size 3, but the cluster
cannot create the third replica for the PGs whose third OSD (osd.14) is down.
As far as I remember, that was not the case with the argonaut tunables.

Regards

On 23 July 2016 at 06:16, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> On Sat, Jul 23, 2016 at 12:17 AM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>> Hello,
>> being on latest Hammer, I think I have hit a bug with tunables more
>> recent than the legacy ones.
>>
>> Having been on legacy tunables for a while, I decided to experiment with
>> "better" tunables. So first I went from the argonaut profile to bobtail
>> and then to firefly. However, I decided to change chooseleaf_vary_r
>> incrementally (because the remapping from 0 to 5 was huge), stepping
>> from 5 down to the best value (1). When I reached
>> chooseleaf_vary_r = 2, I decided to run a simple test before going to
>> chooseleaf_vary_r = 1: stop an OSD (osd.14) and let the cluster
>> recover. But the recovery never completes and a PG remains stuck,
>> reported as undersized+degraded. No OSD is near full and all pools
>> have min_size=1.
>>
>> ceph osd crush show-tunables -f json-pretty
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 2,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 22,
>>     "profile": "unknown",
>>     "optimal_tunables": 0,
>>     "legacy_tunables": 0,
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v2_rules": 0,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0
>> }
>>
>> The really strange thing is that the OSDs of the stuck PG belong to
>> other nodes than the one I decided to stop (osd.14).
>>
>> # ceph pg dump_stuck
>> ok
>> pg_stat state                      up    up_primary acting acting_primary
>> 179.38  active+undersized+degraded [2,8] 2          [2,8]  2
>
> Can you share a query of this pg?
>
> What size (not min size) is this pool (assuming it's 2)?
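
For reference, a minimal sketch of the commands that produce the information asked
for above, plus an offline check of whether rule 0 can still pick three OSDs under
the current tunables. The pg id 179.38, pool 'scbench', rule 0 and size 3 come from
this thread; crush.bin is just a placeholder file name:

ceph pg 179.38 query               # full peering/recovery state of the stuck PG
ceph pg map 179.38                 # the up/acting set CRUSH computes right now
ceph osd pool get scbench size     # replica count of the pool that owns 179.38

ceph osd getcrushmap -o crush.bin  # dump the CRUSH map currently in use
crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-bad-mappings
                                   # lists inputs for which rule 0 cannot pick 3 OSDs;
                                   # adding "--weight 14 0" should simulate osd.14 being out

If crushtool reports bad mappings with chooseleaf_vary_r = 2, CRUSH itself cannot
find a third OSD for those PGs, which would match the undersized+degraded state above.
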
>
>>
>> ID WEIGHT   TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 11.19995 root default
>> -3 11.19995     rack unknownrack
>> -2  0.39999         host staging-rd0-03
>> 14  0.20000             osd.14               up  1.00000          1.00000
>> 15  0.20000             osd.15               up  1.00000          1.00000
>> -8  5.19998         host staging-rd0-01
>>  6  0.59999             osd.6                up  1.00000          1.00000
>>  7  0.59999             osd.7                up  1.00000          1.00000
>>  8  1.00000             osd.8                up  1.00000          1.00000
>>  9  1.00000             osd.9                up  1.00000          1.00000
>> 10  1.00000             osd.10               up  1.00000          1.00000
>> 11  1.00000             osd.11               up  1.00000          1.00000
>> -7  5.19998         host staging-rd0-00
>>  0  0.59999             osd.0                up  1.00000          1.00000
>>  1  0.59999             osd.1                up  1.00000          1.00000
>>  2  1.00000             osd.2                up  1.00000          1.00000
>>  3  1.00000             osd.3                up  1.00000          1.00000
>>  4  1.00000             osd.4                up  1.00000          1.00000
>>  5  1.00000             osd.5                up  1.00000          1.00000
>> -4  0.39999         host staging-rd0-02
>> 12  0.20000             osd.12               up  1.00000          1.00000
>> 13  0.20000             osd.13               up  1.00000          1.00000
>>
>>
>> Have you experienced something similar?
>>
>> Regards,
>> Kostis
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com