Re: jewel - rgw blocked on deep-scrub of bucket index pg

Hi,


On 06-05-17 20:08, Wido den Hollander wrote:
>> On 6 May 2017 at 9:55, Christian Balzer <chibi@xxxxxxx> wrote:
>>
>>
>>
>> Hello,
>>
>> On Sat, 6 May 2017 09:25:15 +0200 (CEST) Wido den Hollander wrote:
>>
>>>> On 5 May 2017 at 10:33, Sam Wouters <sam@xxxxxxxxx> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> we have a small cluster running on jewel 10.2.7; NL-SAS disks only, osd
>>>> data and journal co-located on the disks; its main purpose is an rgw
>>>> secondary zone.
>>>>
>>>> Since the upgrade to jewel, whenever a deep scrub starts on one of the
>>>> rgw index pool pg's, slow requests start piling up and rgw requests are
>>>> blocked after some hours.
>>>> The deep-scrub doesn't seem to finish (still running after 11+ hours)
>>>> and the only escape I've found so far is restarting the primary osd
>>>> holding the pg.
>>>>
>>>> Maybe important to know: we have some large rgw buckets in terms of
>>>> object count (3+ million) with an index shard count of only 8.
>>>>
>>>> scrub related settings:
>>>> osd scrub sleep = 0.1  
>>> Try removing this line, it can block threads under Jewel.
I also found the bug report (#19497) yesterday, so I indeed removed the
sleep and manually started the deep-scrub. I didn't have time to check
the result until now.
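For anyone hitting the same thing, dropping the sleep at runtime and
re-triggering the scrub should look roughly like this (a sketch assuming
the standard ceph CLI; <pgid> is a placeholder for the index pool pg):

  # zero the scrub sleep on all OSDs at runtime (also remove the
  # "osd scrub sleep" line from ceph.conf so the change survives restarts)
  ceph tell osd.* injectargs '--osd_scrub_sleep 0'
  # then kick off the deep-scrub on the affected pg by hand
  ceph pg deep-scrub <pgid>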
After almost 26 hours the deep-scrub operation finished (2017-05-05
10:57:08 -> 2017-05-06 12:29:05); however, during the scrub we saw
frequent timeouts and periods of complete rgw downtime.
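
While the scrub runs, the blocked requests can at least be watched;
a sketch, with osd.<id> a placeholder for the primary of the index pg:

  # shows the slow/blocked requests and which OSDs they are stuck on
  ceph health detail
  # what the primary is actually working on, via its admin socket
  ceph daemon osd.<id> dump_ops_in_flight
  ceph daemon osd.<id> dump_historic_ops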

Our primary cluster is still running hammer, and there the index pools
are on SSDs, but this still raises concerns about the planned upgrade of
that one...
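
In case it helps anyone planning the same move: pre-Luminous, pinning the
index pool to SSDs means a separate crush ruleset, roughly like this (a
sketch only; it assumes a crush root named "ssd" already exists and that
the index pool is named .rgw.buckets.index):

  # rule placing replicas on distinct hosts under the "ssd" root
  ceph osd crush rule create-simple ssd-index-rule ssd host
  # find the new rule's id ...
  ceph osd crush rule dump ssd-index-rule
  # ... and point the index pool at it (this triggers data movement!)
  ceph osd pool set .rgw.buckets.index crush_ruleset <rule-id>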

Thanks a lot for the help!

r,
Sam
>>>
>> I really, REALLY wish that would get fixed properly, as in the original
>> functionality restored.
> AFAIK new work is being done on this. There was a recent thread on ceph-users or -devel (I can't find it) saying that new code is out there to fix this.
>
> Wido
>
>> Because, as we've learned, entrusting everything to internal Ceph queues
>> with priorities isn't working as expected in all cases.
>>
>> As a second, very distant option, turn it into a NOP for the time being.
>> As it stands now, it's another self-made, Jewel-introduced bug...
>>
>> Christian
>>
>>> See how that works out.
>>>
>>> Wido
>>>
>>>> osd scrub during recovery = False
>>>> osd scrub priority = 1
>>>> osd deep scrub stride = 1048576
>>>> osd scrub chunk min = 1
>>>> osd scrub chunk max = 1
>>>>
>>>> Any help on debugging / resolving would be very much appreciated...
>>>>
>>>> regards,
>>>> Sam
>>>>
>>
>> -- 
>> Christian Balzer        Network/Systems Engineer                
>> chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


