jewel - rgw blocked on deep-scrub of bucket index pg

Sam Wouters <sam@xxxxxxxxx> · Fri, 5 May 2017 10:33:57 +0200

Hi,

we have a small cluster running jewel 10.2.7: NL-SAS disks only, with osd
data and journal co-located on the same disks. Its main purpose is to
serve as an rgw secondary zone.

Since the upgrade to jewel, whenever a deep scrub starts on one of the
rgw index pool pgs, slow requests start piling up, and after some hours
rgw requests are blocked.
The deep scrub doesn't seem to finish (still running after more than
11 hours), and the only escape I have found so far is restarting the
primary osd holding the pg.

It may be important to know that we have some rgw buckets with a large
number of objects (more than 3 million) and only 8 index shards.
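
For a sense of scale, here is a rough sketch of how many index entries
each shard would hold. It assumes one index entry per object and an even
spread across shards; neither was measured on this cluster, and the
helper name entries_per_shard is only illustrative.

```python
# Rough estimate of bucket index entries per shard.
# Assumptions (not measured): one index entry per object, and entries
# spread evenly across the shards. Real shards will vary.

def entries_per_shard(num_objects: int, num_shards: int) -> int:
    """Return num_objects / num_shards, rounded up."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Ceiling division without floating point.
    return -(-num_objects // num_shards)


# Figures from above: at least 3 million objects, 8 index shards.
print(entries_per_shard(3_000_000, 8))  # 375000
```

So each index shard object would carry at least a few hundred thousand
entries, and a deep scrub of an index pg has to read through all of them.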

Scrub-related settings:
osd scrub sleep = 0.1
osd scrub during recovery = False
osd scrub priority = 1
osd deep scrub stride = 1048576
osd scrub chunk min = 1
osd scrub chunk max = 1

Any help on debugging / resolving would be very much appreciated...

regards,
Sam

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com