Hello!

I've run into a bit of an issue with one of our radosgw production clusters. The setup is two radosgw nodes behind HAProxy load balancing, which in turn are connected to the Ceph cluster. Everything is running 14.2.2, so Nautilus. It's tied to an OpenStack cluster, so Keystone is the authentication backend (that shouldn't really matter here, though).

Today both rgw backends crashed. Checking the logs, it seems to be related to dynamic resharding of a bucket, causing lock errors. Log snippet: https://pastebin.com/uBCnhinF

Following http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021368.html (old, but it seemed relevant), I performed a manual reshard of the affected bucket, which succeeded:

  radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256

Checking the bucket's metadata, it now correctly shows 256 shards, up from 128.

HOWEVER, dynamic resharding still kept happening and bringing down the backends. I suspect it's because the old reshard op is still hanging around; it shows up in `reshard list`: https://pastebin.com/dPChwBCT

Since the resharding seems to have been successful when run manually, I now want to remove that stale reshard op, but I can't: I get this error when trying: https://pastebin.com/071kfAsa

For now I've had to resort to setting rgw_dynamic_resharding = false in ceph.conf to stop the problem from recurring (the exact commands and config are sketched in the P.S. below).

Ideas?

Cheers,
Erik
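
P.S. For anyone searching the archives later, here's roughly the sequence I went through. Bucket name anonymized as XXX/YYY, and <bucket_id> is a placeholder; the metadata and cancel invocations are written from memory and the docs, so treat this as a sketch rather than an exact transcript:

  # manual reshard of the affected bucket
  radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256

  # verify the new shard count in the bucket instance metadata:
  # first get the entry point to find the bucket_id, then the instance
  radosgw-admin metadata get bucket:XXX/YYY
  radosgw-admin metadata get bucket.instance:XXX/YYY:<bucket_id>   # num_shards now 256

  # the stale op that I suspect keeps retriggering resharding
  radosgw-admin reshard list

  # what I tried to remove it, and what fails with the pastebin error above
  radosgw-admin reshard cancel --bucket="XXX/YYY"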
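And the workaround currently in place on both rgw nodes, followed by a restart of the radosgw instances (I believe the same option can also be set at runtime via the Nautilus config store, but I went with ceph.conf; the section name of course depends on how your rgw instances are named):

  # ceph.conf on the rgw nodes
  [client.rgw.<instance>]
      rgw_dynamic_resharding = false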