Hi,

We are currently running Mimic 13.2.5. This morning we had reports of timeouts and failures on PUT and GET requests to our Ceph RGW cluster. I found these messages in the RGW log:

    RGWReshardLock::lock failed to acquire lock on bucket_name:bucket_instance ret=-16
    NOTICE: resharding operation on bucket index detected, blocking
    block_while_resharding ERROR: bucket is still resharding, please retry

They were preceded by many of these, which I think are normal/expected:

    check_bucket_shards: resharding needed: stats.num_objects=6415879 shard max_objects=6400000

Our RGW cluster sits behind haproxy, which notified me approximately 90 seconds after the first 'resharding needed' message that no backends were available. It appears the dynamic reshard process caused the RGWs to lock up for a period of time. Roughly 2 minutes later the reshard error messages stopped and operation returned to normal.

Looking back through previous RGW logs, I see a similar event from about a week ago, on the same bucket. We have several buckets with shard counts exceeding 1k (this one only has 128) and much larger object counts, so clearly this isn't the first time dynamic resharding has been invoked on this cluster.

Has anyone seen this? I expect it will come up again, and I can turn up debugging if that'll help.

Thanks for any assistance!

Josh
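P.S. For anyone comparing numbers, here is a minimal Python sketch of how the figures in the 'resharding needed' line relate to the shard count. It assumes that 'shard max_objects' is the per-shard object limit multiplied by the bucket's current shard count; that is my reading of the message, not something confirmed in this thread. The function names are hypothetical helpers, not part of any Ceph tool.

```python
import re

# Matches the 'resharding needed' line exactly as quoted from the RGW log above.
LINE_RE = re.compile(
    r"check_bucket_shards: resharding needed: "
    r"stats\.num_objects=(\d+) shard max_objects=(\d+)"
)


def parse_reshard_notice(line):
    """Return (num_objects, max_objects) from a log line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def implied_per_shard_limit(max_objects, num_shards):
    """ASSUMPTION: max_objects = per-shard limit * current shard count."""
    return max_objects / num_shards


line = ("check_bucket_shards: resharding needed: "
        "stats.num_objects=6415879 shard max_objects=6400000")
num_objects, max_objects = parse_reshard_notice(line)

# How far the bucket is over the threshold that triggered the reshard.
print(num_objects - max_objects)

# Per-shard limit implied by the 128 shards this bucket has.
print(implied_per_shard_limit(max_objects, 128))
```

Under that assumption, the bucket was only slightly over its threshold when the reshard was triggered, and the implied per-shard limit can be compared against whatever rgw_max_objs_per_shard is set to on the cluster.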