Any thoughts on this? We just experienced this again last night. Our three
RGW servers had trouble servicing requests for approximately 7 minutes while
this reshard happened. Our users received 5xx errors from haproxy, which
fronts the RGW instances. Haproxy is configured with a backend server timeout
of 60 seconds and logged a couple thousand connections with termination code
'sH--', indicating the RGWs did not return response headers within that time.
This is especially concerning because it affects many buckets, not just the
one currently being resharded.

I am testing Nautilus on our dev cluster; are there any known fixes for this
issue included there?

Regards,
Josh

On Thu, Oct 31, 2019 at 2:43 PM Josh Haft <paccrap@xxxxxxxxx> wrote:
>
> Hi,
>
> Currently running Mimic 13.2.5.
>
> We had reports this morning of timeouts and failures with PUT and GET
> requests to our Ceph RGW cluster. I found these messages in the RGW log:
>
>   RGWReshardLock::lock failed to acquire lock on bucket_name:bucket_instance ret=-16
>   NOTICE: resharding operation on bucket index detected, blocking
>   block_while_resharding ERROR: bucket is still resharding, please retry
>
> They were preceded by many of these, which I think are normal/expected:
>
>   check_bucket_shards: resharding needed: stats.num_objects=6415879 shard max_objects=6400000
>
> Our RGW cluster sits behind haproxy, which notified me approximately 90
> seconds after the first 'resharding needed' message that no backends were
> available. It appears this dynamic reshard process caused the RGWs to lock
> up for a period of time. Roughly 2 minutes later the reshard error messages
> stopped and operation returned to normal.
>
> Looking back through previous RGW logs, I see a similar event from about a
> week ago on the same bucket. We have several buckets with shard counts
> exceeding 1k (this one has only 128) and much larger object counts, so
> clearly this isn't the first time dynamic resharding has been invoked on
> this cluster.
>
> Has anyone seen this? I expect it will come up again, and I can turn up
> debugging if that will help. Thanks for any assistance!
>
> Josh
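
For reference, a minimal sketch of the haproxy settings in play here (the
backend name, server addresses, and ports are placeholders, not our actual
config):

    defaults
        mode http
        option httplog          # access log lines carry the termination flags, e.g. 'sH--'
        timeout connect 5s
        timeout client  60s
        timeout server  60s     # the 60-second backend server timeout mentioned above

    backend rgw
        balance roundrobin
        server rgw1 192.0.2.11:7480 check
        server rgw2 192.0.2.12:7480 check
        server rgw3 192.0.2.13:7480 check

In haproxy's termination codes, 's' means the server-side timeout expired and
'H' means the session was still waiting for the server's response headers,
which is consistent with the RGWs stalling while the bucket index was being
resharded.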
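
For anyone digging into the 'resharding needed' messages: a sketch of the
ceph.conf options that govern dynamic resharding, plus the commands for
inspecting the reshard queue. The values shown are the defaults as I
understand them on Mimic, and the section name is a placeholder, so treat
this as an illustration rather than a recommendation:

    [client.rgw.<instance_name>]
    # dynamic resharding on/off (enabled by default since Luminous)
    rgw_dynamic_resharding = true
    # objects-per-shard target that feeds the check_bucket_shards
    # 'resharding needed' check
    rgw_max_objs_per_shard = 100000
    # how often the reshard thread scans for buckets needing resharding
    rgw_reshard_thread_interval = 600
    # how long the bucket can be blocked while its index is resharded
    # (120s by default on Mimic, if I recall correctly)
    rgw_reshard_bucket_lock_duration = 120

    # inspect the reshard queue and the status of a given bucket
    radosgw-admin reshard list
    radosgw-admin reshard status --bucket=<bucket_name>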