Multisite: slow period update on s3website RGWs

I had quite an unpleasant experience today that I would like to share.

In our setup we use two sets of RGWs: one that serves only the S3 and admin APIs, and a second that serves the s3website and admin APIs. I was changing the global quota settings, which meant that I then needed to commit the updated period.
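For context, this is a sketch of the kind of command sequence involved, assuming the standard radosgw-admin quota commands. The quota scope and the object limit are placeholders, not our real settings.

```shell
# Placeholder scope and limit, for illustration only.
radosgw-admin global quota set --quota-scope=bucket --max-objects=1024
radosgw-admin global quota enable --quota-scope=bucket

# In a multisite setup the change only takes effect once the period is committed.
# This is the step that makes every RGW in the realm reload its configuration.
radosgw-admin period update --commit
```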

The first set of S3 RGWs updated without issue, but the s3website RGWs did not. They seemed to get stuck on the period update, which took minutes in some cases, meaning a service disruption for MINUTES!

When I looked at the RGW logs, I found these lines, which seem to show the issue:

rgw realm reloader: Pausing frontends for realm update...
req 3935991300378050219 61.212696075s s3:get_obj iterate_obj() failed with -104

We are running ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable).

Has anyone had a similar issue? Is there something we can do about it? Is there an option to update the period machine by machine, or some other way that would let us update the period without disrupting the service?

Your help is much appreciated.

Kind regards,

