Reef RGWs stop processing requests

Iain Stott <Iain.Stott@xxxxxxx> · Fri, 17 May 2024 09:23:55 +0000

Hi,

We are running 3 clusters in multisite. All 3 were running Quincy 17.2.6 and using cephadm. We upgraded one of the secondary sites to Reef 18.2.1 a couple of weeks ago and were planning on doing the rest shortly afterwards.

We run 3 RGW daemons on separate physical hosts behind an external HAProxy HA pair for each cluster.

Since we upgraded to Reef, we have had issues with the RGWs ceasing to process requests. We can see that they do not crash, as they still write log entries about syncing, but as far as request processing goes, they just stop. While debugging this we have kept 1 of the 3 RGWs running a Quincy image, and it has never stopped processing requests. Every Reef container we have deployed has stopped within 48 hours of being deployed. We have tried Reef versions 18.2.1, 18.2.2 and 18.1.3, and all exhibit the same issue. We are running podman 4.6.1 on CentOS 8 with kernel 4.18.0-513.24.1.el8_9.x86_64.
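
One way to pin down when each daemon stops answering would be to probe every RGW directly, bypassing HAProxy, and log a timestamp for each result. The following is only a minimal sketch using the Python standard library. The endpoint host names and port are hypothetical placeholders, and the `probe` and `watch` helpers are illustrative names, not part of Ceph. Any HTTP status, including an error status, is treated as "still processing requests"; only a timeout or connection failure is reported as no response.

```python
import http.client
import time
import urllib.error
import urllib.request

# Hypothetical endpoints: replace with the real RGW host:port pairs.
RGW_ENDPOINTS = [
    "http://rgw-host-1.example:8080",
    "http://rgw-host-2.example:8080",
    "http://rgw-host-3.example:8080",
]

# Ignore any proxy environment variables so each daemon is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=5.0):
    """Return (responding, detail) for a single GET against url."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so it is still serving.
        status = exc.code
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts, refused connections and broken responses end up here.
        return False, f"no response: {exc}"
    return True, f"HTTP {status} in {time.monotonic() - start:.2f}s"


def watch(endpoints, interval=30.0, rounds=None):
    """Probe every endpoint each interval; rounds=None loops until interrupted."""
    done = 0
    while rounds is None or done < rounds:
        for url in endpoints:
            ok, detail = probe(url)
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
            state = "OK" if ok else "NO-RESPONSE"
            print(f"{stamp} {url} {state} {detail}", flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Running `watch(RGW_ENDPOINTS)` from a host that can reach the daemons would give a timestamp for the first failed probe, which could then be lined up against the RGW debug logs.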

We have enabled debug logs for the RGWs but we have been unable to find anything in them that would shed light on the cause.

We are wondering whether anyone has any ideas about what could be causing this, or how to debug it further.

Thanks
Iain

Iain Stott
OpenStack Engineer
Iain.Stott@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx