Hey Kamil,

On 14/11/2022 13:54, Kamil Madac wrote:
> Hello, I'm trying to create an RGW zonegroup with two zones and have data replicated between the zones. Each zone is a separate Ceph cluster. A zone definition can contain a list of endpoints (not just a single endpoint), which are then used for replication between the zones, so I tried using that instead of putting a load balancer in front of the clusters for replication. [...] When the node is back again, replication continues to work. What is the point of allowing multiple endpoints in the zone configuration when an outage of one of them stops replication?
We run a similar setup and have run into similar issues before, when doing rolling restarts of the RGWs.
1) Mostly it's a single metadata shard that never syncs up and requires a complete "metadata init". This issue will likely be addressed via https://tracker.ceph.com/issues/39657
2) We have also seen a single RGW that is unavailable, or just slow, affect the whole sync process. I suspect the HTTP client used by the RGW sync code does not do a good job of tracking which remote RGW is healthy, or a slow-reading RGW could simply be holding up all the shards ...
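To illustrate what I mean by "tracking which remote RGW is healthy", here is a hypothetical Python sketch, not the actual RGW code (which is C++ and works differently): round-robin over a zone's endpoint list, and after a timeout or connection error temporarily skip that endpoint instead of retrying it. The class name, the 30 s cooldown and the rgw1..rgw3 URLs are made up for illustration.

```python
import time
from typing import Callable, Dict, List, Optional


class EndpointPicker:
    """Round-robin over remote zone endpoints, skipping recently failed ones.

    Hypothetical illustration only; not how the RGW sync HTTP client works.
    """

    def __init__(self, endpoints: List[str], cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self.endpoints = list(endpoints)
        self.cooldown_s = cooldown_s  # assumed value, purely illustrative
        self.clock = clock
        self._next = 0
        # endpoint -> time until which it is treated as unhealthy
        self._down_until: Dict[str, float] = {}

    def mark_failed(self, endpoint: str) -> None:
        # On a timeout or connection error, bench the endpoint for cooldown_s.
        self._down_until[endpoint] = self.clock() + self.cooldown_s

    def mark_ok(self, endpoint: str) -> None:
        # A successful request clears any earlier failure mark.
        self._down_until.pop(endpoint, None)

    def pick(self) -> Optional[str]:
        # Return the next endpoint in round-robin order that is not benched.
        now = self.clock()
        n = len(self.endpoints)
        for i in range(n):
            ep = self.endpoints[(self._next + i) % n]
            if self._down_until.get(ep, 0.0) <= now:
                self._next = (self._next + i + 1) % n
                return ep
        return None  # every endpoint is currently benched


if __name__ == "__main__":
    fake_now = [0.0]
    picker = EndpointPicker(
        ["http://rgw1:8080", "http://rgw2:8080", "http://rgw3:8080"],
        cooldown_s=30, clock=lambda: fake_now[0])
    picker.mark_failed("http://rgw2:8080")  # e.g. rgw2 is being restarted
    print([picker.pick() for _ in range(4)])
    # ['http://rgw1:8080', 'http://rgw3:8080', 'http://rgw1:8080', 'http://rgw3:8080']
```

With something like this, a node that is down only costs one failed request per cooldown period instead of stalling the shards that happen to be routed to it.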
3) As far as "cooperating" goes, there are improvements being worked on, see https://tracker.ceph.com/issues/41230 or https://github.com/ceph/ceph/pull/45958, which should make better use of having multiple distinct RGWs in both zones.
Regards
Christian

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx