On Sat, Jun 17, 2023 at 4:41 AM Yixin Jin <yjin77@xxxxxxxx> wrote:
>
> Hi ceph gurus,
>
> I am experimenting with the rgw multisite sync feature on the Quincy
> release (17.2.5). I am using zone-level sync, not a bucket-level sync
> policy. During my experiment, my setup somehow got into a situation
> that it doesn't seem to get out of: one zone is perpetually behind the
> other, although there are no ongoing client requests.
>
> Here is the output of my "sync status":
>
> root@mon1-z1:~# radosgw-admin sync status
>           realm f90e4356-3aa7-46eb-a6b7-117dfa4607c4 (test-realm)
>       zonegroup a5f23c9c-0640-41f2-956f-a8523eccecb3 (zg)
>            zone bbe3e2a1-bdba-4977-affb-80596a6fe2b9 (z1)
>   metadata sync no sync (zone is master)
>       data sync source: 9645a68b-012e-4889-bf24-096e7478f786 (z2)
>                 syncing
>                 full sync: 0/128 shards
>                 incremental sync: 128/128 shards
>                 data is behind on 14 shards
>                 behind shards: [56,61,63,107,108,109,110,111,112,113,114,115,116,117]
>
> It stays behind forever while rgw is almost completely idle (about 1% CPU).
>
> Any suggestions on how to drill deeper to see what happened?

Hello!

I have no idea what has happened, but it would be helpful if you could
confirm the latency between the two clusters. In other words, please
don't expect a sync between, e.g., Germany and Singapore to catch up
fast: it is limited by the amount of data that can be synced in one
request and by the hard-coded maximum number of requests in flight. In
Reef, there are new tunables that help on high-latency links:
rgw_data_sync_spawn_window and rgw_bucket_sync_spawn_window.
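
If you end up on Reef, raising them would look roughly like this. This
is only a sketch: the client.rgw config section and the values 50/40
are my assumptions, not recommendations, so check the defaults on your
version first and experiment from there:

  # ILLUSTRATIVE values only - raise the number of in-flight data sync
  # operations; check your version's defaults before changing them
  ceph config set client.rgw rgw_data_sync_spawn_window 50
  ceph config set client.rgw rgw_bucket_sync_spawn_window 40

The radosgw daemons may need a restart to pick up the new values. Note
also that these tunables exist only in Reef, so on your 17.2.5 cluster
they would require an upgrade first.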
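
As for drilling deeper, I would start with the individual shards that
are reported as behind, and with the sync error list. Something like
the commands below (untested here; shard 56 is just the first one from
your output, and z2 is your source zone - see "radosgw-admin help" on
your version for the exact options):

  # position of one lagging shard relative to the source zone z2
  radosgw-admin data sync status --source-zone=z2 --shard-id=56

  # entries that previously failed to sync and were recorded for retry
  radosgw-admin sync error list

If the error list is not empty, the entries there are a good place to
start.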

--
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx