Re: [rgw multisite] Perpetual behind

Hi Yixin,
One place I start when trying to figure this out is the sync error log. You may have already looked here:
sudo radosgw-admin sync error list --rgw-zone={zone_name}
If there's a lot in there, you can trim it up to a specific date so you can see whether the errors are still occurring:
sudo radosgw-admin sync error trim --end-date="2023-06-16 03:00:00" --rgw-zone={zone_name}
There's a log on both sides of the sync, so make sure you check both of your zones.
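
If specific shards stay behind, you can also drill into them one at a time. A rough sketch, with the shard ID (56) and zone names (z1/z2) taken from your sync status output, so adjust for whichever zone you run it against:
sudo radosgw-admin data sync status --rgw-zone=z1 --source-zone=z2 --shard-id=56
That should show the sync marker and state for just that shard, which helps narrow down whether it's stuck on an error or simply hasn't caught up.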

The next thing I try is re-running a full sync, metadata and then data:
sudo radosgw-admin metadata sync init --rgw-zone=zone1 --source-zone=zone2
sudo radosgw-admin metadata sync init --rgw-zone=zone2 --source-zone=zone1

sudo radosgw-admin data sync init --rgw-zone=zone1 --source-zone=zone2
sudo radosgw-admin data sync init --rgw-zone=zone2 --source-zone=zone1

You need to restart all the rgw processes to get this to start. Obviously, if you have a massive amount of data, you don't want to re-run a full data sync.
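
For reference, how you restart the rgws depends on how they were deployed; a couple of sketches, where the service names are just examples to substitute with your own:
sudo systemctl restart ceph-radosgw.target    # package/systemd deployments, run on each rgw host
sudo ceph orch restart rgw.myrgw              # cephadm deployments, using your own rgw service name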

Lastly, I had this stuck sync happen to me on an old cluster that had explicit placement set on the buckets. I think this was because the pool name was different in each of my zones, so the explicit placement couldn't find anywhere to put the data and the sync never finished. It might be worth checking for this situation, as there was also another thread on the mailing list recently where someone had explicit placement causing sync issues; see the check sketched below.
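
To check for it, the explicit placement pools live in the bucket instance metadata; something like the following, where the bucket name is a placeholder and the instance ID comes from the "id" field of bucket stats:
sudo radosgw-admin bucket stats --bucket={bucket_name}
sudo radosgw-admin metadata get bucket.instance:{bucket_name}:{instance_id}
In the metadata output, look for a non-empty explicit_placement section (data_pool, index_pool, data_extra_pool) rather than just a placement_rule.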

I hope that helps you track down the issue.
Rich

On Sat, 17 Jun 2023 at 08:41, Yixin Jin <yjin77@xxxxxxxx> wrote:
>
> Hi ceph gurus,
>
> I am experimenting with the rgw multisite sync feature on the Quincy release (17.2.5). I am using zone-level sync, not a bucket-level sync policy. During my experiment, my setup somehow got into a situation that it doesn't seem able to get out of: one zone is perpetually behind the other, even though there are no ongoing client requests.
>
> Here is the output of my "sync status":
>
> root@mon1-z1:~# radosgw-admin sync status
>           realm f90e4356-3aa7-46eb-a6b7-117dfa4607c4 (test-realm)
>       zonegroup a5f23c9c-0640-41f2-956f-a8523eccecb3 (zg)
>            zone bbe3e2a1-bdba-4977-affb-80596a6fe2b9 (z1)
>   metadata sync no sync (zone is master)
>       data sync source: 9645a68b-012e-4889-bf24-096e7478f786 (z2)
>                         syncing
>                         full sync: 0/128 shards
>                         incremental sync: 128/128 shards
>                         data is behind on 14 shards
>                         behind shards: [56,61,63,107,108,109,110,111,112,113,114,115,116,117]
>
>
> It stays behind forever while rgw is almost completely idle (1% of CPU).
>
> Any suggestions on how to drill deeper to see what happened?
>
> Thanks,
> Yixin
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


