I made some progress understanding this. It seems the RGW is aware that the sync is behind, despite not reporting it in "sync status".

$ radosgw-admin sync status
          realm d2fa006d-7ced-423f-8510-9ac494c4f4ec (geored_realm)
      zonegroup 583c773c-b7e5-4e7f-a51e-c602237ec9c6 (geored_zg)
           zone 4bd83282-c7da-4dd9-9f18-d8d8d63b88c9 (siteA)
   current time 2024-11-06T18:03:03Z
zonegroup features enabled:
                   disabled: compress-encrypted,notification_v2,resharding
  metadata sync no sync (zone is master)
      data sync source: c2800277-80a5-4646-adff-99eae966c6fb (siteB)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

$ radosgw-admin bucket sync --bucket ahkbucket --source-zone siteB status
          realm d2fa006d-7ced-423f-8510-9ac494c4f4ec (geored_realm)
      zonegroup 583c773c-b7e5-4e7f-a51e-c602237ec9c6 (geored_zg)
           zone 4bd83282-c7da-4dd9-9f18-d8d8d63b88c9 (siteA)
         bucket :ahkbucket[4bd83282-c7da-4dd9-9f18-d8d8d63b88c9.184347.1])
   current time 2024-11-06T18:03:06Z

    source zone c2800277-80a5-4646-adff-99eae966c6fb (siteB)
  source bucket :ahkbucket[4bd83282-c7da-4dd9-9f18-d8d8d63b88c9.184347.1])
                incremental sync on 16 shards
                bucket is behind on 2 shards
                behind shards: [5,12]

Re-triggering a full sync with:

  * "radosgw-admin bucket sync --bucket ahkbucket init --source-zone siteB"
  * "radosgw-admin bucket sync --bucket ahkbucket run --source-zone siteB"

...restores all my un-synced objects. However, the issue still persists after writing more objects. I think that implies that:

  * full sync is working
  * incremental sync is not
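To narrow down where incremental sync stalls, my next step is to check the sync error list on the zone that is behind, and to confirm that the source zone is actually recording the new writes in its bucket index and data changes logs (which are what incremental sync replays). A rough sketch of the commands I have in mind, using my own bucket and zone names:

  # on the zone that is behind (siteA): any recorded sync errors?
  radosgw-admin sync error list

  # on the source zone (siteB): are the new objects present in the bucket index log?
  radosgw-admin bilog list --bucket ahkbucket

  # and in the data changes log?
  radosgw-admin datalog list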
________________________________
From: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Sent: Wednesday, November 6, 2024 3:27 PM
To: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: [EXTERNAL] Re: Ceph Multisite Version Compatibility

Hi Eugen,

Thanks for the suggestions. It has worked for me before. It's certainly possible it's a misconfiguration; however, I've reproduced this on upgrade of some long-lived systems that had been happily syncing away on Octopus for several years. I'm definitely keen to understand if I'm missing something in the configuration. I am currently combing through network captures of the sync traffic, trying to figure out the difference.

It's also odd that my "sync status" command is not reporting an error but is claiming we are in sync, despite that not being the case; my best guess at the moment is that the uplevel version is not correctly checking the downlevel's logs. I know a lot of refactoring of the multisite sync process has been done recently, so I wonder if that may be related.

For reference, I raised this tracker: Bug #68819: rgw: multisite sync between Squid and Quincy does not work in one direction - rgw - Ceph <https://tracker.ceph.com/issues/68819>.

Best wishes,
Alex

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, November 6, 2024 11:35 AM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [EXTERNAL] Re: Ceph Multisite Version Compatibility

Hi Alex,

I don't have a really good answer; I just wanted to mention that one of our customers had some issues with multi-site when they were on the same major version (Octopus) but not on the same minor version. It wasn't that the sync didn't work at all, it worked in general; only from time to time the sync status would show errors. After updating the second site to the same minor version, the issues never came back.

But from what I read in this list, it doesn't appear to be a general problem with version mismatch, so I wouldn't expect one sync direction to fail entirely. Maybe it's a configuration issue? Has it ever worked before, or have you just set up the second site and it failed right from the start?

Zitat von "Alex Hussein-Kershaw (HE/HIM)" <alexhus@xxxxxxxxxxxxx>:

> I wondered if this applies: Ceph Releases (general) — Ceph
> Documentation <https://docs.ceph.com/en/latest/releases/general/>.
>
> "Online, rolling upgrade support and testing from the last two (2)
> stable release(s) (starting from Luminous)." - which does imply I'm
> doing something invalid with one site on Squid and one on Octopus.
>
> However, I've reproduced this between Quincy (17.2.7) and Squid
> (19.2.0) now too, which according to the link above is a valid
> upgrade path. To be clear, I have:
>
> * SiteA (Quincy) < --- syncing --- > SiteB (Squid).
> * Write objects to SiteB; they appear on SiteA shortly after.
> * Write objects to SiteA; they never appear on SiteB.
>
> It seems to be 100% reproducible. I suspect I need to raise a tracker.
> In the meantime, I welcome any suggestions that I'm doing this wrong.
>
> ________________________________
> From: Alex Hussein-Kershaw (HE/HIM)
> Sent: Friday, November 1, 2024 8:49 AM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject: Ceph Multisite Version Compatibility
>
> Hi folks. I'm looking for some guidance on RGW multisite version
> sync compatibility, particularly between Octopus and Squid. Context
> is I have two sites in a multisite pair replicating all S3 data: one
> is on Squid, one is on Octopus. Should I expect the multisite sync
> to just work between these versions?
>
> I'm observing that "radosgw-admin sync status" on both sites reports
> that we're in sync, but objects from the Octopus zone are not
> replicated to the Squid zone (but the opposite direction is fine).
> It might be that this just isn't a valid setup, but I'm failing to find
> a reference that claims something like "must be within +/- 1 version
> of the other zones".
>
> Thanks,
> Alex

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx