Re: [EXTERNAL] Re: Ceph Multisite Version Compatibility

I made some progress understanding this. It seems the RGW is aware that the sync is behind, despite not reporting it in "sync status".

$ radosgw-admin  sync status
          realm d2fa006d-7ced-423f-8510-9ac494c4f4ec (geored_realm)
      zonegroup 583c773c-b7e5-4e7f-a51e-c602237ec9c6 (geored_zg)
           zone 4bd83282-c7da-4dd9-9f18-d8d8d63b88c9 (siteA)
   current time 2024-11-06T18:03:03Z
zonegroup features enabled:
                   disabled: compress-encrypted,notification_v2,resharding
  metadata sync no sync (zone is master)
      data sync source: c2800277-80a5-4646-adff-99eae966c6fb (siteB)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

$ radosgw-admin  bucket sync --bucket ahkbucket   --source-zone siteB status
          realm d2fa006d-7ced-423f-8510-9ac494c4f4ec (geored_realm)
      zonegroup 583c773c-b7e5-4e7f-a51e-c602237ec9c6 (geored_zg)
           zone 4bd83282-c7da-4dd9-9f18-d8d8d63b88c9 (siteA)
         bucket :ahkbucket[4bd83282-c7da-4dd9-9f18-d8d8d63b88c9.184347.1])
   current time 2024-11-06T18:03:06Z

    source zone c2800277-80a5-4646-adff-99eae966c6fb (siteB)
  source bucket :ahkbucket[4bd83282-c7da-4dd9-9f18-d8d8d63b88c9.184347.1])
                incremental sync on 16 shards
                bucket is behind on 2 shards
                behind shards: [5,12]
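For anyone digging into a similar state, the behind shards can be inspected further. A sketch of the commands I'd reach for (run on the zone that is behind; the bucket name is from my setup, and neither command requires the sync to be healthy):

```shell
# Check for recorded sync errors; empty output means none were logged,
# which matches the "silently behind" symptom described above
radosgw-admin sync error list

# Inspect the bucket index log on the source zone; incremental sync
# replays these entries, so stuck or missing entries here point at
# where the pipeline stalls (shards 5 and 12 per the status above)
radosgw-admin bilog list --bucket ahkbucket
```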


Re-triggering a full sync with:

  * "radosgw-admin bucket sync --bucket ahkbucket init --source-zone siteB"
  * "radosgw-admin bucket sync --bucket ahkbucket run --source-zone siteB"

...restores all my un-synced objects. However, the issue still persists after writing more objects. I think that implies that:

  * full sync is working
  * incremental sync is not
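A quick way to confirm that split (full sync recovers, incremental does not) is to compare object counts per zone after writing fresh objects. A sketch, again assuming my bucket name:

```shell
# Run on each site and compare num_objects; a gap that grows after new
# writes confirms incremental sync is not replaying changes
radosgw-admin bucket stats --bucket ahkbucket | grep -A2 '"usage"'

# The data changes log on the source zone; markers that never advance
# can indicate the peer is not consuming them
radosgw-admin datalog status
```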


________________________________
From: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Sent: Wednesday, November 6, 2024 3:27 PM
To: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: [EXTERNAL]  Re: Ceph Multisite Version Compatibility

Hi Eugen,

Thanks for the suggestions. It has worked for me before. It's certainly possible it's a misconfiguration, however I've reproduced this on upgrade of some long-lived systems that had been happily syncing away on Octopus for several years.

Definitely keen to understand if I'm missing something in the configuration. I am currently combing through network captures of the sync traffic trying to figure out the difference.

It's also odd that my "sync status" command is not reporting an error but is claiming we are in sync, despite that not being the case. My best guess at the moment is that the uplevel version is not correctly checking the downlevel's logs.

I know a lot of refactoring of the multisite sync process has been done recently so I wonder if that may be related.

For reference, I raised this tracker: Bug #68819: rgw: multisite sync between Squid and Quincy does not work in one direction - rgw - Ceph<https://tracker.ceph.com/issues/68819>.

Best wishes,
Alex

________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, November 6, 2024 11:35 AM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [EXTERNAL]  Re: Ceph Multisite Version Compatibility

Hi Alex,

I don't have a real good answer, just wanted to mention that one of
our customers had some issues with multi-site when they were on the
same major version (Octopus) but not on the same minor version. But it
wasn't that the sync didn't work at all, it worked in general. Only
from time to time, the sync status would show errors. And after
updating the second site to the same minor version, the issues never
came back. But from what I read in this list, it doesn't appear to be
a general problem with version mismatch, so I wouldn't expect one sync
direction to fail entirely. Maybe it's a configuration issue? Has it
ever worked before or have you just set up the second site and it
failed right from the start?


Zitat von "Alex Hussein-Kershaw (HE/HIM)" <alexhus@xxxxxxxxxxxxx>:

> I wondered if this applies: Ceph Releases (general) — Ceph
> Documentation <https://docs.ceph.com/en/latest/releases/general/>.
>
> "Online, rolling upgrade support and testing from the last two (2)
> stable release(s) (starting from Luminous)." - which does imply I'm
> doing something invalid with one site on Squid and one on Octopus.
>
> However, I've reproduced this between Quincy (17.2.7) and Squid
> (19.2.0) now too, which according to the link above is a valid
> upgrade path. To be clear, I have:
>
>   * SiteA (Quincy) <--- syncing ---> SiteB (Squid).
>   * Write objects to SiteB, they appear on siteA shortly after.
>   * Write objects to SiteA, they never appear on siteB.
>
> It seems to be 100% reproducible. I suspect I need to raise a tracker.
> In the meantime, I welcome any suggestions if I'm doing this wrong.
>
> ________________________________
> From: Alex Hussein-Kershaw (HE/HIM)
> Sent: Friday, November 1, 2024 8:49 AM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject: Ceph Multisite Version Compatibility
>
> Hi folks. I'm looking for some guidance on RGW multisite version
> sync compatibility. Particularly between Octopus and Squid. Context
> is I have two sites in a multisite pair replicating all S3 data. One
> is on Squid, one is on Octopus. Should I expect the multisite sync
> to just work between these versions?
>
> I'm observing that both sites' "radosgw-admin sync status" reports
> that we're in sync, but objects from the Octopus zone are not
> replicated to the Squid zone (the opposite direction is fine).
> It might be that this just isn't a valid setup, but I'm failing to
> find a reference that claims something like "must be within +/- 1
> version of the other zones".
>
> Thanks,
> Alex
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



