We have the same issue on Ceph 15.2.15. In the testing cluster, it seems Ceph 16 solved this issue. The PR https://github.com/ceph/ceph/pull/41316 appears to fix it, but I do not know why it was not backported to Ceph 15. There is also a new issue in the Ceph tracker that describes the same problem you have: https://tracker.ceph.com/issues/53737

Thanks

> On Feb 25, 2022, at 10:07 PM, Poß, Julian <julian.poss@xxxxxxx> wrote:
>
> As far as I can tell, it can be reproduced every time, yes.
>
> That statement was actually about two RGWs in one zone. That is also something that I tested,
> because I felt like Ceph should be able to handle that in an HA-like fashion on its own.
>
> But for the main issue, there is indeed only one RGW running in each zone.
> As far as I can tell, I see no issues other than what I posted in my initial mail.
>
> Best, Julian
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, February 25, 2022 12:57
> To: ceph-users@xxxxxxx
> Subject: Re: WG: Multisite sync issue
>
> I see, then I misread your statement about multiple RGWs:
>
>> It also worries me that replication won't work with multiple RGWs in
>> one zone, but one of them being unavailable, for instance during
>> maintenance.
>
> Is there anything other than the RGW logs pointing to any issues? I find it strange that a restart of the RGW fixes it. Is this always reproducible?
>
> Quoting "Poß, Julian" <julian.poss@xxxxxxx>:
>
>> Hi Eugen,
>>
>> there is currently only one RGW installed for each region+realm,
>> so the places to look at are already pretty limited.
>>
>> As of now, the RGWs themselves are the endpoints. So far no load balancer
>> has been put in place there.
>>
>> Best, Julian
>>
>> -----Original Message-----
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: Friday, February 25, 2022 10:52
>> To: ceph-users@xxxxxxx
>> Subject: Re: WG: Multisite sync issue
>>
>> Hi,
>>
>> I would stop all RGWs except one in each cluster to limit the places
>> and logs to look at. Do you have a load balancer as endpoint, or do you
>> have a list of all RGWs as endpoints?
>>
>> Quoting "Poß, Julian" <julian.poss@xxxxxxx>:
>>
>>> Hi,
>>>
>>> I set up multisite with 2 Ceph clusters and multiple RGWs and
>>> realms/zonegroups.
>>> This setup was installed using the ceph-ansible branch "stable-5.0",
>>> with Focal + Octopus.
>>> During some testing, I noticed that the replication somehow does not
>>> work as expected.
>>>
>>> With s3cmd, I put a small file of 1.9 KB into a bucket on the master zone:
>>> s3cmd put /etc/hosts s3://test/
>>>
>>> Then I can see in the output of "radosgw-admin sync status
>>> --rgw_realm internal" that the cluster does indeed have something to
>>> sync, switching back to "nothing to sync" after a couple of seconds.
>>> "radosgw-admin sync error list --rgw_realm internal" is empty, too.
>>> However, if I look via s3cmd on the secondary zone, I can't see the
>>> file. Even if I look at the Ceph pools directly, the data didn't get
>>> replicated.
>>> If I upload the file again, with the same command and without a
>>> change, basically just updating it, or if I restart the RGW daemon of
>>> the secondary zone, the affected file gets replicated.
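For anyone narrowing this down: a rough sketch of commands that might show whether the secondary zone itself considers the bucket behind (bucket test/test and realm "internal" are taken from the mail above; <master-zone> is a placeholder for the actual source zone name):

# run on the secondary zone: overall sync state for the realm
radosgw-admin sync status --rgw_realm internal
# per-bucket sync state against the source zone
radosgw-admin bucket sync status --bucket test/test --source-zone <master-zone> --rgw_realm internal
# per-shard data sync state for the source zone
radosgw-admin data sync status --source-zone <master-zone> --rgw_realm internal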
>>>
>>> I spotted this issue with all my realms/zonegroups. But even with
>>> "debug_rgw = 20" and "debug_rgw_sync = 20" I can't spot any obvious
>>> errors in the logs.
>>>
>>> It also worries me that replication won't work with multiple RGWs in
>>> one zone when one of them is unavailable, for instance during
>>> maintenance.
>>> I did somehow expect Ceph to work its way through the list of
>>> available endpoints, and only fail if none are available.
>>> ...Or am I missing something here?
>>>
>>> Any help whatsoever is very much appreciated.
>>> I am pretty new to multisite and have been stuck on this for a couple
>>> of days now already.
>>>
>>> Thanks, Julian
>>>
>>>
>>> Here is some additional information, including some log snippets:
>>>
>>> # On the master zone, I can see the file in the bilog right away
>>> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>>>     {
>>>         "op_id": "3#00000000001.445.5",
>>>         "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
>>>         "op": "write",
>>>         "object": "hosts",
>>>         "instance": "",
>>>         "state": "complete",
>>>         "index_ver": 1,
>>>         "timestamp": "2022-02-24T09:14:41.957638774Z",
>>>         "ver": {
>>>             "pool": 7,
>>>             "epoch": 2
>>>         },
>>>         "bilog_flags": 0,
>>>         "versioned": false,
>>>         "owner": "",
>>>         "owner_display_name": "",
>>>         "zones_trace": [
>>>             {
>>>                 "entry": "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>>>             }
>>>         ]
>>>     },
>>>
>>> # The RGW log of the secondary zone shows the sync attempt:
>>> 2022-02-24T09:14:52.502+0000 7f1419ff3700 0 RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>>>
>>> # But the secondary zone doesn't actually show the new file in the bilog
>>> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>>>
>>> # And the shard log that, according to the logfile, had the data to sync in it doesn't even seem to exist on the secondary zone
>>> radosgw-admin datalog list --shard-id 72 --rgw_realm internal
>>> ERROR: list_bi_log_entries(): (2) No such file or directory
>>>
>>> # RGW log on the master zone; there is one 404 in there which worries me a bit
>>> 2022-02-24T09:14:52.515+0000 7ff5816e2700 1 beast: 0x7ff6387f77c0: 192.168.85.71 - - [2022-02-24T09:14:52.515949+0000] "GET /admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - - -
>>> 2022-02-24T09:14:52.527+0000 7ff512604700 1 beast: 0x7ff6386747c0: 192.168.85.71 - - [2022-02-24T09:14:52.527950+0000] "GET /test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker&version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 404 146 - - -
>>> 2022-02-24T09:14:52.535+0000 7ff559e93700 1 beast: 0x7ff6386747c0: 192.168.85.71 - - [2022-02-24T09:14:52.535950+0000] "GET /admin/log?bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&format=json&marker=00000000001.445.5&type=bucket-index&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 2 - - -
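One data point that might help: comparing the datalog for the same shard on the master zone should show whether the change was recorded there at all. A sketch, with shard 72 and realm "internal" taken from the log lines above:

# run on the master zone
radosgw-admin datalog status --rgw_realm internal
radosgw-admin datalog list --shard-id 72 --rgw_realm internal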
>>> # If I update the file by re-uploading it, or restart the RGW daemon of the secondary zone, the affected file gets synced
>>> s3cmd put /etc/hosts s3://test/
>>>
>>> # Again, there is the sync attempt from the secondary zone RGW
>>> 2022-02-24T12:04:52.452+0000 7f1419ff3700 0 RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>>>
>>> # But now the file does show up in the bilog and the datalog
>>> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>>>     {
>>>         "op_id": "3#00000000001.456.5",
>>>         "op_tag": "_e1zRfGuaFH7mLumu1gapeLzHo9zYU6M",
>>>         "op": "write",
>>>         "object": "hosts",
>>>         "instance": "",
>>>         "state": "complete",
>>>         "index_ver": 1,
>>>         "timestamp": "2022-02-24T12:04:38.405141253Z",
>>>         "ver": {
>>>             "pool": 7,
>>>             "epoch": 2
>>>         },
>>>         "bilog_flags": 0,
>>>         "versioned": false,
>>>         "owner": "",
>>>         "owner_display_name": "",
>>>         "zones_trace": [
>>>             {
>>>                 "entry": "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>>>             },
>>>             {
>>>                 "entry": "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>>>             },
>>>             {
>>>                 "entry": "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3"
>>>             }
>>>         ]
>>>     }
>>>
>>> radosgw-admin datalog list --shard-id 72 --rgw_realm internal
>>>     {
>>>         "entity_type": "bucket",
>>>         "key": "test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3",
>>>         "timestamp": "2022-02-24T12:04:52.522177Z"
>>>     }
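While stuck on Octopus, it might also be possible to nudge the sync manually instead of re-uploading the object or restarting the RGW. A rough sketch, untested on this setup (bucket test/test and realm "internal" from the mail above; <master-zone> is a placeholder for the source zone name):

# run on the secondary zone: sync a single bucket in the foreground
radosgw-admin bucket sync run --bucket test/test --rgw_realm internal
# or run a foreground data sync pass against the source zone
radosgw-admin data sync run --source-zone <master-zone> --rgw_realm internal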