Hi Julian,

Thanks for your reply. We are using tenant-enabled RGW for now. :( I will try to use Ceph 16 as the secondary cluster to do the testing. If it works, I will upgrade the master cluster to Ceph v16 too.

Have a good day.

On Tue, 1 Mar 2022 at 17:42, Poß, Julian <julian.poss@xxxxxxx> wrote:

> Hey,
>
> my cluster is only a test installation, to generally verify an RGW multisite design, so there is no production data on it.
> Therefore my “solution” was to create an RGW S3 user without a tenant. So instead of
>
> radosgw-admin user create --tenant=test --uid=test --display-name=test --access_key=123 --secret_key=123 --rgw_realm internal
>
> I created a user like this:
>
> radosgw-admin user create --uid=test2 --display-name=test2 --access_key=1234 --secret_key=1234 --rgw_realm internal
>
> That worked for me to verify the problem. Unfortunately this is most likely not going to be a solution for you, and it isn’t for me either. But knowing this from my test setup, I can take precautions for the production installation and install that with the v16 release instead.
> I’ll probably verify that this is fixed in the latest v16 release, too, before installing the production clusters.
>
> Best, Julian
>
> From: Te Mule <twl007@xxxxxxxxx>
> Sent: Tuesday, 1 March 2022 10:26
> To: Poß, Julian <julian.poss@xxxxxxx>
> Cc: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
> Subject: Re: Multisite sync issue
>
> Hi Julian,
>
> could you share your solution for this? We are also trying to find a solution.
>
> Thanks
>
> On 1 Mar 2022, at 17:18, Poß, Julian <julian.poss@xxxxxxx> wrote:
>
> Thanks a ton for pointing this out.
> Just verified this with an RGW user without a tenant; it works perfectly, as you would expect.
> I guess I could have suspected that tenants have something to do with it, since I spotted issues with them in the past, too.
> Anyways, I got my “solution”. Thanks again!
>
> Best, Julian
>
> From: Mule Te (TWL007) <twl007@xxxxxxxxx>
> Sent: Friday, 25 February 2022 19:45
> To: Poß, Julian <julian.poss@xxxxxxx>
> Cc: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
> Subject: Re: Multisite sync issue
>
> We have the same issue on Ceph 15.2.15.
>
> In our testing cluster, it seems like Ceph 16 solved this issue. The PR https://github.com/ceph/ceph/pull/41316 seems to fix it, but I do not know why it has not been merged back to Ceph 15.
>
> Also, here is a new issue in the Ceph tracker that describes the same issue you have: https://tracker.ceph.com/issues/53737
>
> Thanks
>
> On Feb 25, 2022, at 10:07 PM, Poß, Julian <julian.poss@xxxxxxx> wrote:
>
> As far as I can tell, it can be reproduced every time, yes.
>
> That statement was actually about two RGWs in one zone. That is also something that I tested, because I felt like Ceph should be able to handle that HA-like on its own.
> But for the main issue, there is indeed only one RGW running in each zone. And as far as I can tell, I see no issues other than what I posted in my initial mail.
>
> Best, Julian
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, 25 February 2022 12:57
> To: ceph-users@xxxxxxx
> Subject: Re: FW: Multisite sync issue
>
> I see, then I misread your statement about multiple RGWs:
>
> > It also worries me that replication won't work with multiple RGWs in one zone, but one of them being unavailable, for instance during maintenance.
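> Just to make sure we mean the same thing: what I had in mind is the static endpoint list in the zone configuration, where all RGWs of a zone can be registered. Roughly like this (zone name and addresses are only placeholders, not taken from your setup):
>
> # register several RGWs as endpoints of one zone, then commit the period
> radosgw-admin zone modify --rgw-zone=secondary --endpoints=http://rgw1:8080,http://rgw2:8080 --rgw_realm internal
> radosgw-admin period update --commit --rgw_realm internal
>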
> Is there anything other than the RGW logs pointing to any issues? I find it strange that a restart of the RGW fixes it. Is this always reproducible?
>
> Quoting "Poß, Julian" <julian.poss@xxxxxxx>:
>
> Hi Eugen,
>
> there is currently only one RGW installed for each region+realm, so the places to look at are already pretty much limited.
> As of now, the RGWs themselves are the endpoints. So far, no loadbalancer has been put in place there.
>
> Best, Julian
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, 25 February 2022 10:52
> To: ceph-users@xxxxxxx
> Subject: Re: FW: Multisite sync issue
>
> Hi,
>
> I would stop all RGWs except one in each cluster to limit the places and logs to look at. Do you have a loadbalancer as endpoint, or do you have a list of all RGWs as endpoints?
>
> Quoting "Poß, Julian" <julian.poss@xxxxxxx>:
>
> Hi,
>
> I set up multisite with two Ceph clusters and multiple RGWs and realms/zonegroups.
> This setup was installed using the ceph-ansible branch "stable-5.0", with focal+octopus.
> During some testing, I noticed that the replication somehow does not seem to work as expected.
>
> With s3cmd, I put a small file of 1.9 kB into a bucket on the master zone:
>
> s3cmd put /etc/hosts s3://test/
>
> Then I can see in the output of "radosgw-admin sync status --rgw_realm internal" that the cluster does indeed have something to sync, and that it switches back to "nothing to sync" after a couple of seconds.
> "radosgw-admin sync error list --rgw_realm internal" is empty, too.
> However, if I look via s3cmd on the secondary zone, I can't see the file. Even if I look at the Ceph pools directly, the data didn't get replicated.
> If I proceed by uploading the file again with the same command and without a change, basically just updating it, or by restarting the RGW daemon of the secondary zone, the affected file gets replicated.
>
> I spotted this issue with all my realms/zonegroups. But even with "debug_rgw = 20" and "debug_rgw_sync = 20" I can't spot any obvious errors in the logs.
>
> It also worries me that replication won't work with multiple RGWs in one zone but one of them being unavailable, for instance during maintenance.
> I did somehow expect Ceph to work its way through the list of available endpoints, and only fail if none are available.
> ...Or am I missing something here?
>
> Any help whatsoever is very much appreciated.
> I am pretty new to multisite and have been stuck on this for a couple of days now.
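>
> For reference, my test boils down to more or less these few commands (bucket "test" and realm "internal" are just the names from my test setup):
>
> # upload a small object to the master zone
> s3cmd put /etc/hosts s3://test/
> # check the sync state and the error list
> radosgw-admin sync status --rgw_realm internal
> radosgw-admin sync error list --rgw_realm internal
> # compare the bucket index log on both zones
> radosgw-admin bilog list --bucket test/test --rgw_realm internal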
>
> Thanks, Julian
>
> Here is some additional information, including some log snippets:
>
> # On the master site, I can see the file in the bilog right away:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
> {
>     "op_id": "3#00000000001.445.5",
>     "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
>     "op": "write",
>     "object": "hosts",
>     "instance": "",
>     "state": "complete",
>     "index_ver": 1,
>     "timestamp": "2022-02-24T09:14:41.957638774Z",
>     "ver": {
>         "pool": 7,
>         "epoch": 2
>     },
>     "bilog_flags": 0,
>     "versioned": false,
>     "owner": "",
>     "owner_display_name": "",
>     "zones_trace": [
>         {
>             "entry": "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>         }
>     ]
> },
>
> # The RGW log of the secondary zone shows the sync attempt:
> 2022-02-24T09:14:52.502+0000 7f1419ff3700 0 RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>
> # But the secondary zone doesn't actually show the new file in the bilog:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>
> # And the shard log that, according to the logfile, had the data to sync in it doesn't even seem to exist on the secondary zone:
> radosgw-admin datalog list --shard-id 72 --rgw_realm internal
> ERROR: list_bi_log_entries(): (2) No such file or directory
>
> # RGW log at the master zone; there is one 404 in there which worries me a bit:
> 2022-02-24T09:14:52.515+0000 7ff5816e2700 1 beast: 0x7ff6387f77c0: 192.168.85.71 - - [2022-02-24T09:14:52.515949+0000] "GET /admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - - -
> 2022-02-24T09:14:52.527+0000 7ff512604700 1 beast: 0x7ff6386747c0: 192.168.85.71 - - [2022-02-24T09:14:52.527950+0000] "GET /test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker&version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 404 146 - - -
> 2022-02-24T09:14:52.535+0000 7ff559e93700 1 beast: 0x7ff6386747c0: 192.168.85.71 - - [2022-02-24T09:14:52.535950+0000] "GET /admin/log?bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&format=json&marker=00000000001.445.5&type=bucket-index&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 2 - - -
>
> # If I update the file by re-uploading it, or restart the RGW daemon of the secondary zone, the affected file gets synced:
> s3cmd put /etc/hosts s3://test/
>
> # Again, there is the sync attempt from the secondary zone RGW:
> 2022-02-24T12:04:52.452+0000 7f1419ff3700 0 RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>
> # But now the file does show up in the bilog and the data log:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
> {
>     "op_id": "3#00000000001.456.5",
>     "op_tag": "_e1zRfGuaFH7mLumu1gapeLzHo9zYU6M",
>     "op": "write",
>     "object": "hosts",
>     "instance": "",
>     "state": "complete",
>     "index_ver": 1,
>     "timestamp": "2022-02-24T12:04:38.405141253Z",
>     "ver": {
>         "pool": 7,
>         "epoch": 2
>     },
>     "bilog_flags": 0,
>     "versioned": false,
>     "owner": "",
"owner_display_name": "", > "zones_trace": [ > { > "entry": > > "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3" > }, > { > "entry": > > "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3" > }, > { > "entry": > > "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3" > } > ] > } > > radosgw-admin datalog list --shard-id 72 --rgw_realm internal > { > "entity_type": "bucket", > "key": "test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3", > "timestamp": "2022-02-24T12:04:52.522177Z" > } > > > > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an > email to ceph-users-leave@xxxxxxx > > > > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an > email to ceph-users-leave@xxxxxxx > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx