Re: WG: Multisite sync issue


 



Hi Eugen,

there is currently only one RGW installed per region+realm, so the places to look at are already fairly limited.

As of now, the RGWs themselves are the endpoints; no load balancer has been put in place there.
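
For reference, the endpoints each zone advertises can be double-checked from the zonegroup configuration, e.g. (using the realm name from this thread):

radosgw-admin zonegroup get --rgw_realm internal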

Best, Julian

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, 25 February 2022 10:52
To: ceph-users@xxxxxxx
Subject: [ceph-users] Re: WG: Multisite sync issue



Hi,

I would stop all RGWs except one in each cluster to limit the places and logs to look at. Do you have a load balancer as the endpoint, or do you have a list of all RGWs as endpoints?
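
Assuming a systemd-based deployment like ceph-ansible's, something along these lines on each extra RGW host should do (the exact unit name depends on your setup):

systemctl stop ceph-radosgw@rgw.<hostname>.<instance>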


Quoting "Poß, Julian" <julian.poss@xxxxxxx>:

> Hi,
>
> I set up multisite with 2 Ceph clusters and multiple RGWs and
> realms/zonegroups.
> This setup was installed using the ceph-ansible branch "stable-5.0",
> with Focal + Octopus.
> During some testing, I noticed that the replication somehow does not
> work as expected.
>
> With s3cmd, I put a small file of 1.9 kB into a bucket on the master
> zone:
>
> s3cmd put /etc/hosts s3://test/
>
> Then I can see from the output of "radosgw-admin sync status
> --rgw_realm internal" that the cluster does indeed have something to
> sync, and it switches back to "nothing to sync" after a couple of
> seconds.
> "radosgw-admin sync error list --rgw_realm internal" is empty, too.
> However, if I look at the secondary zone via s3cmd, I can't see the
> file. Even if I look at the Ceph pools directly, the data didn't get
> replicated.
> If I upload the file again with the same command and without a change
> (basically just updating it), or restart the RGW daemon of the
> secondary zone, the affected file gets replicated.
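>
> (For what it's worth, whether the secondary believes it has caught up
> for that bucket can be checked on the secondary with something like:
>
> radosgw-admin bucket sync status --bucket test/test --rgw_realm internal
>
> which reports which shards are behind the source.)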
>
> I spotted this issue in all my realms/zonegroups. But even with
> "debug_rgw = 20" and "debug_rgw_sync = 20" I can't spot any obvious
> errors in the logs.
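>
> For completeness, those were set in ceph.conf on the RGW hosts roughly
> like this (the section name depends on the instance):
>
> [client.rgw.<instance>]
> debug_rgw = 20
> debug_rgw_sync = 20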
>
> It also worries me that replication won't work with multiple RGWs in
> one zone when one of them is unavailable, for instance during
> maintenance.
> I somehow expected Ceph to work its way through the list of available
> endpoints, and only fail if none are available.
> ...Or am I missing something here?
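>
> (The endpoints a zone advertises to its peers are set on the zone
> itself, e.g. something like:
>
> radosgw-admin zone modify --rgw-zone <zone> --endpoints http://rgw1:8080,http://rgw2:8080
> radosgw-admin period update --commit
>
> with hostnames and ports adjusted to the actual setup.)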
>
> Any help whatsoever is very much appreciated.
> I am pretty new to multisite and have been stuck on this for a couple
> of days now.
>
> Thanks, Julian
>
>
> Here is some additional information, including some log snippets:
>
> # On the master site, I can see the file in the bilog right away:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>     {
>         "op_id": "3#00000000001.445.5",
>         "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
>         "op": "write",
>         "object": "hosts",
>         "instance": "",
>         "state": "complete",
>         "index_ver": 1,
>         "timestamp": "2022-02-24T09:14:41.957638774Z",
>         "ver": {
>             "pool": 7,
>             "epoch": 2
>         },
>         "bilog_flags": 0,
>         "versioned": false,
>         "owner": "",
>         "owner_display_name": "",
>         "zones_trace": [
>             {
>                 "entry":
> "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>             }
>         ]
>     },
>
>
> # RGW log of secondary zone shows the sync attempt:
> 2022-02-24T09:14:52.502+0000 7f1419ff3700  0
> RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a
> 73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard
> test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>
> # But the secondary zone doesn't actually show the new file in the
> # bilog:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>
> # And the shard log that, according to the logfile, had the data to
> # sync in it doesn't even seem to exist on the secondary zone:
> radosgw-admin datalog list --shard-id 72 --rgw_realm internal
> ERROR: list_bi_log_entries(): (2) No such file or directory
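>
> # (The per-shard state can also be inspected with something like
> # "radosgw-admin datalog status --rgw_realm internal", in case that
> # helps narrow down which shards exist.)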
>
>
> # RGW log at the master zone; there is one 404 in there which worries
> # me a bit:
> 2022-02-24T09:14:52.515+0000 7ff5816e2700  1 beast: 0x7ff6387f77c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.515949+0000] "GET
> /admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8
> f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-088
> 1-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - -
> -
> 2022-02-24T09:14:52.527+0000 7ff512604700  1 beast: 0x7ff6386747c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.527950+0000] "GET
> /test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3
> a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker
> &version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 
> HTTP/1.1" 404 146 - -
> -
> 2022-02-24T09:14:52.535+0000 7ff559e93700  1 beast: 0x7ff6386747c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.535950+0000] "GET
> /admin/log?bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3
> a4dc863.8366.3%3A3&format=json&marker=00000000001.445.5&type=bucket-in
> dex&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 
> 2 - -
> -
>
>
>
> # If I update the file by re-uploading it, or restart the RGW daemon
> # of the secondary zone, the affected file gets synced:
> s3cmd put /etc/hosts s3://test/
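>
> # (Instead of re-uploading, a sync pass can also be triggered manually
> # for debugging with something like "radosgw-admin bucket sync run
> # --bucket test/test --rgw_realm internal" on the secondary.)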
>
> # Again, there is the sync attempt from the secondary zone RGW:
> 2022-02-24T12:04:52.452+0000 7f1419ff3700  0
> RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a
> 73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard
> test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>
> # But now the file does show up in the bilog and the datalog:
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>     {
>         "op_id": "3#00000000001.456.5",
>         "op_tag": "_e1zRfGuaFH7mLumu1gapeLzHo9zYU6M",
>         "op": "write",
>         "object": "hosts",
>         "instance": "",
>         "state": "complete",
>         "index_ver": 1,
>         "timestamp": "2022-02-24T12:04:38.405141253Z",
>         "ver": {
>             "pool": 7,
>             "epoch": 2
>         },
>         "bilog_flags": 0,
>         "versioned": false,
>         "owner": "",
>         "owner_display_name": "",
>         "zones_trace": [
>             {
>                 "entry":
> "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>             },
>             {
>                 "entry":
> "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>             },
>             {
>                 "entry":
> "e00c182b-27dc-4500-ad5b-77719f615d76:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3"
>             }
>         ]
>     }
>
> radosgw-admin datalog list --shard-id 72 --rgw_realm internal
>     {
>         "entity_type": "bucket",
>         "key": "test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3",
>         "timestamp": "2022-02-24T12:04:52.522177Z"
>     }



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


