We have encountered replication issues in our multisite setup with Quincy v17.2.3. Our Ceph clusters are brand new: we tore down our clusters and re-deployed fresh Quincy ones before running this test. In our environment we have 3 RGW nodes per site; each node runs 2 instances for client traffic and 1 instance dedicated to replication.

The test was done with cosbench using the following settings:

- 10 rgw users
- 3000 buckets per user
- write only
- 6 different object sizes with the following distribution:
  1k: 17%, 2k: 48%, 3k: 14%, 4k: 5%, 1M: 13%, 8M: 3%
- trying to write 10 million objects per object-size bucket per user, to avoid writing to the same objects
- no multipart uploads involved

The test ran for about 2 hours, roughly from 22:50 on 9/14 to 1:00 on 9/15. After that, the replication tail continued for roughly another 4 hours, until 4:50 on 9/15, with gradually decreasing replication traffic. Then the replication stopped, and nothing has been going on in the clusters since.

While we were verifying the replication status, we found many issues.

1. The sync status shows the clusters are not fully synced. However, all replication traffic has stopped and nothing is going on in the clusters.

Secondary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        behind shards: [40,74]

Why did the replication stop even though the clusters are still not in sync?

2. We can see that some buckets are not fully synced, and we were able to identify some missing objects in our secondary zone. Here is an example bucket and its sync status in the secondary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload004178[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89152.78])

    source zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload004178[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89152.78])
                full sync: 0/101 shards
                incremental sync: 100/101 shards
                bucket is behind on 1 shards
                behind shards: [78]

3. As the above sync status shows, the behind shard for this example bucket is not in the list of behind shards reported by the system-wide sync status. Why is that?

4. The data sync status for these behind shards doesn't list any "pending_buckets" or "recovering_buckets". An example:

    {
        "shard_id": 74,
        "marker": {
            "status": "incremental-sync",
            "marker": "00000000000000000003:00000000000003381964",
            "next_step_marker": "",
            "total_entries": 0,
            "pos": 0,
            "timestamp": "2022-09-15T00:00:08.718840Z"
        },
        "pending_buckets": [],
        "recovering_buckets": []
    }

Shouldn't the not-yet-in-sync buckets be listed here?

5. The sync status reported by the primary zone differs from the sync status reported by the secondary zone, with different sets of behind shards. The same is true for the sync status of the same bucket. Is that legitimate? Please see item 1 for the sync status of the secondary zone and item 6 for the primary zone. (A sketch of the commands used to collect these outputs follows.)
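For reference, the status outputs quoted here were collected with radosgw-admin commands roughly along the following lines, run against the respective zone's cluster; the bucket name, shard id and source zone shown are the ones from the examples above, and exact options may differ:

    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=mixed-5wrks-dev-4k-thisisbcstestload004178
    radosgw-admin data sync status --source-zone=prod-zone-pw --shard-id=74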
6. Why does the primary zone have behind shards at all, given that replication goes from the primary to the secondary?

Primary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  metadata sync no sync (zone is master)
      data sync source: 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 30 shards
                        behind shards: [6,7,26,28,29,37,47,52,55,56,61,67,68,69,74,79,82,91,95,99,101,104,106,111,112,121,122,123,126,127]

7. We have in-sync buckets that show the correct sync status in the secondary zone but still show behind shards in the primary. Why is that?

Secondary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])

    source zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])
                full sync: 0/101 shards
                incremental sync: 99/101 shards
                bucket is caught up with source

Primary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])

    source zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])
                full sync: 0/101 shards
                incremental sync: 97/101 shards
                bucket is behind on 11 shards
                behind shards: [9,11,14,16,22,31,44,45,67,85,90]

Our primary goals here are:

1. to find out why the replication stopped while the clusters are not in sync;
2. to understand what we need to do to resume the replication, and to make sure it runs to completion without too much lag;
3. to understand whether all the sync status info is correct. It seems to us there are many conflicts, and some of it doesn't reflect the real status of the clusters at all.

I have opened an issue in the Issue Tracker: https://tracker.ceph.com/issues/57562. More info regarding our clusters is attached to the issue, including:

- ceph.conf of the rgws
- ceph config dump
- ceph versions output
- sync status of the cluster, an in-sync bucket, a not-in-sync bucket, and some behind shards
- bucket list and bucket stats of a not-in-sync bucket, and a stat of a not-in-sync object

Thanks,
Jane