Hello,

I'm testing multisite sync on Reef 18.2.2 (cephadm, Ubuntu 22.04). Right now I'm testing a symmetrical sync policy that makes a backup to a read-only zone. My sync policy allows replication, and I enable replication per bucket via put-bucket-replication.
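For context, the policy and the per-bucket replication are set up roughly like this; the group/flow/pipe IDs, zone names, bucket name and endpoint below are placeholders rather than the exact values from my cluster:

  # zonegroup-level sync policy: allow replication, symmetrical flow between the two zones
  radosgw-admin sync group create --group-id=group1 --status=allowed
  radosgw-admin sync group flow create --group-id=group1 --flow-id=flow1 \
      --flow-type=symmetrical --zones=zone1,zone2
  radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 \
      --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
  radosgw-admin period update --commit

  # per bucket: enable replication through the S3 API
  aws --endpoint-url=http://<rgw-endpoint> s3api put-bucket-replication \
      --bucket bucket6 \
      --replication-configuration '{"Role": "", "Rules": [{"ID": "rule1",
        "Status": "Enabled", "Prefix": "",
        "Destination": {"Bucket": "arn:aws:s3:::bucket6"}}]}'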
My multisite setup fails at a seemingly basic operation. My test looks like this:

1. create a bucket
2. upload some data to the bucket
3. wait for replication to copy some of the data
4. run `rclone purge` on the bucket in the master zone while replication is still in progress; all data and the bucket itself are deleted
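Roughly, the test boils down to the following commands (the rclone remote name, bucket name and timing are placeholders, not my exact script):

  # steps 1-2: create the bucket and upload some test data in the master zone
  rclone mkdir master:bucket6
  rclone copy ./testdata master:bucket6
  # step 3: let data sync start copying objects to the other zone
  sleep 60
  # step 4: delete all objects and the bucket itself while sync is still running
  rclone purge master:bucket6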
I've tested this against both a normal secondary zone and an archive zone. It seems that the bucket is deleted so quickly that replication gets stuck. The buckets are gone from both zones, but the data sync shards still try to replicate them.
Example of a recovering shard:

{
  "shard_id": 100,
  "marker": {
    "status": "full-sync",
    "marker": "",
    "next_step_marker": "",
    "total_entries": 0,
    "pos": 0,
    "timestamp": "0.000000"
  },
  "pending_buckets": [
    "bucket6:58642236-4f66-46f5-b863-1d6a8667c4c3.61059.5:9",
    "bucket6:58642236-4f66-46f5-b863-1d6a8667c4c3.61059.7:9"
  ],
  "recovering_buckets": [
    "bucket6:58642236-4f66-46f5-b863-1d6a8667c4c3.61059.7:9[0]"
  ],
  "current_time": "2024-07-17T13:23:11Z"
}

In this case there are two pending buckets because I've reused the bucket name.
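For reference, a per-shard view like the one above can be pulled with something along these lines (the source zone name is a placeholder):

  # overall picture, lists shards that are behind or recovering
  radosgw-admin sync status
  # per-shard detail as shown above
  radosgw-admin data sync status --source-zone=<source-zone> --shard-id=100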
The only semi-automatic workaround I've found is to recreate a bucket with the same name and wait for the recovering shards to disappear.
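In practice that means something like this (endpoint, bucket and zone names are placeholders):

  # recreate the deleted bucket under the same name so the stuck shard can make progress
  aws --endpoint-url=http://<rgw-endpoint> s3api create-bucket --bucket bucket6
  # then poll the shard until pending_buckets / recovering_buckets empty out
  radosgw-admin data sync status --source-zone=<source-zone> --shard-id=100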
Is there any way to make Ceph clean up these stuck shards automatically?

Best regards,
Adam Prycki