Hi,

I've just configured a second zone for 2 of our Ceph S3 deployments, and I've noticed that after the initial sync the secondary zone data pools are much bigger than the ones on the master zones.
My setup consists of a main zone, an archive zone, and a sync_policy that configures directional sync from the main zone to the archive zone.
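For context, the sync policy is roughly the following (a sketch rather than the exact commands; the group/flow/pipe IDs and zone names are placeholders):

    # zonegroup-level sync policy: enabled, with a single directional flow
    # from the main zone (X) to the archive zone (X-archive)
    radosgw-admin sync group create --group-id=group1 --status=enabled
    radosgw-admin sync group flow create --group-id=group1 --flow-id=main-to-archive \
            --flow-type=directional --source-zone=X --dest-zone=X-archive
    radosgw-admin sync group pipe create --group-id=group1 --pipe-id=all-buckets \
            --source-zones=X --source-bucket='*' --dest-zones=X-archive --dest-bucket='*'
    radosgw-admin period update --commit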
Here is an example from the zone I've configured today; the data pool stats below use the `ceph df` column layout (pool, id, pgs, stored, objects, used, %used, max avail).

Master zone:
    X.rgw.buckets.data          42  1024   361 GiB  254.26k   542 GiB  0.01  3.1 PiB

Secondary archive zone:
    X-archive.rgw.buckets.data  32    32   755 GiB  716.68k  1007 GiB  0.03  2.4 PiB
This archive zone was created a few hours ago. Users didn't overwrite enough data to double the archive zone's size, and the object count has almost tripled.
I've checked `gc list --include-all` on the archive zones and it's empty. I'm not sure why the zone is this big.
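For reference, the check was essentially this (the zone name is a placeholder):

    # garbage-collection backlog on the archive zone -- empty in my case
    radosgw-admin gc list --include-all --rgw-zone=X-archive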
A few days ago I also configured an archive zone for a different deployment. There I set an archive-zone lifecycle policy of 1 day and tried to clean up all the buckets on the archive zone. It didn't help.
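The lifecycle rule was roughly like this (a sketch; the rule ID, bucket name and endpoint are placeholders, and it assumes the <ArchiveZone /> lifecycle filter that recent RGW releases accept for rules that should only run on the archive zone), with lc.xml containing:

    <LifecycleConfiguration>
      <Rule>
        <ID>expire-archive-1d</ID>
        <Filter><Prefix></Prefix><ArchiveZone /></Filter>
        <Status>Enabled</Status>
        <Expiration><Days>1</Days></Expiration>
      </Rule>
    </LifecycleConfiguration>

applied with:

    s3cmd setlifecycle lc.xml s3://somebucket --host=archive.example.com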
My other archive zone is 150% of the size of its master zone. I've tried to force a full resync with `radosgw-admin data sync init`; the sync worked but didn't help with the excess data in the pool.
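For reference, the forced resync was essentially this (the source zone name and the RGW systemd unit name are placeholders; as far as I understand, the archive-zone gateways have to be restarted afterwards so they pick up the re-initialized sync state and start a full sync):

    # on the archive zone: re-initialize data sync from the master zone
    radosgw-admin data sync init --source-zone=X --rgw-zone=X-archive
    # restart the archive-zone gateways
    systemctl restart ceph-radosgw@rgw.archive-host.service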
I suspect it's an error during the initial multisite synchronization. In both cases I restarted the RGW daemons on the archive zone during the initial synchronization.
What else could have caused this? Are RGW daemons in multisite setups sensitive to restarts? Could similar issues happen during a normal RGW restart in the course of multisite operations?
Best regards
Adam Prycki