Question about multi-site sync policies

Hi,

My first question on this list … second attempt, because the first one apparently didn't make it (I hope this one does).

I'm trying out RGW multi-site sync policies.

I have a test/PoC setup on 16.2.6 deployed with cephadm (which, by the way, I DO like).
I only use RGW/S3.
There is one realm (myrealm), one zonegroup (myzg) and three zones: zone A, zone B and zone DR.

Zones A and B are on the same Ceph cluster with separate RGW processes and pools; zone DR is on a second cluster, in case that matters.
Without any sync policies, all data is synced perfectly fine between all zones.
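
For context, the realm/zonegroup/zone layout was created along these lines (a reconstructed sketch rather than the exact commands; endpoints are placeholders, and details like pulling the realm on the second cluster and the system user keys are omitted):

radosgw-admin realm create --rgw-realm=myrealm --default
radosgw-admin zonegroup create --rgw-zonegroup=myzg --endpoints=http://rgw-a:8080 --master --default
radosgw-admin zone create --rgw-zonegroup=myzg --rgw-zone=a --endpoints=http://rgw-a:8080 --master --default
radosgw-admin zone create --rgw-zonegroup=myzg --rgw-zone=b --endpoints=http://rgw-b:8080
radosgw-admin zone create --rgw-zonegroup=myzg --rgw-zone=dr --endpoints=http://rgw-dr:8080
radosgw-admin period update --commit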

What I want to see is:
- Data from zone A is synced directionally to zone DR, but not to zone B
- Data from zone B is not synced anywhere
- Data from zone DR is not synced back (or anywhere else)
i.e. clients writing to zone A get their data "backed up" to zone DR,
while clients writing to zone B don't get their data "backed up" to a second zone.
Clients don't have access to zone DR.

I used sync policies this way:
radosgw-admin sync group create --group-id=drsync --status=allowed
radosgw-admin sync group flow create --group-id=drsync --flow-id=a2dr --flow-type=directional --source-zone=a --dest-zone=dr
radosgw-admin sync group pipe create --group-id=drsync --pipe-id=allbuck --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
radosgw-admin sync group modify --group-id=drsync --status=enabled
radosgw-admin period update --commit
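
For completeness, this is how I look at what the policy resolves to (assuming I'm reading the output of the sync policy / sync info subcommands correctly; buck is just an example bucket):

radosgw-admin sync policy get
radosgw-admin sync info --bucket=buck --rgw-zone=a
radosgw-admin sync info --bucket=buck --rgw-zone=b

I would expect zone a to show dr as a destination there and zone b to show no sources or destinations at all.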

Now, from a data-in-buckets perspective everything looks fine:
- Add data to bucket buck in zone A -> it appears in buck in zone DR, but not in B
- Add data to bucket buck in zone B -> it doesn't appear in A or DR
The same holds for all other combinations, just as I want it.

BUT:
radosgw-admin sync status (with --rgw-zone=a, --rgw-zone=b or --rgw-zone=dr), run after adding or removing data, always shows some
"data is behind on X shards", apparently for exactly those shards that are - intentionally - not synced. Those "behind" shards accumulate over time and never go away again.

Is that just annoying but normal? Or is it a bug?
Or is my configuration simply "bad", and could it be changed so that I don't get these sort-of errors in the sync status?
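
In case it helps to narrow this down, this is roughly how I can drill into individual shards (a sketch; I'm assuming the data sync status subcommand works per source zone and shard as in the troubleshooting docs, and the shard ID is just an example):

radosgw-admin sync status --rgw-zone=dr
radosgw-admin data sync status --rgw-zone=dr --source-zone=b
radosgw-admin data sync status --rgw-zone=dr --source-zone=b --shard-id=31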

When I used bucket-level sync policies instead (only sync certain buckets from zone A to DR, no zone B involved), I had, IIRC, the same effect.
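
For reference, that bucket-level variant looked roughly like this (a sketch from memory, following the documented bucket-level sync policy examples; buck is just an example bucket, and the zonegroup-level group was left at status=allowed so that only explicitly enabled buckets sync):

radosgw-admin sync group create --bucket=buck --group-id=buck-dr --status=enabled
radosgw-admin sync group pipe create --bucket=buck --group-id=buck-dr --pipe-id=pipe1 --source-zones=a --dest-zones=dr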

What I'm really trying to achieve is something like "user sync policies", i.e.
- User X's data should be synced from A to DR
- User Y's data should only stay in A
I'm trying to emulate that with the existing/documented sync policies: user X gets the URL for zone A, user Y gets the URL for zone B
(or, with bucket-level policies, both get the URL for zone A, and syncing is turned on "on demand" for certain of user X's existing buckets - inconvenient).
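
Just to illustrate the zone-URL emulation from the client side (endpoints and bucket names are placeholders; this assumes plain S3 access, here via the AWS CLI):

# user X: objects land in zone A and get replicated to zone DR
aws --endpoint-url http://rgw-zone-a.example.com:8080 s3 cp backup.tar s3://userx-bucket/

# user Y: objects land in zone B and stay there
aws --endpoint-url http://rgw-zone-b.example.com:8080 s3 cp scratch.dat s3://usery-bucket/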

Best would be if I could just flip a flag to change the behaviour per user :)
And even better if it were easy to understand, i.e. if a user's sync flag is turned on, all of their data is synced, not just new data; if the flag is turned off, all of their data is removed from the DR zone.


Thanks for any input :)

Ciao, Uli
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx