Re: bucket-granularity sync: how to implement ExistingObjectReplication in S3 API?

Casey Bodley <cbodley@xxxxxxxxxx> · Mon, 18 Apr 2022 13:27:13 -0400

(cc Yehuda and dev list)

On Fri, Apr 15, 2022 at 5:33 AM Edgelong Voodu <1070443499cs@xxxxxxxxx> wrote:
>
> hi, Casey:
>    I want to implement ExistingObjectReplication, which is not yet implemented in rgw multisite bucket-granularity sync. (see https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/API/API_ExistingObjectReplication.html)

cool! the existing behavior corresponds to
ExistingObjectReplication=Enabled, so this discussion is about adding
the Disabled case

>    There is a key question about this feature: how can I identify which objects were created before or after the PUT of the bucket replication configuration?
> Here are some of my thoughts:
> 1) Because of clock skew, I don't think it is a good idea to compare the mtime of the object with that of the replication configuration. If we compare mtimes, we may miss some objects or sync objects with the wrong mtime.
> 2) Some old bilog entries may have been trimmed, so not every object has a corresponding bilog entry. We can get the latest bilog marker when executing PutBucketReplication, but what about the remaining objects (which have no bilog marker)?
>
> Could you provide some advice or ideas about this feature?
>  Thank you.
>

this sounds complicated for two main reasons:

* the consistency models for metadata and data are completely
separate. so if bucket sync needs to look at a timestamp in its bucket
metadata, it has no way to know whether that's the *latest* version of
the bucket metadata. i think this is an issue with bucket replication
policy in general

* the interaction between 'bucket full sync', 'bucket incremental
sync', and bilog trimming. as you said in 2) above, we may trim bilogs
that other zones haven't processed yet, because we assume those
changes would be covered by a 'bucket full sync'

disabling ExistingObjectReplication sounds a lot like skipping the
'bucket full sync' step. but for this to work correctly, a) 'bucket
incremental sync' would need to know where in the bilogs to start so
that it only sees the events that happened after the 'disable', and b)
we'd need to prevent those bilog entries from being trimmed

for 'a)', the metadata master zone handling the PutBucketReplication
op could record its own bilog markers, but it doesn't know the current
markers on other zones - and active-active sync would require those
markers too. for 'b)', it's probably not desirable to leave untrimmed
entries around like this
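
to make 'a)' and 'b)' concrete, here is a minimal sketch of a single
bilog shard. it is a toy model, not rgw code: the integer positions,
the function names and the sample entries are all assumptions made up
for illustration (real bilog markers are opaque per-shard strings)

```python
# toy model of one bilog shard: (position, object key) pairs.
# integer positions are an assumption; real markers are opaque strings.

def entries_after(bilog, start_marker):
    """keys of the entries incremental sync would replay, i.e. only
    those logged strictly after the recorded start marker."""
    return [key for pos, key in bilog if pos > start_marker]


def can_trim(trim_to, start_marker):
    """trimming through trim_to is only safe if it stops at or before
    the recorded start marker; otherwise entries written after the
    'disable' would be lost before other zones replay them."""
    return trim_to <= start_marker


bilog = [(1, "old-a"), (2, "old-b"), (3, "new-c"), (4, "new-d")]
start = 2  # hypothetical marker recorded at PutBucketReplication time

print(entries_after(bilog, start))
print(can_trim(2, start), can_trim(3, start))
```

the sketch only covers one shard on one zone. the hard part described
above is that every zone would need its own `start` per shard, and
nothing in this model explains where those would come from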

in the end, it may be better to keep the existing structure of bucket
full/incremental sync, but filter everything based on mtime, i.e. the
comparison you describe in '1)' above. that may not be perfect in the
presence of time skew, but skew is already a factor in sync - all we
can promise is that every zone would make the same decisions and end
up with the same result
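
a minimal sketch of what that mtime filter could look like. the
`ReplicationRule` type, its fields and `should_sync` are hypothetical
names for illustration, not existing rgw interfaces; the sketch also
assumes the timestamp of the replication config is available to every
zone along with the config itself

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReplicationRule:
    # hypothetical stand-in for the bucket's replication config
    existing_object_replication: bool  # True = Enabled (current behavior)
    configured_at: datetime            # when PutBucketReplication applied


def should_sync(rule, obj_mtime):
    """decide whether full or incremental sync copies this object."""
    if rule.existing_object_replication:
        return True
    # Disabled: only objects written at or after the config took effect
    return obj_mtime >= rule.configured_at


rule = ReplicationRule(
    existing_object_replication=False,
    configured_at=datetime(2022, 4, 15, tzinfo=timezone.utc),
)
before = datetime(2022, 4, 14, tzinfo=timezone.utc)
after = datetime(2022, 4, 16, tzinfo=timezone.utc)
print(should_sync(rule, before), should_sync(rule, after))
```

the decision depends only on the object's mtime and the config
timestamp, not on any local clock. that is why every zone that sees
the same config would reach the same answer, even if the answer is
slightly off because of skew on the zone that wrote the object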

we'd also need to consider what happens when PutBucketReplication
changes the value of ExistingObjectReplication after other zones have
made it through full sync. if it changes from Disabled->Enabled, each
zone would have to restart a 'bucket full sync' to catch anything it
missed last time. there's some precedent for this (restarting a bucket
full sync) in `radosgw-admin bucket sync enable`, but that's built
into data sync itself. i don't think there's a good way for metadata
sync to trigger that from the outside
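
as a sketch of the transition rule only (not of how it would be
triggered, which is the open problem), assuming a hypothetical
`needs_full_sync_restart` helper that each zone would evaluate when it
sees the config change. treating Enabled->Disabled as "no restart" is
my assumption here, since nothing already copied needs to be revisited

```python
from enum import Enum


class Status(Enum):
    ENABLED = "Enabled"
    DISABLED = "Disabled"


def needs_full_sync_restart(old, new):
    """a zone that ran full sync under Disabled skipped the older
    objects, so it must redo full sync once the setting is Enabled."""
    return old is Status.DISABLED and new is Status.ENABLED


for old in Status:
    for new in Status:
        print(old.value, "->", new.value, needs_full_sync_restart(old, new))
```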

_______________________________________________
Dev mailing list -- dev@xxxxxxx