Re: bucket-granularity sync: how to implement ExistingObjectReplication in the S3 API?

On Mon, Apr 18, 2022 at 1:27 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> (cc Yehuda and dev list)
>
> On Fri, Apr 15, 2022 at 5:33 AM Edgelong Voodu <1070443499cs@xxxxxxxxx> wrote:
> >
> > hi, Casey:
> >    I want to implement ExistingObjectReplication, which has not yet been implemented in rgw multisite bucket-granularity sync (see https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/API/API_ExistingObjectReplication.html).
>
> cool! the existing behavior corresponds to
> ExistingObjectReplication=Enabled, so this discussion is about adding
> the Disabled case
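>
> for reference, this is roughly what a PutBucketReplication request with
> ExistingObjectReplication=Disabled would look like from a client. just an
> illustrative boto3 sketch - the endpoint, bucket, role and rule names are
> hypothetical, not anything rgw-specific:
>
>     import boto3
>
>     # hypothetical endpoint/names; the point is only where
>     # ExistingObjectReplication sits in the replication configuration
>     s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000")
>
>     s3.put_bucket_replication(
>         Bucket="source-bucket",
>         ReplicationConfiguration={
>             "Role": "arn:aws:iam::123456789012:role/replication",
>             "Rules": [{
>                 "ID": "rule-1",
>                 "Status": "Enabled",
>                 "Priority": 1,
>                 "Filter": {"Prefix": ""},
>                 "DeleteMarkerReplication": {"Status": "Disabled"},
>                 # only replicate objects written after the rule is applied
>                 "ExistingObjectReplication": {"Status": "Disabled"},
>                 "Destination": {"Bucket": "arn:aws:s3:::dest-bucket"},
>             }],
>         },
>     )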
>
> >    There is a key question about this feature: how can I identify which objects were created before or after the bucket replication configuration was PUT?
> > Here are some of my thoughts:
> > 1) Because of clock skew, I don't think it is a good idea to compare the mtime of an object with that of the replication configuration. If we compare mtimes, we may skip objects that should be synced, or sync objects that should be skipped.
> > 2) Some old bilogs may have been trimmed, so not every object has a corresponding bilog entry. We can record the latest bilog marker when PutBucketReplication is executed, but what about the rest of the objects (which have no bilog marker)?
> >
> > Would you provide some advice or ideas about this feature?
> > Thank you.
> >
>
> this sounds complicated for two main reasons:
>
> * the consistency models for metadata and data are completely separate.
> so if bucket sync needs to look at a timestamp in its bucket metadata,
> it has no way to know whether that's the *latest* version of the
> bucket metadata. i think this is an issue with bucket replication
> policy in general
>
> * the interaction between 'bucket full sync', 'bucket incremental
> sync', and bilog trimming. as you said in 2) above, we may trim bilogs
> that other zones haven't processed yet, because we assume those
> changes would be covered by a 'bucket full sync'
>
> disabling ExistingObjectReplication sounds a lot like skipping the
> 'bucket full sync' step. but for this to work correctly, a) 'bucket
> incremental sync' would need to know where in the bilogs to start so
> that it only sees the events that happened after the 'disable', and b)
> we'd need to prevent those bilog entries from being trimmed
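>
> to illustrate what 'a)' would require, here's a rough python sketch of a
> per-shard start marker recorded when the policy is applied, and the check
> incremental sync would make against it. the names are hypothetical and the
> lexical marker comparison is a simplification of real bilog marker ordering:
>
>     from dataclasses import dataclass, field
>     from typing import Dict, Optional
>
>     @dataclass
>     class ReplicationStartPoint:
>         """bilog position captured when ExistingObjectReplication=Disabled is set"""
>         bucket: str
>         # highest bilog marker per shard at the time the policy was applied
>         shard_markers: Dict[int, str] = field(default_factory=dict)
>
>     def should_apply_entry(start: ReplicationStartPoint,
>                            shard_id: int, entry_marker: str) -> bool:
>         """only replay bilog entries logged after the policy was applied"""
>         begin: Optional[str] = start.shard_markers.get(shard_id)
>         # no marker recorded for this shard -> fall back to replaying the entry
>         return begin is None or entry_marker > begin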
>
> for 'a)', the metadata master zone handling the PutBucketReplication
> op could record its own bilog markers, but it doesn't know the current
> markers on other zones - and active-active sync would require those
> markers too. for 'b)', it's probably not desirable to leave untrimmed
> entries around like this
>
> in the end, it may be better to keep the existing structure of bucket
> full/incremental sync, but filter everything based on mtime as you
> suggest in '1)' above. that may not be perfect in the presence of time
> skew, but skew is already a factor in sync - all we can promise is
> that every zone would make the same decisions and end up with the same
> result
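>
> as a sketch of that filter (illustrative python, not rgw code - the two
> timestamps would come from the object and the bucket's replication policy):
>
>     from datetime import datetime
>
>     def want_object(obj_mtime: datetime, policy_mtime: datetime,
>                     existing_object_replication: bool) -> bool:
>         """decide whether full/incremental sync should copy this object"""
>         if existing_object_replication:
>             return True
>         # with ExistingObjectReplication=Disabled, only objects written at
>         # or after the policy's mtime are copied. clock skew can misclassify
>         # objects near the boundary, but every zone compares the same two
>         # timestamps, so all zones reach the same decision
>         return obj_mtime >= policy_mtime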
>
> we'd also need to consider what happens when PutBucketReplication
> changes the value of ExistingObjectReplication after other zones have
> made it through full sync. if it changes from Disabled->Enabled, each
> zone would have to restart a 'bucket full sync' to catch anything it
> missed last time. there's some precedent for this (restarting a bucket
> full sync) in `radosgw-admin bucket sync enable`, but that's built
> into data sync itself. i don't think there's a good way for metadata
> sync to trigger that from the outside
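>
> for reference, that precedent is driven from the admin cli, roughly like
> this (exact flags may differ by release):
>
>     radosgw-admin bucket sync enable --bucket=<bucket-name>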
>

This is a complicated one, especially since there can be multiple
policies for the same bucket with different configurations (iirc). I
think the right way would be to add a level under the bucket sync that
treats the different policies as different sync instances (each policy
would have its own markers). I'm not sure that's the optimal way to
handle it, though.
At the very least we should be able to have more than a single sync
state, to allow dealing with multiple replication rules on the same pipe.
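
To make that concrete, here is a rough sketch of what per-rule sync state
might look like; the field names and key layout are hypothetical, not
existing RGW structures:

    from dataclasses import dataclass

    @dataclass
    class PipeSyncState:
        source_zone: str
        bucket: str
        rule_id: str              # the replication rule (pipe) this state tracks
        state: str = "full-sync"  # or "incremental-sync"
        full_sync_marker: str = ""
        inc_sync_marker: str = ""

    def status_key(s: PipeSyncState) -> str:
        # one status object per (source zone, bucket, rule) rather than per
        # bucket, so a rule added later can run its own full sync independently
        return f"bucket.sync-status.{s.source_zone}:{s.bucket}:{s.rule_id}"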

Yehuda

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


