Re: bi-directional cloud sync

On Tue, Feb 20, 2018 at 2:32 PM, Kyle Bader <kyle.bader@xxxxxxxxx> wrote:
> The current unidirectional sync is:
>
> many buckets -> single bucket

Not necessarily, depending on how it is configured.

>
> What does that look like if we throw it in reverse?

That depends on how the configuration ends up looking. Generally there
are a few parameters that deal with mutating the destination
bucket/object name, and I think we could do some kind of reverse
mapping too.
Currently with the cloud sync you can specify different mappings for
different buckets (or bucket prefixes). The mapping is defined by a
configurable field that uses multiple variables to generate the remote
bucket/object pair. These variables include the sync instance id,
source zonegroup, source zone, source bucket name, and the bucket
owner. The default destination name configuration is currently
"rgw-${zonegroup}-${sid}/${bucket}", but it can be set to anything
(that is valid). For example, a bucket named foo in a zonegroup named
zg, with an object named bar and sync instance id 100, will end up as
rgw-zg-100/foo/bar (that is, a bucket named rgw-zg-100 and an object
named foo/bar). But you could define a different mapping for any
bucket, potentially giving a 1:1 mapping between source buckets and
destination buckets. One major problem is that, in general, you're
very limited in the number of buckets you can create at the
destination.
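
To make the expansion concrete, here's a quick Python sketch of the
mapping logic (the helper name and signature are just for
illustration, not the actual rgw code):

from string import Template

DEFAULT_TARGET = "rgw-${zonegroup}-${sid}/${bucket}"

def expand_target(template, *, sid, zonegroup, zone, bucket, owner, obj):
    # Expand the template, then split on the first '/': the first
    # component is the destination bucket, the rest (plus the source
    # object name) becomes the destination object name.
    path = Template(template).substitute(
        sid=sid, zonegroup=zonegroup, zone=zone,
        bucket=bucket, owner=owner)
    dest_bucket, _, prefix = path.partition("/")
    dest_obj = prefix + "/" + obj if prefix else obj
    return dest_bucket, dest_obj

# The example from above: bucket foo in zonegroup zg, object bar,
# sync instance id 100 -> bucket rgw-zg-100, object foo/bar.
assert expand_target(DEFAULT_TARGET, sid=100, zonegroup="zg",
                     zone="z1", bucket="foo", owner="joe",
                     obj="bar") == ("rgw-zg-100", "foo/bar")

A reverse mapping would basically need to invert this: parse the
destination bucket/object back into the source variables, which only
works as long as the template expands unambiguously.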

Yehuda

>
> The many-to-single relationship limits the utility in many ways to
> what is essentially a backup to the public cloud, because only an
> infrastructure-admin-level account for disaster scenarios should be
> able to access *all* backed-up buckets.
>
>
>
> On Mon, Feb 19, 2018 at 2:17 PM, Yehuda Sadeh-Weinraub
> <yehuda@xxxxxxxxxx> wrote:
>> On Sat, Feb 17, 2018 at 12:52 AM, Orit Wasserman <owasserm@xxxxxxxxxx> wrote:
>>> On Sat, Feb 17, 2018 at 3:16 AM, Yehuda Sadeh-Weinraub
>>> <yehuda@xxxxxxxxxx> wrote:
>>>> Now that the sync-to-the-cloud work is almost complete, I was thinking
>>>> a bit and did some research about bi-directional sync. The big
>>>> difficulty I had with the sync-from-the-cloud process is the need
>>>> to rework the whole data sync paths where we identify changes. These
>>>> are quite complicated, and these kinds of changes are quite a big
>>>> project. I'm not quite sure now that this is needed.
>>>> What I think we could do in a relatively easy (and less risky) way,
>>>> instead of embedding a new mechanism within the sync logic, is to
>>>> create a module that turns upstream cloud changes into the
>>>> existing rgw logs: that is, the data log and bucket index logs (no
>>>> metadata log needed). In this way we break the problem into two
>>>> separate issues, one of which is already solved. The
>>>> ingesting rgw could then do the same work it is doing with regular
>>>> zones (fetching these logs, and pulling the data from a remote
>>>> endpoint) -- albeit with various slight changes that are required,
>>>> since we can't have some of the special APIs that we created to assist
>>>> us.
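
To make that concrete, here's a rough polling sketch. The
append_bilog/append_datalog callbacks here are hypothetical stand-ins
for writing real bucket index log and data log entries, not actual
rgw interfaces:

import boto3

def discover_changes(endpoint, bucket, last_seen,
                     append_bilog, append_datalog):
    # List the remote bucket and diff against what we saw last time;
    # every delta becomes a local log entry that the existing data
    # sync paths can consume as if a regular zone had produced it.
    s3 = boto3.client("s3", endpoint_url=endpoint)
    current = {}
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for o in page.get("Contents", []):
            current[o["Key"]] = o["ETag"]
    for key, etag in current.items():
        if last_seen.get(key) != etag:
            append_bilog(bucket, key, op="write")    # new or modified
    for key in last_seen:
        if key not in current:
            append_bilog(bucket, key, op="delete")   # removed remotely
    if current != last_seen:
        append_datalog(bucket)   # mark the bucket as changed
    return current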
>>>
>>> Sounds like a good plan, though it may increase the time it takes to
>>> detect changes. If we can give the user an estimate, I think it will
>>> be acceptable.
>>>
>>>> We'll need to see how these could be replaced and what the
>>>> trade-offs will be, but we'll need to do that anyway with any solution.
>>>> The changes-discovery module that turns remote cloud changes into
>>>> local logs could do it either by polling the remote endpoints, or (for
>>>> S3, for example) by using the bucket notifications mechanism (see the
>>>> sketch below). It will build its local changes logs by setting new
>>>> entries on them according to the changes it identifies. The radosgw
>>>> zone that syncs from the cloud will have two endpoints: one used to
>>>> fetch the logs, and another used to sync in the data.
>>>> I'm simplifying a bit -- there are a few more issues there, but
>>>> that's the gist of it.
>>>>
>>>> Any thoughts?
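
And roughly what the bucket notifications variant could look like,
assuming AWS-style delivery of S3 events to an SQS queue (again,
append_bilog is a hypothetical hook, not an rgw interface):

import json
import boto3

def consume_notifications(queue_url, append_bilog):
    # Translate each S3 event into a bucket index log entry instead
    # of polling the remote listing for diffs.
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   WaitTimeSeconds=20,
                                   MaxNumberOfMessages=10)
        for msg in resp.get("Messages", []):
            for rec in json.loads(msg["Body"]).get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                key = rec["s3"]["object"]["key"]
                op = ("delete"
                      if rec["eventName"].startswith("ObjectRemoved")
                      else "write")
                append_bilog(bucket, key, op=op)
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])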
>>>
>>> Can this be used for syncing indexless buckets?
>>>
>>
>> Potentially, but it would depend on the ability to identify changes
>> there in a scalable way. I'm not sure this is a panacea for that
>> problem.
>>
>> Yehuda
>>
>>> Regards,
>>> Orit
>>>
>>>>
>>>> Yehuda