The current unidirectional sync is: many buckets -> single bucket

What does that look like if we throw it in reverse? The many-to-one
relationship largely limits the utility to what is essentially a backup
to the public cloud, because only an infrastructure-admin-level account
for disaster scenarios should be able to access *all* backed-up buckets.

On Mon, Feb 19, 2018 at 2:17 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> On Sat, Feb 17, 2018 at 12:52 AM, Orit Wasserman <owasserm@xxxxxxxxxx> wrote:
>> On Sat, Feb 17, 2018 at 3:16 AM, Yehuda Sadeh-Weinraub
>> <yehuda@xxxxxxxxxx> wrote:
>>> Now that the sync-to-the-cloud work is almost complete, I was thinking
>>> a bit and did some research about bi-directional sync. The big
>>> difficulty I had with the syncing-from-the-cloud process is the need
>>> to rework the whole data sync paths where we identify changes. These
>>> are quite complicated, and changes of this kind are quite a big
>>> project. I'm not sure anymore that this is needed.
>>>
>>> What I think we could do in a relatively easy (and less risky) way is
>>> that instead of embedding a new mechanism within the sync logic, we
>>> can create a module that turns upstream cloud changes into the
>>> existing rgw logs: that is, the data log and the bucket index logs (no
>>> metadata log needed). In this way we break the problem into two
>>> separate issues, one of which is already solved. The ingesting rgw
>>> could then do the same work it does with regular zones (fetching
>>> these logs, and pulling the data from a remote endpoint) -- albeit
>>> with various slight changes that are required, since we can't have
>>> some of the special APIs that we created to assist us.
>>
>> Sounds like a good plan; it may increase the time it takes us to
>> detect changes. If we can give the user an estimate, I think it will
>> be acceptable.
>>
>>> We'll need to see how these could be replaced and what the trade-offs
>>> will be, but we'll need to do that anyway with any solution.
>>>
>>> The changes-discovery module that turns remote cloud changes into
>>> local logs could do it either by polling the remote endpoints, or
>>> (for S3, for example) by using the bucket notifications mechanism. It
>>> will build the local change logs by appending new entries according
>>> to the changes it identifies. The radosgw zone that syncs from the
>>> cloud will have two endpoints: one used to fetch the logs, and
>>> another used to sync in the data.
>>>
>>> I'm simplifying a bit -- there are a few more issues there -- but
>>> that's the gist of it.
>>>
>>> Any thoughts?
>>
>> Can this be used for syncing indexless buckets?
>>
>
> Potentially, but it would depend on the ability to identify changes
> there in a scalable way. I'm not sure this is a panacea for that
> problem.
>
> Yehuda
>
>> Regards,
>> Orit
>>
>>>
>>> Yehuda
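To make the changes-discovery idea above concrete, here is a rough
sketch of the notification-driven variant. This is only an
illustration: it assumes S3 bucket notifications delivered through an
SQS queue (consumed with boto3), and LocalChangeLogs and shard_for()
are hypothetical stand-ins for whatever interface rgw would actually
expose for appending data log and bucket index log entries.

import json
import boto3

class LocalChangeLogs:
    """Hypothetical writer for the local rgw change logs; stands in
    for the real log-injection interface, which isn't defined here."""

    def append_bilog(self, bucket, key, op, timestamp):
        # One bucket index log entry per changed object.
        print(f"bilog[{bucket}]: {op} {key} @ {timestamp}")

    def append_datalog(self, bucket, shard, timestamp):
        # One data log entry marking the bucket shard as changed.
        print(f"datalog[{shard}]: bucket {bucket} changed @ {timestamp}")

def shard_for(bucket, num_shards=128):
    # Placeholder shard mapping; rgw uses its own hashing scheme.
    return hash(bucket) % num_shards

def poll_s3_notifications(queue_url, logs):
    """Drain S3 event notifications from an SQS queue and translate
    each object-created/removed event into bilog + datalog entries."""
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                ts = record["eventTime"]
                # Map the S3 event name onto a bucket-index-log op.
                op = ("del"
                      if record["eventName"].startswith("ObjectRemoved")
                      else "write")
                logs.append_bilog(bucket, key, op, ts)
                logs.append_datalog(bucket, shard_for(bucket), ts)
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])

A polling variant would replace the SQS loop with periodic
ListObjectsV2 calls and a diff against the last-seen listing; either
way the output is the same: synthetic bilog/datalog entries that the
existing sync machinery already knows how to fetch and act on.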