On Fri, Feb 16, 2018 at 5:21 PM, Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> Hi,
>
> On Fri, Feb 16, 2018 at 8:16 PM, Yehuda Sadeh-Weinraub
> <yehuda@xxxxxxxxxx> wrote:
>> Now that the sync-to-the-cloud work is almost complete, I've been
>> thinking a bit and did some research about bi-directional sync. The big
>> difficulty I had with the sync-from-the-cloud process is the need to
>> rework the whole data sync paths where we identify changes. These paths
>> are quite complicated, and reworking them would be quite a big project.
>> I'm no longer sure that this is needed.
>> What I think we could do in a relatively easy (and less risky) way is,
>> instead of embedding a new mechanism within the sync logic, to create a
>> module that turns upstream cloud changes into the existing rgw logs:
>> that is, the data log and the bucket index logs (no metadata log
>> needed).
>
> This is clean, but naively seems to induce a lot of transient OMAP i/o?

Less than what we have on a regular rgw zone.

One thing to think about is how to deal with resharding on these buckets.

Yehuda

>> In this way we break the problem into two separate issues, where one of
>> them is already solved. The ingesting rgw could then do the same work
>> it does with regular zones (fetching these logs, and pulling the data
>> from a remote endpoint) -- albeit with various slight changes that are
>> required, since we can't have some of the special APIs that we created
>> to assist us.
>> We'll need to see how those could be replaced and what the trade-offs
>> will be, but we'll need to do that anyway with any solution.
>> The changes-discovery module that turns remote cloud changes into local
>> logs could do it either by polling the remote endpoints or (for S3, for
>> example) by using the bucket notifications mechanism.
>
> If AWS makes this efficient, sounds reasonable.
>
>> It will build its local changes logs by setting new entries on them
>> according to the changes it identifies.
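
[To make the polling variant of the changes-discovery module concrete, here
is a minimal sketch in Python. Everything in it is illustrative: `list_remote`
stands in for a remote cloud listing call (e.g. S3 ListObjectsV2), and
`LogEntry` is only a stand-in for the real data log / bucket index log entry
formats, which are internal to rgw.]

```python
# Illustrative sketch of a changes-discovery module that turns remote cloud
# listings into rgw-style log entries. Names and formats are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogEntry:
    marker: int   # monotonically increasing position in the log
    bucket: str
    key: str
    op: str       # "write" or "delete"
    etag: str     # used to detect modifications between polls


class ChangeDiscovery:
    """Polls a remote listing and appends bucket-index-log-style entries."""

    def __init__(self, bucket: str, list_remote: Callable[[], Dict[str, str]]):
        self.bucket = bucket
        self.list_remote = list_remote   # hypothetical: returns {key: etag}
        self.prev: Dict[str, str] = {}   # listing snapshot from the last poll
        self.log: List[LogEntry] = []    # stands in for the local change log
        self._marker = 0

    def _append(self, key: str, op: str, etag: str = "") -> None:
        self._marker += 1
        self.log.append(LogEntry(self._marker, self.bucket, key, op, etag))

    def poll(self) -> None:
        """One polling cycle: diff the current listing against the previous."""
        cur = self.list_remote()
        for key, etag in cur.items():
            if self.prev.get(key) != etag:          # new or modified object
                self._append(key, "write", etag)
        for key in self.prev.keys() - cur.keys():   # removed upstream
            self._append(key, "delete")
        self.prev = cur
```

[Each `poll()` diffs the current listing against the previous snapshot and
appends write/delete entries with monotonically increasing markers, so the
consuming side can track its position the way incremental sync does today.
A notification-based variant would append entries from pushed events instead
of diffing listings.]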
>> The radosgw zone that will sync from the cloud will have two endpoints:
>> one that will be used to fetch the logs, and another that will be used
>> to sync in the data.
>> I'm simplifying a bit; there are a few more issues there, but that's
>> the gist of it.
>>
>> Any thoughts?
>>
>> Yehuda
>
> Matt
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel. 734-821-5101
> fax. 734-769-8938
> cel. 734-216-5309
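
[For the ingesting side, a similarly hedged sketch of the two-endpoint idea:
the log endpoint and the data endpoint are modeled here as plain callables,
and the local store as a dict. In rgw these would be HTTP endpoints and RADOS
objects; nothing below is actual rgw code.]

```python
# Illustrative sketch of the ingesting rgw's loop: fetch change-log entries
# from one endpoint, pull object data from the other, apply locally.
from typing import Callable, Dict, List, Tuple


class LogIngester:
    def __init__(self,
                 fetch_log: Callable[[int], List[Tuple[int, str, str]]],
                 fetch_object: Callable[[str], bytes]):
        self.fetch_log = fetch_log        # marker -> [(marker, key, op), ...]
        self.fetch_object = fetch_object  # key -> object data
        self.marker = 0                   # last log position we processed
        self.store: Dict[str, bytes] = {}  # stands in for local objects

    def sync_once(self) -> int:
        """Fetch log entries past our marker and apply them locally."""
        entries = self.fetch_log(self.marker)
        for marker, key, op in entries:
            if op == "write":
                # pull the object from the data endpoint
                self.store[key] = self.fetch_object(key)
            elif op == "delete":
                self.store.pop(key, None)
            self.marker = marker  # persist progress so restarts resume here
        return len(entries)
```

[The point of keeping the two endpoints separate is visible here: log
fetching only advances a marker, while data fetching happens per entry, so
retries and restarts resume from the last applied marker.]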