The current unidirectional sync is: many buckets -> single bucket

What does that look like if we throw it in reverse? The many-to-one
relationship largely limits the utility to what is essentially a backup
to the public cloud, because only an infrastructure-admin-level account
for disaster scenarios should be able to access *all* backed-up buckets.

On Mon, Feb 19, 2018 at 2:17 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> On Sat, Feb 17, 2018 at 12:52 AM, Orit Wasserman <owasserm@xxxxxxxxxx> wrote:
>> On Sat, Feb 17, 2018 at 3:16 AM, Yehuda Sadeh-Weinraub
>> <yehuda@xxxxxxxxxx> wrote:
>>> Now that the sync-to-the-cloud work is almost complete, I was thinking
>>> a bit and did some research about bi-directional sync. The big
>>> difficulty I had with the syncing-from-the-cloud process is the need
>>> to rework the whole data sync paths where we identify changes. These
>>> are quite complicated, and changes of this kind are quite a big
>>> project. I'm not sure anymore that this is needed.
>>>
>>> What I think we could do in a relatively easy (and less risky) way is
>>> that instead of embedding a new mechanism within the sync logic, we
>>> can create a module that turns upstream cloud changes into the
>>> existing rgw logs: that is, the data log and the bucket index logs (no
>>> metadata log needed). In this way we break the problem into two
>>> separate issues, one of which is already solved. The ingesting rgw
>>> could then do the same work it does with regular zones (fetching
>>> these logs, and pulling the data from a remote endpoint) -- albeit
>>> with various slight changes that are required, since we can't have
>>> some of the special APIs that we created to assist us.
>>
>> Sounds like a good plan; it may increase the time it takes us to
>> detect changes. If we can give the user an estimate, I think it will
>> be acceptable.
>>
>>> We'll need to see how these could be replaced and what the trade-offs
>>> will be, but we'll need to do that anyway with any solution.
>>>
>>> The changes-discovery module that turns remote cloud changes into
>>> local logs could do it either by polling the remote endpoints, or
>>> (for S3, for example) by using the bucket notifications mechanism. It
>>> will build the local change logs by appending new entries according
>>> to the changes it identifies. The radosgw zone that syncs from the
>>> cloud will have two endpoints: one used to fetch the logs, and
>>> another used to sync in the data.
>>>
>>> I'm simplifying a bit -- there are a few more issues there -- but
>>> that's the gist of it.
>>>
>>> Any thoughts?
>>
>> Can this be used for syncing indexless buckets?
>>
>
> Potentially, but it would depend on the ability to identify changes
> there in a scalable way. I'm not sure this is a panacea for that
> problem.
>
> Yehuda
>
>> Regards,
>> Orit
>>
>>>
>>> Yehuda
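To make the changes-discovery idea above concrete, here is a rough
sketch of the notification-driven variant. This is only an
illustration: it assumes S3 bucket notifications delivered through an
SQS queue (consumed with boto3), and LocalChangeLogs and shard_for()
are hypothetical stand-ins for whatever interface rgw would actually
expose for appending data log and bucket index log entries.

import json
import boto3

class LocalChangeLogs:
    """Hypothetical writer for the local rgw change logs; stands in
    for the real log-injection interface, which isn't defined here."""

    def append_bilog(self, bucket, key, op, timestamp):
        # One bucket index log entry per changed object.
        print(f"bilog[{bucket}]: {op} {key} @ {timestamp}")

    def append_datalog(self, bucket, shard, timestamp):
        # One data log entry marking the bucket shard as changed.
        print(f"datalog[{shard}]: bucket {bucket} changed @ {timestamp}")

def shard_for(bucket, num_shards=128):
    # Placeholder shard mapping; rgw uses its own hashing scheme.
    return hash(bucket) % num_shards

def poll_s3_notifications(queue_url, logs):
    """Drain S3 event notifications from an SQS queue and translate
    each object-created/removed event into bilog + datalog entries."""
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                ts = record["eventTime"]
                # Map the S3 event name onto a bucket-index-log op.
                op = ("del"
                      if record["eventName"].startswith("ObjectRemoved")
                      else "write")
                logs.append_bilog(bucket, key, op, ts)
                logs.append_datalog(bucket, shard_for(bucket), ts)
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])

A polling variant would replace the SQS loop with periodic
ListObjectsV2 calls and a diff against the last-seen listing; either
way the output is the same: synthetic bilog/datalog entries that the
existing sync machinery already knows how to fetch and act on.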