On Fri, Jun 28, 2019 at 7:50 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> Hi Myoungwon,
>
> I was thinking about how a refcounted cas pool would interact with
> snapshots and it occurred to me that dropping refs when an object is
> deleted may break snapshotted versions of that object.  If object A has
> a ref to chunk X, is snapshotted, then A is deleted, we'll (currently)
> drop the ref to X and remove it.  That means that the snapshot of A
> can't be read.
>
> One way to get around that would be to mirror snaps from the source pool
> to the chunk pool--this is how cache tiering works.  The problem I see
> there is that I'd hoped to allow multiple pools to share/consume the same
> chunk pool, but each pool has its own snapid namespace.
>
> Another would be to bake the refs more deeply into the source rados pool
> so that the refs are only dropped after all clones also drop the ref.
> That is harder to track, though, since I think you'd need to examine all
> of the clones to know whether the ref is truly gone.  Unless we embed
> even more metadata in the SnapSet--something analogous to clone_overlap
> to identify the chunks.  That seems like it will bloat that structure,
> though.
>
> Other ideas?

Is there much design work around refcounting and snapshots yet?

I haven't thought it through much but one possibility is that each
on-disk clone counts as its own reference, and on a write to the
manifest object you increment the reference to all the chunks in
common. When snaptrimming finally removes a clone, it has to decrement
all the chunk references contained in the manifest.

I don't love this for the extra trimming work and remote reference
updates, but it's one way to keep the complexity of the data
structures down.

Other options:
* Force a 1:1 mapping between source pools and chunk pools. Not sure
  how good or bad this is since I haven't seen a lot of CAS pool
  discussion.
* Stop giving each pool its own snapshot namespace.
  Not sure this was a great design decision to begin with; changing it
  would require updating CephFS snap allocation, but I don't think
  anything else outside the monitors.
* Disallow snapshots on manifest-based objects/pools.

What are the target workloads for these?
-Greg

>
> sage
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
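
For illustration, the "each on-disk clone counts as its own reference" idea above could be sketched roughly as follows. This is a toy in-memory model, not Ceph code: `ChunkPool`, `ManifestObject`, and all of their methods are hypothetical names invented here, and clone creation is simplified to a single pending snapshot that is materialized on the next write or delete.

```python
class ChunkPool:
    """Hypothetical refcounted chunk pool: chunk id -> reference count."""

    def __init__(self):
        self.refs = {}

    def get(self, chunk):
        self.refs[chunk] = self.refs.get(chunk, 0) + 1

    def put(self, chunk):
        self.refs[chunk] -= 1
        if self.refs[chunk] == 0:
            del self.refs[chunk]  # last reference dropped, chunk is removed


class ManifestObject:
    """Hypothetical source-pool object whose head and clones list chunk ids."""

    def __init__(self, pool, chunks):
        self.pool = pool
        self.head = set(chunks)
        self.clones = {}          # snapid -> frozenset of chunk ids
        self.pending_snap = None  # snapshot taken but not yet cloned
        for c in self.head:
            pool.get(c)

    def snapshot(self, snapid):
        # Assumption: the clone is only created on the next mutation.
        self.pending_snap = snapid

    def write(self, new_chunks):
        new, old = set(new_chunks), self.head
        if self.pending_snap is not None:
            # The clone takes over the refs the old head held. The new head
            # then takes its own ref on every chunk it lists, which bumps
            # the count on chunks it has in common with the clone.
            self.clones[self.pending_snap] = frozenset(old)
            self.pending_snap = None
            for c in new:
                self.pool.get(c)
        else:
            # No clone involved: only the difference changes refcounts.
            for c in new - old:
                self.pool.get(c)
            for c in old - new:
                self.pool.put(c)
        self.head = new

    def delete(self):
        # Modeled as a write that leaves the head with no chunks.
        self.write(())

    def snaptrim(self, snapid):
        # Drop one ref per chunk listed in the trimmed clone's manifest.
        for c in self.clones.pop(snapid):
            self.pool.put(c)


pool = ChunkPool()
a = ManifestObject(pool, {"X"})
a.snapshot(1)
a.delete()        # head is gone, but the clone still holds a ref to X
print(pool.refs)
a.snaptrim(1)     # only now is the last ref to X dropped
print(pool.refs)
```

In this model, deleting A after a snapshot leaves chunk X in place until the clone is trimmed, which is the behavior the original scenario needs. The cost shows up in `snaptrim`, which has to touch every chunk in the clone's manifest.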