Re: refcounting chunks vs snapshots

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 28 Jun 2019 15:11:28 -0700

On Fri, Jun 28, 2019 at 7:50 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> Hi Myoungwon,
>
> I was thinking about how a refcounted cas pool would interact with
> snapshots and it occurred to me that dropping refs when an object is
> deleted may break snapshotted versions of that object.  If object A has
> a ref to chunk X, is snapshotted, then A is deleted, we'll (currently)
> drop the ref to X and remove it.  That means that A can't be read.
>
> One way to get around that would be to mirror snaps from the source pool
> to the chunk pool--this is how cache tiering works.  The problem I see
> there is that I'd hoped to allow multiple pools to share/consume the same
> chunk pool, but each pool has its own snapid namespace.
>
> Another would be to bake the refs more deeply into the source rados pool
> so that the refs are only dropped after all clones also drop the ref.
> That is harder to track, though, since I think you'd need to examine all
> of the clones to know whether the ref is truly gone.  Unless we embed
> even more metadata in the SnapSet--something analogous to clone_overlap that
> identifies the chunks.  That seems like it will bloat that structure,
> though.
>
> Other ideas?

Is there much design work around refcounting and snapshots yet?

I haven't thought it through much but one possibility is that each
on-disk clone counts as its own reference, and on a write to the
manifest object you increment the reference to all the chunks in
common. When snaptrimming finally removes a clone, it has to decrement
all the chunk references contained in the manifest.

I don't love this for the extra trimming work and remote reference
updates, but it's one way to keep the complexity of the data
structures down.
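
To make the per-clone idea concrete, here is a rough sketch in Python. It
is only a toy model under my own assumptions, not how the OSD does or
would implement it: ChunkPool, ManifestObject, and the clone_snapid
argument are hypothetical names, and the remote reference updates are
modeled as plain in-memory counter changes.

```python
from collections import Counter


class ChunkPool:
    """Hypothetical stand-in for the chunk pool: chunk id -> refcount."""

    def __init__(self):
        self.refs = Counter()

    def get(self, chunk):
        self.refs[chunk] += 1

    def put(self, chunk):
        self.refs[chunk] -= 1
        if self.refs[chunk] == 0:
            del self.refs[chunk]  # last reference dropped; chunk is removable


class ManifestObject:
    """Toy manifest object: head and every clone each hold their own refs."""

    def __init__(self, pool, chunks):
        self.pool = pool
        self.head = set(chunks)
        self.clones = {}  # snapid -> set of chunk ids held by that clone
        for c in self.head:
            pool.get(c)

    def write(self, new_chunks, clone_snapid=None):
        new_chunks = set(new_chunks)
        # Take the new head's refs first so shared chunks never reach zero.
        for c in new_chunks:
            self.pool.get(c)
        if clone_snapid is not None:
            # The clone inherits the old head's refs, so chunks in common
            # between clone and new head now carry one ref for each.
            self.clones[clone_snapid] = self.head
        else:
            for c in self.head:
                self.pool.put(c)
        self.head = new_chunks

    def delete(self, clone_snapid=None):
        # Deleting head is modeled as a write that leaves no chunks behind.
        self.write((), clone_snapid)

    def trim(self, snapid):
        # Snaptrim: drop every chunk ref contained in the clone's manifest.
        for c in self.clones.pop(snapid):
            self.pool.put(c)
```

In Sage's scenario (A references X, is snapshotted, then deleted), the
clone keeps its own ref to X, so X survives until trim() removes that
clone. The cost is visible in write() and trim(): both touch every chunk
in the manifest, which is the extra work I don't love.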

Other options:
* Force a 1:1 mapping between source pools and chunk pools. Not sure how
good or bad this is since I haven't seen a lot of CAS pool discussion.
* Stop giving each pool its own snapshot namespace. Not sure this
was a great design decision to begin with; it would require updating
CephFS snap allocation, but I don't think anything else outside the
monitors.
* Disallowing snapshots on manifest-based objects/pools. What are the
target workloads for these?
-Greg

>
> sage
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx