Re: removed_snaps

Sage Weil <sweil@xxxxxxxxxx> · Mon, 16 Oct 2017 18:38:36 +0000 (UTC)

On Mon, 16 Oct 2017, Gregory Farnum wrote:
> On Sat, Oct 14, 2017 at 8:56 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > On Fri, 13 Oct 2017, Gregory Farnum wrote:
> >> On Fri, Oct 13, 2017 at 2:32 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >> > On Fri, 13 Oct 2017, Gregory Farnum wrote:
> >> >> Moving on to the stuff actually written down:
> >> >> How comfortable are we with the size of the currently-deleting
> >> >> snapshots maps, for computation purposes? I don't have a good way of
> >> >> quantifying that cost but I'm definitely tempted to split the sets
> >> >> into
> >> >> newly_deleted_snaps (for *this epoch*)
> >> >> deleting_snaps (which are kept around until removed_snaps_lb_epoch)
> >> >> newly_purged_snaps (also for this epoch, which I think is how you have
> >> >> it written?)
> >> >
> >> > The code I have now has 2 sets: removed_snaps and purged_snaps.
> >> > purged_snapd was already there before; removed_snaps is the superset that
> >> > includes stuff not yet purged.  (I made it a superset thinking that
> >> > the filter_snapc() thing on every op will be faster if it only has to
> >> > filter against a single set instead of 2 of them, and that matters more
> >> > than the memory.)
> >>
> >> No, what I mean is:
> >> * we know that computing the overlap of interval sets can be expensive
> >> * so we want to only compute overlaps of small sets
> >> * this is especially true if we're doing it on all outstanding
> >> operations in librados, within a single thread
> >>
> >> So we definitely don't want to repeat comparisons if we don't have to.
> >> And I suspect there will be occasional periods of intense snapshot
> >> deletion from admins where they remove a number large enough to cause
> >> trouble, and we don't want that to result in noticeably slower IO
> >> submission on the client-side!
> >>
> >> So we should have a separate set of snaps which were just added to the
> >> removed_snaps during this epoch, that librados and the OSD map
> >> processing can use in their normal-course-of-business scanning.
> >
> > I definitely get that we want to avoid any set intersections (and I think
> > the current code actually has only 1 of them, during peering activate).
> > But I'm not quite following which lookup you're thinking of.  FTR
> > filter_snapc iterates over the vector<snapid_t> in SnapContext and does an
> > interval_set lookup on each one to make sure it isn't deleted.
> >
> > I think there are basically two cases:
> >
> > 1- The request comes in with the same epoch as our epoch, and the mimic
> > feature, so we could skip the filter_snapc entirely (client will have done
> > it).  Yay!  Hopefully this will be true for most requests in steady state.
> >
> > 2- The request comes in with an older osdmap epoch, so we have to
> > filter_snapc against the recent_removed_snaps interval_set<>.  This is
> > what we do today, except instead of recent_removed_snaps we have
> > removed_snaps_over_all_time, so we are going from a huge set to a
> > relatively small one.
> 
> You're leaving out
> 
> 3) Every client runs a set intersection on its outstanding ops on
> every map update (which we did not run before).
> 
> This is the one I'm thinking about when I say we should have a
> this-epoch-only set of additions.

Oh, I see what you mean.

Each OSDMap::Incremental has the new_removed_snaps, so it will be small 
list to filter against (esp compared to the CRUSH calculation).  I'm also 
going to switch it a flat_set<>, which gives log(n) lookup w/ 1 
allocation.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html