Re: snap_trimming + backfilling is inefficient with many purged_snaps

Hi Dan,

I saw the pull request and can confirm your observations, at least
partially. Comments inline.

On Thu, Sep 18, 2014 at 2:50 PM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
>>> Do I understand your issue report correctly in that you have found
>>> setting osd_snap_trim_sleep to be ineffective, because it's being
>>> applied when iterating from PG to PG, rather than from snap to snap?
>>> If so, then I'm guessing that that can hardly be intentional…
>
>
> I’m beginning to agree with you on that guess. AFAICT, the normal behavior of the snap trimmer is to trim one single snap, the one which is in the snap_trimq but not yet in purged_snaps. So the only time the current sleep implementation could be useful is if we rm’d a snap across many PGs at once, e.g. rm a pool snap or an rbd snap. But those aren’t a huge problem anyway since you’d at most need to trim O(100) PGs.

Hmm. I'm actually seeing this in a system where the problematic snaps
could *only* have been RBD snaps.
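
Just to make sure we're describing the same mechanism, here is my
mental model of that per-PG decision, in Python-ish pseudocode rather
than the actual ReplicatedPG/interval_set code:

    # The trimmer picks the first snap that is queued in snap_trimq but
    # not yet recorded in purged_snaps, and trims just that one.
    def next_snap_to_trim(snap_trimq, purged_snaps):
        """Return the next snap id to trim for this PG, or None if caught up."""
        for snap in sorted(snap_trimq):
            if snap not in purged_snaps:
                return snap
        return None

    # e.g. snap_trimq = {1, 2, 3}, purged_snaps = {1}  ->  trims snap 2 next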

> We could move the snap trim sleep into the SnapTrimmer state machine, for example in ReplicatedPG::NotTrimming::react. This should allow other IOs to get through to the OSD, but of course the trimming PG would remain locked. And it would be locked for even longer now due to the sleep.
>
> To solve that we could limit the number of trims per instance of the SnapTrimmer, like I’ve done in this pull req: https://github.com/ceph/ceph/pull/2516
> Breaking out of the trimmer like that should allow IOs to the trimming PG to get through.
>
> The second aspect of this issue is why are the purged_snaps being lost to begin with. I’ve managed to reproduce that on my test cluster. All you have to do is create many pool snaps (e.g. of a nearly empty pool), then rmsnap all those snapshots. Then use crush reweight to move the PGs around. With debug_osd>=10, you will see "adding snap 1 to purged_snaps”, which is one signature of this lost purged_snaps issue. To reproduce slow requests the number of snaps purged needs to be O(10000).

Hmmm, I'm not sure I can confirm that. I do see "adding snap X to
purged_snaps", but only after the snap has been purged. See
https://gist.github.com/fghaas/88db3cd548983a92aa35. Of course, the
fact that the OSD tries to trim a snap only to get an ENOENT is
probably indicative of something being fishy with the snaptrimq and/or
the purged_snaps list as well.
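
In case anyone else wants to try your recipe, here is a minimal sketch
of the steps as I read them (test cluster only; the pool name, snap
count, OSD id and reweight value are arbitrary placeholders):

    #!/usr/bin/env python
    # Create and remove a large number of pool snaps, then move PGs via
    # a crush reweight.  With debug_osd >= 10 on the OSDs, watch the logs
    # for "adding snap ... to purged_snaps" after the reweight.
    import subprocess

    POOL = "snaptest"
    NUM_SNAPS = 10000   # needs to be O(10000) to provoke slow requests

    def ceph(*args):
        subprocess.check_call(("ceph",) + args)

    ceph("osd", "pool", "create", POOL, "64")
    for i in range(NUM_SNAPS):
        ceph("osd", "pool", "mksnap", POOL, "snap-%d" % i)
    for i in range(NUM_SNAPS):
        ceph("osd", "pool", "rmsnap", POOL, "snap-%d" % i)

    # Move the PGs around so they get backfilled elsewhere:
    ceph("osd", "crush", "reweight", "osd.0", "0.9")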

> Looking forward to any ideas someone might have.

So am I. :)
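
One more thought on the capping approach from your pull request: the
shape I have in mind is roughly the following, in the same Python-ish
pseudocode as above (emphatically a sketch and not your actual diff;
max_trims and do_trim_work() are names I made up):

    import time

    def do_trim_work(snap):
        """Stand-in for the real work of removing the clones for `snap`."""
        pass

    def trim_pass(snap_trimq, purged_snaps, max_trims, snap_trim_sleep):
        """Trim at most max_trims snaps in one pass, sleeping between
        individual snaps rather than between PGs, then return."""
        trimmed = 0
        for snap in sorted(snap_trimq):
            if snap in purged_snaps:
                continue            # already purged, nothing to do
            if trimmed >= max_trims:
                break               # cap reached: bail out of the trimmer
            do_trim_work(snap)
            purged_snaps.add(snap)
            trimmed += 1
            if snap_trim_sleep > 0:
                time.sleep(snap_trim_sleep)

Capping the per-pass work is what would let IO to the trimming PG
itself get through; the sleep between snaps mainly keeps the rest of
the OSD responsive.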

Cheers,
Florian