Re: snap_trimming + backfilling is inefficient with many purged_snaps

Florian Haas <florian@xxxxxxxxxxx> · Mon, 22 Sep 2014 19:06:38 +0200

On Sun, Sep 21, 2014 at 9:52 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Sun, 21 Sep 2014, Florian Haas wrote:
>> So yes, I think your patch absolutely still has merit, as would any
>> means of reducing the number of snapshots an OSD will trim in one go.
>> As it is, the situation looks really really bad, specifically
>> considering that RBD and RADOS are meant to be super rock solid, as
>> opposed to say CephFS which is in an experimental state. And contrary
>> to CephFS snapshots, I can't recall any documentation saying that RBD
>> snapshots will break your system.
>
> Yeah, it sounds like a separate issue, and no, the limit is not
> documented because it's definitely not the intended behavior. :)
>
> ...and I see you already have a log attached to #9503.  Will take a look.

I've already updated that issue in Redmine, but for the list archives
I should also add this here: Dan's patch for #9503, together with
Sage's for #9487, makes the problem go away in an instant. I've
already pointed out that I owe Dan dinner, and Sage, well I already
owe Sage pretty much lifelong full board. :)

Everyone with a ton of snapshots in their clusters (not sure where the
threshold is, but I imagine it gets nasty somewhere between 1,000 and
10,000) should probably update to 0.67.11 or 0.80.6, whichever matches
their release series, as soon as they come out. Otherwise Terrible
Things Will Happen™ if you're ever forced to delete a large number of
snaps at once.

Thanks again to Dan and Sage,
Florian