On 17-12-14 05:31 PM, David Turner wrote:
> I've tracked this in a much more manual way. I would grab a random subset
> [..]
> This was all on a Hammer cluster. The changes that moved the snap trimming
> queues into the main OSD thread made our use case non-viable on Jewel until
> fixes landed in Jewel after I left. It's exciting that this will actually be
> a reportable value from the cluster.
> Sorry that this story doesn't really answer your question, except to say
> that people aware of this problem likely have a workaround for it. However,
> I'm certain that many more clusters are impacted by this than are aware of
> it, and being able to see it quickly would be beneficial when troubleshooting
> problems. Backporting would be nice. I run a few Jewel clusters that host
> some VMs, and it would be nice to see how well those clusters handle snap
> trimming, though they rely far less heavily on snapshots.
Thanks for your response; it pretty much confirms what I thought:
- users aware of the issue have their own hacks, which don't need to be
efficient or convenient;
- users unaware of the issue are, well, unaware, and at risk of serious
service disruption once disk space runs out.
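To illustrate the kind of hack the first point refers to, here is a minimal Python sketch that sums a per-PG snap-trim queue length out of `ceph pg dump --format json` output. The `snaptrimq_len` field name and the flat `pg_stats` layout are assumptions about the reporting format (it varies by release), so treat this as a sketch rather than something that works everywhere:

```python
import json

def total_snaptrim_backlog(pg_dump_json: str) -> int:
    """Sum snaptrimq_len across all PGs in a `ceph pg dump` JSON blob.

    Assumes a top-level "pg_stats" array whose entries may carry a
    "snaptrimq_len" counter; PGs without the field count as zero.
    """
    dump = json.loads(pg_dump_json)
    return sum(pg.get("snaptrimq_len", 0) for pg in dump.get("pg_stats", []))

# Canned example; in practice you would feed in the output of
# `ceph pg dump --format json` instead.
sample = json.dumps({
    "pg_stats": [
        {"pgid": "1.0", "snaptrimq_len": 12},
        {"pgid": "1.1", "snaptrimq_len": 0},
        {"pgid": "1.2", "snaptrimq_len": 30},
    ]
})
print(total_snaptrim_backlog(sample))  # → 42
```

Polling something like this from cron and graphing the total is roughly the manual tracking described above.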
Hopefully that will be convincing enough for the devs. ;)
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com