Re: osd_snap_trim_sleep keeps PG locked during sleep?

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Fri, 13 Jan 2017 11:27:35 +0100

Hammer or Jewel? I've forgotten which thread pool handles snap
trimming nowadays -- is it the op thread pool yet? If so, perhaps all
the op threads are stuck sleeping? Just a wild guess. (Maybe increasing
the number of op threads would help?)
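
To make the guess concrete, here is a toy model in Python. It is not
Ceph code: pg_lock, snap_trimmer and measure_client_wait are
hypothetical names, and a plain threading.Lock stands in for the
per-PG lock. It only shows that a sleep taken while the lock is held
delays any other op that needs the same PG by about the sleep time,
while a sleep taken after releasing the lock does not.

```python
import threading
import time


def measure_client_wait(trim_sleep, sleep_inside_lock):
    """Return how long a client op waits for the (toy) PG lock, in seconds."""
    pg_lock = threading.Lock()          # stand-in for the per-PG lock
    trimmer_started = threading.Event()

    def snap_trimmer():
        if sleep_inside_lock:
            with pg_lock:
                trimmer_started.set()
                time.sleep(trim_sleep)  # sleeping while still holding the lock
        else:
            with pg_lock:
                trimmer_started.set()
            time.sleep(trim_sleep)      # lock released before sleeping

    t = threading.Thread(target=snap_trimmer)
    t.start()
    trimmer_started.wait()              # make sure the trimmer got the lock first

    start = time.monotonic()
    with pg_lock:                       # client op needs the same PG
        waited = time.monotonic() - start
    t.join()
    return waited


if __name__ == "__main__":
    print("sleep inside lock :", round(measure_client_wait(0.5, True), 2), "s")
    print("sleep outside lock:", round(measure_client_wait(0.5, False), 2), "s")
```

If the real trimmer behaves like the first case, client latency would
track the configured sleep, which would match the ~10s waits Nick
reports below. Whether that is actually what the OSD does would need
checking in the code for the release in question.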

-- Dan

On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi,
>
> I have been testing some higher values of osd_snap_trim_sleep to try to reduce the impact of removing RBD snapshots
> on our cluster, and I have come across what I believe is an unintended consequence. The PG lock seems to stay held
> for the duration of the sleep, so no other IO can use the PG while the snap removal operation is sleeping.
>
> I had set the variable to 10s to minimise the impact, as I had some multi-TB snapshots to remove, and noticed that suddenly
> all IO to the cluster had a latency of roughly 10s as well. All the dumped ops show waiting on the PG for about 10s too.
>
> Is osd_snap_trim_sleep only ever meant to be used up to, say, a max of 0.1s, with this being a known side effect, or should
> the PG lock be released so that normal IO can continue during the sleeps?
>
> Nick
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com