Hammer or jewel? I've forgotten which thread pool is handling the snap trim nowadays -- is it the op thread yet? If so, perhaps all the op threads are stuck sleeping? Just a wild guess. (Maybe increasing # op threads would help?)

-- Dan

On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi,
>
> I had been testing some higher values with the osd_snap_trim_sleep variable to try and reduce the impact of removing RBD snapshots
> on our cluster and I have come across what I believe to be a possible unintended consequence. The value of the sleep seems to keep
> the lock on the PG open so that no other IO can use the PG whilst the snap removal operation is sleeping.
>
> I had set the variable to 10s to completely minimise the impact as I had some multi TB snapshots to remove and noticed that suddenly
> all IO to the cluster had a latency of roughly 10s as well, all the dumped ops show waiting on PG for 10s as well.
>
> Is the osd_snap_trim_sleep variable only ever meant to be used up to say a max of 0.1s and this is a known side effect, or should
> the lock on the PG be removed so that normal IO can continue during the sleeps?
>
> Nick
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com