On Thu, 2018-02-08 at 13:27 -0600, Benjamin Marzinski wrote: > On Thu, Feb 08, 2018 at 06:19:32PM +0000, Bart Van Assche wrote: > > On Wed, 2018-02-07 at 16:49 -0600, Benjamin Marzinski wrote: > > > Commit 6e2423fd fixed a bug where the tur checker could cancel a > > > detached thread after it had exitted. However in fixing this, the new > > > code grabbed a mutex (to call condlog) while holding a spin_lock. To > > > deal with that, and to try to keep with the maixim "lock data, not > > > code", I've changed how the tur checker synchronizes with its thread. > > > > > > Now, the tur checker creates joinable threads, and detaches them when > > > the thread is finished or has timed out. To track the state of the > > > threads, I've added a new variable to the checker context, ct->attached. > > > When a thread starts, attached is set to 1. When the thread finishes, it > > > saves the value of attached, and then zeros it out, while locked. If > > > attached was set, it detaches itself. > > > > > > When the tur checker gives up on a thread, it also saves and decrements > > > ct->attached, while locked. At the same time it saves the value of > > > ct->thread. If attached was set, it cancels the thread, and then > > > detaches it. > > > > > > So the values that are protected by the spin lock are now ct->holders, > > > ct->thread, and ct->attached. There are cases where the tur checker just > > > wants to know if the thread is running. If so, its safe to simply read > > > ct->thread without locking. Also, if it knows that the thread isn't > > > running, it can freely access all of these variables. I've left the > > > locking in-place in these cases to make the static analyzers happy. > > > However I have added comments stating when the locking isn't actually > > > necessary. > > > > Hello Ben, > > > > Have you considered to move the condlog() statements out of the spinlock > > section? I think that would lead to a much smaller and less contrived change > > than the patch you posted. > > Yes. I thought it was better to keep the amount of code locked under a > spin_lock as small as possible. But it seems that I am alone in thinking > my trade-off was worth it. If both you and Martin object, I'm fine with > redoing the patch to simply move the condlog(), since I don't believe > that either tur_check_async_timeout() or pthread_cancel() can block. Hello Ben, Have you considered to convert the hldr_lock spinlock into a mutex? I think that the hldr_lock spinlock got introduced by commit 892f7b333a03 ("multipath: fix scsi async tur checker corruption"). However, I have not found an explanation in the description of that commit why hldr_lock is a spinlock and not any other type of synchronization primitive. Thanks, Bart. -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel