On 04.04.2018 19:25, Waiman Long wrote:
> On 04/04/2018 11:55 AM, Kirill Tkhai wrote:
>> On 04.04.2018 18:51, Kirill Tkhai wrote:
>>> On 04.04.2018 18:35, Peter Zijlstra wrote:
>>>> On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
>>>>> The following situation leads to deadlock:
>>>>>
>>>>> [task 1]                           [task 2]                          [task 3]
>>>>> kill_fasync()                      mm_update_next_owner()            copy_process()
>>>>>  spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
>>>>>  send_sigio()                       <IRQ>                             ...
>>>>>   read_lock(&fown->lock)             kill_fasync()                    ...
>>>>>   read_lock(&tasklist_lock)           spin_lock_irqsave(&fa->fa_lock) ...
>>>>>
>>>>> Task 1 can't acquire the read-locked tasklist_lock, since task 3 has
>>>>> already expressed its wish to take the lock exclusively. Task 2 holds
>>>>> the lock for read, but it can't take the spin lock.
>>>>>
>>>>> The patch makes queued_read_lock_slowpath() give task 1 the same
>>>>> priority as if it were an interrupt handler, and take the lock
>>>> That re-introduces starvation scenarios. And the above looks like a
>>>> proper deadlock that should be sorted by fixing the locking order.
>>> We can move tasklist_lock out of send_sigio(), but I'm not sure
>>> it's possible for read_lock(&fown->lock).
>>>
>>> Is there another solution? Is there a reliable way to iterate
>>> do_each_pid_task() with rcu_read_lock()?
>> In case of &fown->lock we may always disable irqs in all the places
>> where it's taken for read, i.e. read_lock_irqsave(&fown->lock). This
>> seems to fix the problem for this lock.
>
> One possible solution is to add a flag to send_sigio() to use
> read_trylock(&tasklist_lock) instead of read_lock(). If the trylock
> fails, return an error and have the caller (kill_fasync) release
> fa->fa_lock and retry. Task 1 has 3 levels of nested locking, so it
> should be the one that retries if the innermost locking fails. A
> warning can be printed if the retry count is too large.

send_sigio() is called from several places, and the context which calls
it is in general unknown. In the case of dnotify_handle_event(), which
calls it under a spinlock, the trylock would act as a busy loop. This
may block some smp/stop_machine primitives, and I'm not sure it can't
lead to some currently invisible deadlocks.

There is a solution I'm analyzing at the moment: we can convert
fasync_struct::fa_lock to a read/write lock; then we can take it from
interrupt context without problems.

Kirill
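
For illustration, a minimal sketch of Waiman's trylock-and-retry
suggestion above. This is not real kernel code: send_sigio_nonblock()
and MAX_SIGIO_RETRIES are hypothetical names for a send_sigio() variant
that uses read_trylock(&tasklist_lock) and returns -EAGAIN on failure,
and for the warning threshold.

#define MAX_SIGIO_RETRIES 100	/* hypothetical warning threshold */

static void kill_fasync_one(struct fasync_struct *fa, int sig, int band)
{
	struct fown_struct *fown;
	unsigned long flags;
	int retries = 0;

retry:
	spin_lock_irqsave(&fa->fa_lock, flags);
	if (fa->fa_file) {
		fown = &fa->fa_file->f_owner;
		/* SIGURG has its own default delivery mechanism. */
		if (!(sig == SIGURG && fown->signum == 0) &&
		    send_sigio_nonblock(fown, fa->fa_fd, band) == -EAGAIN) {
			/* Innermost lock unavailable: drop fa_lock, retry. */
			spin_unlock_irqrestore(&fa->fa_lock, flags);
			WARN_ON_ONCE(++retries > MAX_SIGIO_RETRIES);
			cpu_relax();
			goto retry;
		}
	}
	spin_unlock_irqrestore(&fa->fa_lock, flags);
}

As Kirill's objection notes, when the caller itself holds a spinlock
(e.g. dnotify_handle_event()), this retry loop degenerates into a busy
wait.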
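And a sketch of the fa_lock conversion Kirill mentions at the end
(again an illustration under stated assumptions, not the final patch):
fasync_struct::fa_lock becomes an rwlock_t, so an interrupt-context
reader in kill_fasync() can nest with a task-context reader, while
mutation paths such as fasync_insert_entry()/fasync_remove_entry()
would take the lock for write with interrupts disabled.

struct fasync_struct {
	rwlock_t		fa_lock;	/* was: spinlock_t fa_lock */
	int			magic;
	int			fa_fd;
	struct fasync_struct	*fa_next;	/* singly linked list */
	struct file		*fa_file;
	struct rcu_head		fa_rcu;
};

static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
{
	while (fa) {
		struct fown_struct *fown;
		unsigned long flags;

		/*
		 * Read-read nesting is fine, so an interrupt-context
		 * kill_fasync() no longer deadlocks against a task
		 * that already holds fa_lock and is spinning on
		 * tasklist_lock.
		 */
		read_lock_irqsave(&fa->fa_lock, flags);
		if (fa->fa_file) {
			fown = &fa->fa_file->f_owner;
			if (!(sig == SIGURG && fown->signum == 0))
				send_sigio(fown, fa->fa_fd, band);
		}
		read_unlock_irqrestore(&fa->fa_lock, flags);
		fa = rcu_dereference(fa->fa_next);
	}
}

The write side would pair this with write_lock_irq(&fa->fa_lock) in the
paths that install or remove fasync entries.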