Re: possible deadlock in send_sigio

Waiman Long <longman@xxxxxxxxxx> · Thu, 11 Jun 2020 09:51:29 -0400

On 6/11/20 3:43 AM, Dmitry Vyukov wrote:
On Thu, Jun 11, 2020 at 4:33 AM Waiman Long <longman@xxxxxxxxxx> wrote:
On 4/4/20 1:55 AM, syzbot wrote:
Hello,

syzbot found the following crash on:

HEAD commit:    bef7b2a7 Merge tag 'devicetree-for-5.7' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15f39c5de00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=91b674b8f0368e69
dashboard link: https://syzkaller.appspot.com/bug?extid=a9fb1457d720a55d6dc5
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1454c3b7e00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12a22ac7e00000

The bug was bisected to:

commit 7bc3e6e55acf065500a24621f3b313e7e5998acf
Author: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
Date:   Thu Feb 20 00:22:26 2020 +0000

      proc: Use a list of inodes to flush from proc

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=165c4acde00000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=155c4acde00000
console output: https://syzkaller.appspot.com/x/log.txt?x=115c4acde00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a9fb1457d720a55d6dc5@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")

========================================================
WARNING: possible irq lock inversion dependency detected
5.6.0-syzkaller #0 Not tainted
--------------------------------------------------------
ksoftirqd/0/9 just changed the state of lock:
ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigio+0xa9/0x340 fs/fcntl.c:800
but this lock took another, SOFTIRQ-unsafe lock in the past:
   (&pid->wait_pidfd){+.+.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
   Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&pid->wait_pidfd);
                                 local_irq_disable();
                                 lock(tasklist_lock);
                                 lock(&pid->wait_pidfd);
    <Interrupt>
      lock(tasklist_lock);

   *** DEADLOCK ***

That is a false positive. The qrwlock has a special property: its read
lock becomes unfair in interrupt context. A reader in interrupt context
does not queue behind a waiting writer; it only waits until no writer
actually holds the lock. So unless a write lock is taken in interrupt
context, this scenario cannot deadlock. The current lockdep code does
not capture these qrwlock semantics, which leads to this false positive.

Hi Longman

Thanks for looking into this.
Now the question is: how should we change lockdep annotations to fix this bug?

There was an old lockdep patch, never merged, that I think may address
the issue. I will need to dig it out and see whether it can be adapted
to the current kernel. That may take some time.

Cheers,
Longman