Re: [syzbot] WARNING in handle_mm_fault

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Fri, 12 Mar 2021 07:36:27 +0100

On Fri, Mar 12, 2021 at 3:30 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> Your warning is odd, but I see the bug.  It's in KVM.

Hi Andy,

By "your" you mean "kernel", right? ;)

> On Thu, Mar 11, 2021 at 4:37 PM syzbot
> <syzbot+7d7013084f0a806f3786@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    05a59d79 Merge git://git.kernel.org:/pub/scm/linux/kernel/..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16f493ead00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=750735fdbc630971
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7d7013084f0a806f3786
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+7d7013084f0a806f3786@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ------------[ cut here ]------------
> > raw_local_irq_restore() called with IRQs enabled
> > WARNING: CPU: 0 PID: 8412 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> > Modules linked in:
> > CPU: 0 PID: 8412 Comm: syz-fuzzer Not tainted 5.12.0-rc2-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
>
> The above makes sense, but WTH is the below:
>
> > Code: be ff cc cc cc cc cc cc cc cc cc cc cc 80 3d 11 d1 ad 04 00 74 01 c3 48 c7 c7 20 79 6b 89 c6 05 00 d1 ad 04 01 e8 75 5b be ff <0f> 0b c3 48 39 77 10 0f 84 97 00 00 00 66 f7 47 22 f0 ff 74 4b 48
> > RSP: 0000:ffffc9000185fac8 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffff8880194268a0 RCX: 0000000000000000
> > RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000
> > R13: ffffed1003284d14 R14: 0000000000000001 R15: ffff8880b9c36000
> > FS:  000000c00002ec90(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> > Call Trace:
> >  handle_mm_fault+0x1bc/0x7e0 mm/memory.c:4549
> > Code: 48 8d 05 97 25 3e 00 48 89 44 24 08 e8 6d 54 ea ff 90 e8 07 a1 ed ff eb a5 cc cc cc cc cc 8b 44 24 10 48 8b 4c 24 08 89 41 24 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b
> > RAX: 00000000000047f6 RBX: 00000000000047f6 RCX: 0000000000d60000
> > RDX: 0000000000004c00 RSI: 0000000000d60000 RDI: 000000000181cad0
> > RBP: 000000c000301890 R08: 00000000000047f5 R09: 000000000059c5a0
> > R10: 000000c0004e2000 R11: 0000000000000020 R12: 00000000000000fa
> > R13: 00aaaaaaaaaaaaaa R14: 000000000093f064 R15: 0000000000000038
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 0 PID: 8412 Comm: syz-fuzzer Not tainted 5.12.0-rc2-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
>
> Now we start reading here:
>
> >  __dump_stack lib/dump_stack.c:79 [inline]
> >  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> >  panic+0x306/0x73d kernel/panic.c:231
> >  __warn.cold+0x35/0x44 kernel/panic.c:605
> >  report_bug+0x1bd/0x210 lib/bug.c:195
> >  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
> >  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
> >  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:575
> > RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> > Code: be ff cc cc cc cc cc cc cc cc cc cc cc 80 3d 11 d1 ad 04 00 74 01 c3 48 c7 c7 20 79 6b 89 c6 05 00 d1 ad 04 01 e8 75 5b be ff <0f> 0b c3 48 39 77 10 0f 84 97 00 00 00 66 f7 47 22 f0 ff 74 4b 48
> > RSP: 0000:ffffc9000185fac8 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffff8880194268a0 RCX: 0000000000000000
> > RDX: ffff88802f7b2400 RSI: ffffffff815b4435 RDI: fffff5200030bf4b
> > RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000
> > R10: ffffffff815ad19e R11: 0000000000000000 R12: 0000000000000003
> > R13: ffffed1003284d14 R14: 0000000000000001 R15: ffff8880b9c36000
> >  kvm_wait arch/x86/kernel/kvm.c:860 [inline]
>
> and there's the bug:
>
>         /*
>          * halt until it's our turn and kicked. Note that we do safe halt
>          * for irq enabled case to avoid hang when lock info is overwritten
>          * in irq spinlock slowpath and no spurious interrupt occur to save us.
>          */
>         if (arch_irqs_disabled_flags(flags))
>                 halt();
>         else
>                 safe_halt();
>
> out:
>         local_irq_restore(flags);
> }
>
> The safe_halt path is bogus.  It should just return instead of
> restoring the IRQ flags.

I think this should be fixed already by:
https://patchwork.kernel.org/project/kvm/patch/1614057902-23774-1-git-send-email-wanpengli@xxxxxxxxxxx/
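
For anyone reading along, here is a rough userspace model of why the
quoted code trips the warning. It is only a sketch: every model_* name
and both globals are hypothetical stand-ins for the kernel helpers, and
the "new" variant is just one possible shape that avoids save/restore.
This thread does not show what the linked patch actually changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; nothing here is real kernel code. */
static bool irqs_enabled = true;    /* models the CPU interrupt flag */
static int bogus_restore_warnings;  /* counts modelled warnings */

/* Models local_irq_save(): remember the old state, then disable IRQs. */
static bool model_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models local_irq_restore() with the debug check from the report:
 * calling it while IRQs are already enabled counts as a warning. */
static void model_irq_restore(bool flags)
{
	if (irqs_enabled)
		bogus_restore_warnings++;
	irqs_enabled = flags;
}

/* Models halt(): the interrupt flag is left untouched. */
static void model_halt(void)
{
}

/* Models safe_halt(): IRQs are enabled before halting and stay enabled. */
static void model_safe_halt(void)
{
	irqs_enabled = true;
}

/* Same shape as the quoted code: save, halt or safe_halt, restore. */
static void model_kvm_wait_old(void)
{
	bool flags = model_irq_save();

	if (!flags)
		model_halt();
	else
		model_safe_halt();	/* IRQs are on again here ... */

	model_irq_restore(flags);	/* ... so this restore is bogus */
}

/* Assumed alternative: no save/restore pair, each branch leaves the
 * interrupt state exactly as the caller had it. */
static void model_kvm_wait_new(void)
{
	if (!irqs_enabled) {
		model_halt();
	} else {
		irqs_enabled = false;	/* models local_irq_disable() */
		model_safe_halt();
		irqs_enabled = true;	/* models local_irq_enable() */
	}
}
```

Calling model_kvm_wait_old() with irqs_enabled set to true bumps the
warning counter once, which matches the "called with IRQs enabled"
splat above. The irqs-disabled path and model_kvm_wait_new() leave the
counter alone.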

#syz fix: x86/kvm: Fix broken irq restoration in kvm_wait