KVM kernel BUG: unable to handle kernel paging request in kvm_kick_cpu

Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> · Thu, 24 Dec 2020 16:15:08 +0100

Dear kvm folks, (CC: syzkaller users),

I have encountered the following kernel bug running qemu on my system
(as part of setting up a syzkaller campaign).

kernel version: next-20201123 [can also be triggered on earlier
version, such v5.10]
kernel configuration:

wget https://raw.githubusercontent.com/openSUSE/kernel-source/9753528d001e0ff814f8b6938211bc47a0a2a1d8/config/x86_64/rt
-O .config
make olddefconfig

With the following simple script, I start up the kernel on a simple
debian base image (as used for syzkaller fuzzing) and just observe if
I can run through the systemd initialisation:

>>>
#!/bin/bash

crashes=0
for (( total=1; total<=100; total++ ))
do

timeout --foreground 10 \
  qemu-system-x86_64 -m 2048 -display none -serial stdio -no-reboot
-enable-kvm \
  -cpu host,migratable=off -drive file=$IMAGE/stretch.img,if=virtio \
  -device e1000,netdev=net0 \
  -netdev "user,id=net0,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22" \
  -snapshot -kernel $KERNEL/arch/x86/boot/bzImage \
  -append "earlyprintk=serial panic=-1 console=ttyS0 root=/dev/vda rw
kaslr crashkernel=512M minnowboard_1:eth0:::" -smp 4

RET=$?

if [ "$RET" = "0" ]; then
# echo "crashed!"
crashes=$((crashes+1))
elif [ "$RET" = "124" ]; then
# echo "timed out!"
:
else
# echo "unexpected return"
exit 1
fi

echo "crashes / total = $crashes / $total"

sleep 1

done
<<<

The image at $IMAGE/stretch.img was created as described at
https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md.

Side remark: Even if no kernel crash occurs, it fails to reach the
login, but that is unrelated to the reported kernel issue; I just
reduced the kernel config during the investigation to not have
sufficient driver support to actually start up all services
successfully.

In roughly 70 of 100 cases, I hit the following kernel crash:

[    0.474842] BUG: unable to handle page fault for address: ffffffff96044553
[    0.474844] #PF: supervisor write access in kernel mode
[    0.474844] #PF: error_code(0x0003) - permissions violation
[    0.474845] PGD 740c067 P4D 740c067 PUD 740d063 PMD 64001e1
[    0.474847] Oops: 0003 [#1] SMP PTI
[    0.474847] CPU: 1 PID: 34 Comm: kauditd Not tainted
5.10.0-next-20201223-rt #2
[    0.474848] RIP: 0010:kvm_kick_cpu+0x23/0x30
[    0.474848] Code: 1f 84 00 00 00 00 00 66 66 66 66 90 48 63 ff 53
48 c7 c0 78 15 01 00 31 db 48 8b 14 fd 60 84 db 96 0f b7 0c 02 b8 05
00 00 00 <0f> 01 c1 5b c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48
89 fb
[    0.474849] RSP: 0000:ffffb1428013fbe8 EFLAGS: 00010046
[    0.474850] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
[    0.474850] RDX: ffff8ff0bfc00000 RSI: ffffffff97699600 RDI: 0000000000000000
[    0.474851] RBP: 0000000000000000 R08: 0000000000000100 R09: ffff8ff0bffd5000
[    0.474851] R10: 303034313d657079 R11: 74203a7469647561 R12: 0000000000000001
[    0.474851] R13: 0000000000000005 R14: 0000000000000000 R15: 00000000000000ed
[    0.474852] FS:  0000000000000000(0000) GS:ffff8ff0bfc80000(0000)
knlGS:0000000000000000
[    0.474852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.474853] CR2: ffffffff96044553 CR3: 0000000000c64000 CR4: 00000000003506a0
[    0.474853] Call Trace:
[    0.474853]  __pv_queued_spin_unlock_slowpath+0xa1/0xd0
[    0.474854]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[    0.474854]  ? univ8250_console_exit+0x20/0x20
[    0.474855]  .slowpath+0x9/0xe
[    0.474855]  _raw_spin_unlock_irqrestore+0xa/0x10
[    0.474855]  serial8250_console_write+0x30c/0x330
[    0.474856]  console_unlock+0x37b/0x4b0
[    0.474856]  vprintk_emit+0xb0/0x170
[    0.474856]  ? audit_log_lost+0x90/0x90
[    0.474857]  printk+0x58/0x6f
[    0.474857]  kauditd_hold_skb.cold.29+0x17/0x1c
[    0.474857]  ? stop_machine_from_inactive_cpu+0x110/0x110
[    0.474858]  kauditd_send_queue+0x10f/0x150
[    0.474858]  kauditd_thread+0x236/0x2c0
[    0.474858]  ? wait_woken+0x80/0x80
[    0.474858]  ? auditd_reset+0x90/0x90
[    0.474859]  kthread+0x116/0x130
[    0.474859]  ? kthread_park+0x80/0x80
[    0.474859]  ret_from_fork+0x22/0x30
[    0.474860] Modules linked in:
[    0.474860] CR2: ffffffff96044553
[    0.474871] ---[ end trace 66ccf77300901455 ]---
[    0.474871] RIP: 0010:kvm_kick_cpu+0x23/0x30
[    0.474872] Code: 1f 84 00 00 00 00 00 66 66 66 66 90 48 63 ff 53
48 c7 c0 78 15 01 00 31 db 48 8b 14 fd 60 84 db 96 0f b7 0c 02 b8 05
00 00 00 <0f> 01 c1 5b c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48
89 fb
[    0.474873] RSP: 0000:ffffb1428013fbe8 EFLAGS: 00010046
[    0.474873] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
[    0.474874] RDX: ffff8ff0bfc00000 RSI: ffffffff97699600 RDI: 0000000000000000
[    0.474874] RBP: 0000000000000000 R08: 0000000000000100 R09: ffff8ff0bffd5000
[    0.474875] R10: 303034313d657079 R11: 74203a7469647561 R12: 0000000000000001
[    0.474875] R13: 0000000000000005 R14: 0000000000000000 R15: 00000000000000ed
[    0.474876] FS:  0000000000000000(0000) GS:ffff8ff0bfc80000(0000)
knlGS:0000000000000000
[    0.474876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.474877] CR2: ffffffff96044553 CR3: 0000000000c64000 CR4: 00000000003506a0
[    0.474877] Kernel panic - not syncing: Fatal exception
[    1.539900] Shutting down cpus with NMI
[    1.539902] Kernel Offset: 0x15000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)

So far, I made the following further observations:

kernel configuration needs to include PARAVIRT_SPINLOCKS = y; if set
to n, the kernel bug is not reproducible.

A kernel configured with make defconfig && make kvmconfig &&
./scripts/config -e PARAVIRT_SPINLOCKS does not show the kernel bug.
So, it requires some further configuration of the kernel configuration
above beyond what is in make defconfig && make kvmconfig &&
./scripts/config -e PARAVIRT_SPINLOCKS to actually trigger.

-smp option of qemu needs to be larger than 1 in the script above; it
is some kind of concurrency bug.

With the script above, the issue only occurs with `timeout
--foreground`, not with timeout (but maybe that is a completely
different issue I just misunderstand in the timeout and qemu
interaction).

Has this issue been observed by others?

Can somebody reproduce this issue here?

Are there certain kernel configurations I should try out to pinpoint
the issue to a certain set of kernel configurations?

Did I miss to mention some further important information for
reproducing and debugging?

Is there anything specific I can do to support debugging and fixing the issue?

Best regards,

Lukas