Hi everyone,

I hope this is the correct list to discuss this issue; please feel free to redirect me otherwise.

I have a nested virtualization setup that looks as follows:

- Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
- L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
- Nested guest: SLES 12, kernel 3.12.28-4-default

The nested guest is configured with "<type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>".

This is working just beautifully, except when the L0 guest wakes up from managed save (openstack server resume in OpenStack parlance). Then, in the L0 guest we immediately see this:

[Tue Feb 6 07:00:37 2018] ------------[ cut here ]------------
[Tue Feb 6 07:00:37 2018] kernel BUG at ../arch/x86/kvm/x86.c:328!
[Tue Feb 6 07:00:37 2018] invalid opcode: 0000 [#1] SMP
[Tue Feb 6 07:00:37 2018] Modules linked in: fuse vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun br_netfilter bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables vboxpci(O) vboxnetadp(O) vboxnetflt(O) af_packet iscsi_ibft iscsi_boot_sysfs vboxdrv(O) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel hid_generic usbhid jitterentropy_rng drbg ansi_cprng ppdev parport_pc floppy parport joydev aesni_intel processor button aes_x86_64 virtio_balloon virtio_net lrw gf128mul glue_helper pcspkr serio_raw ablk_helper cryptd i2c_piix4 ext4 crc16 jbd2 mbcache ata_generic
[Tue Feb 6 07:00:37 2018]  virtio_blk ata_piix ahci libahci cirrus(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt fb_sys_fops ttm(O) drm(O) virtio_pci virtio_ring virtio uhci_hcd ehci_hcd usbcore usb_common libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[Tue Feb 6 07:00:37 2018] CPU: 2 PID: 2041 Comm: CPU 0/KVM Tainted: G W O 4.4.104-39-default #1
[Tue Feb 6 07:00:37 2018] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.1-1ubuntu1~cloud0 04/01/2014
[Tue Feb 6 07:00:37 2018] task: ffff880037108d80 ti: ffff88042e964000 task.ti: ffff88042e964000
[Tue Feb 6 07:00:37 2018] RIP: 0010:[<ffffffffa04f20e5>] [<ffffffffa04f20e5>] kvm_spurious_fault+0x5/0x10 [kvm]
[Tue Feb 6 07:00:37 2018] RSP: 0018:ffff88042e967d70 EFLAGS: 00010246
[Tue Feb 6 07:00:37 2018] RAX: 0000000000000000 RBX: ffff88042c4f0040 RCX: 0000000000000000
[Tue Feb 6 07:00:37 2018] RDX: 0000000000006820 RSI: 0000000000000282 RDI: ffff88042c4f0040
[Tue Feb 6 07:00:37 2018] RBP: ffff88042c4f00d8 R08: ffff88042e964000 R09: 0000000000000002
[Tue Feb 6 07:00:37 2018] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
[Tue Feb 6 07:00:37 2018] R13: 0000021d34fbb21d R14: 0000000000000001 R15: 000055d2157cf840
[Tue Feb 6 07:00:37 2018] FS: 00007f7c52b96700(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
[Tue Feb 6 07:00:37 2018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Feb 6 07:00:37 2018] CR2: 00007f823b15f000 CR3: 0000000429334000 CR4: 0000000000362670
[Tue Feb 6 07:00:37 2018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue Feb 6 07:00:37 2018] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue Feb 6 07:00:37 2018] Stack:
[Tue Feb 6 07:00:37 2018]  ffffffffa07939b1 ffffffffa0787875 ffffffffa0503a60 ffff88042c4f0040
[Tue Feb 6 07:00:37 2018]  ffffffffa04e5ede ffff88042c4f0040 ffffffffa04e6f0f ffff880037108d80
[Tue Feb 6 07:00:37 2018]  ffff88042c4f00e0 ffff88042c4f00e0 ffff88042c4f0040 ffff88042e968000
[Tue Feb 6 07:00:37 2018] Call Trace:
[Tue Feb 6 07:00:37 2018]  [<ffffffffa07939b1>] intel_pmu_set_msr+0xfc1/0x2341 [kvm_intel]
[Tue Feb 6 07:00:37 2018] DWARF2 unwinder stuck at intel_pmu_set_msr+0xfc1/0x2341 [kvm_intel]
[Tue Feb 6 07:00:37 2018] Leftover inexact backtrace:
[Tue Feb 6 07:00:37 2018]  [<ffffffffa0787875>] ? vmx_interrupt_allowed+0x15/0x30 [kvm_intel]
[Tue Feb 6 07:00:37 2018]  [<ffffffffa0503a60>] ? kvm_arch_vcpu_runnable+0xa0/0xd0 [kvm]
[Tue Feb 6 07:00:37 2018]  [<ffffffffa04e5ede>] ? kvm_vcpu_check_block+0xe/0x60 [kvm]
[Tue Feb 6 07:00:37 2018]  [<ffffffffa04e6f0f>] ? kvm_vcpu_block+0x8f/0x310 [kvm]
[Tue Feb 6 07:00:37 2018]  [<ffffffffa0503c17>] ? kvm_arch_vcpu_ioctl_run+0x187/0x400 [kvm]
[Tue Feb 6 07:00:37 2018]  [<ffffffffa04ea6d9>] ? kvm_vcpu_ioctl+0x359/0x680 [kvm]
[Tue Feb 6 07:00:37 2018]  [<ffffffff81016689>] ? __switch_to+0x1c9/0x460
[Tue Feb 6 07:00:37 2018]  [<ffffffff81224f02>] ? do_vfs_ioctl+0x322/0x5d0
[Tue Feb 6 07:00:37 2018]  [<ffffffff811362ef>] ? __audit_syscall_entry+0xaf/0x100
[Tue Feb 6 07:00:37 2018]  [<ffffffff8100383b>] ? syscall_trace_enter_phase1+0x15b/0x170
[Tue Feb 6 07:00:37 2018]  [<ffffffff81225224>] ? SyS_ioctl+0x74/0x80
[Tue Feb 6 07:00:37 2018]  [<ffffffff81634a02>] ? entry_SYSCALL_64_fastpath+0x16/0xae
[Tue Feb 6 07:00:37 2018] Code: d7 fe ff ff 8b 2d 04 6e 06 00 e9 c2 fe ff ff 48 89 f2 e9 65 ff ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 89 ff 48 89
[Tue Feb 6 07:00:37 2018] RIP  [<ffffffffa04f20e5>] kvm_spurious_fault+0x5/0x10 [kvm]
[Tue Feb 6 07:00:37 2018]  RSP <ffff88042e967d70>
[Tue Feb 6 07:00:37 2018] ---[ end trace e15c567f77920049 ]---

We only hit this kernel bug if we have a nested VM running. The exact same setup, sent into managed save after shutting down the nested VM, wakes up just fine.

Now I am aware of https://bugzilla.redhat.com/show_bug.cgi?id=1076294, which talks about live migration, but I think the same considerations apply.

I am also aware of https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM, which strongly suggests using host-passthrough or host-model. I have tried both, to no avail; the stack trace persists.
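For reference, host-passthrough and host-model correspond to the standard libvirt <cpu> element variants; trimmed down to just that element, the two configurations I tried look roughly like this:

  <!-- variant 1: expose the host CPU to the guest unmodified -->
  <cpu mode='host-passthrough'/>

  <!-- variant 2: closest named CPU model, extended with the host's feature flags -->
  <cpu mode='host-model'/>

(With either setting the nested guest itself runs fine; it is only the wakeup from managed save that triggers the BUG above.)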
I have also tried running a 4.15 kernel in the L0 guest, from https://kernel.opensuse.org/packages/stable, but again, the stack trace persists.

What does fix things, of course, is to switch the nested guest from KVM to Qemu, but that also makes things significantly slower.

So I'm wondering: is there anyone reading this who runs nested KVM and has managed to successfully live-migrate or managed-save? If so, would you be able to share a working host kernel / L0 guest kernel / nested guest kernel combination, or any other hints for tuning the L0 guest to support managed save and live migration?

I'd be extraordinarily grateful for any suggestions. Thanks!

Cheers,
Florian

_______________________________________________
libvirt-users mailing list
libvirt-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvirt-users