[Cc: KVM upstream list.]

On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
> Hi everyone,
>
> I hope this is the correct list to discuss this issue; please feel
> free to redirect me otherwise.
>
> I have a nested virtualization setup that looks as follows:
>
> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
> - Nested guest: SLES 12, kernel 3.12.28-4-default
>
> The nested guest is configured with "<type arch='x86_64'
> machine='pc-i440fx-1.4'>hvm</type>".
>
> This is working just beautifully, except when the L0 guest wakes up
> from managed save (openstack server resume in OpenStack parlance).
> Then, in the L0 guest we immediately see this:

[...] # Snip the call trace from Florian.  It is here:
https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html

> What does fix things, of course, is to switch the nested guest
> from KVM to Qemu -- but that also makes things significantly slower.
>
> So I'm wondering: is there someone reading this who does run nested
> KVM and has managed to successfully live-migrate or managed-save? If
> so, would you be able to share a working host kernel / L0 guest kernel
> / nested guest kernel combination, or any other hints for tuning the
> L0 guest to support managed save and live migration?

Following up from our IRC discussion (on #kvm, Freenode).  Re-posting
my comment here:

So I just did a test of 'managedsave' (which is just "save the state of
the running VM to a file" in libvirt parlance) of L1, _while_ L2 is
running, and I seem to reproduce your case (see the call trace
attached).

    # Ensure L2 (the nested guest) is running on L1.  Then, from L0,
    # do the following:
    [L0] $ virsh managedsave L1
    [L0] $ virsh start L1 --console

Result: see the call trace attached to this bug.  L1 goes on to start
"fine", and L2 keeps running, too -- but things start to seem weird.
For instance, when I try to safely, read-only mount the L2 disk image
via libguestfs (by setting 'export LIBGUESTFS_BACKEND=direct', which
uses direct QEMU) with `guestfish --ro -a ./cirros.qcow2 -i`, it throws
the call trace again on the L1 serial console, and the `guestfish`
command just sits there forever.

  - L0 (bare metal) kernel: 4.13.13-300.fc27.x86_64+debug
  - L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
  - L2 is a CirrOS 3.5 image

I have reproduced this at least 3 times with the above versions.

I'm using libvirt 'host-passthrough' for CPU (meaning: '-cpu host' in
QEMU parlance) for both L1 and L2 (see the quick check sketched below).

My L0 CPU is: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.

Thoughts?

---

[/me wonders if I'll be asked to reproduce this with newest upstream
kernels.]

[...]

--
/kashyap
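(A quick check, for anyone trying to reproduce: the 'host-passthrough'
CPU mode mentioned above can be confirmed from the libvirt domain XML.
A minimal sketch, where 'f26-devstack' is the L1 domain name as seen in
the console log below, and '<l2-domain>' is a placeholder for whatever
the L2 guest is called:

    [L0] $ virsh dumpxml f26-devstack | grep "cpu mode"
    [L1] $ virsh dumpxml <l2-domain> | grep "cpu mode"

Each should print a line like: <cpu mode='host-passthrough'>.)

The serial console log from starting L1 after the managedsave follows: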
$> virsh start f26-devstack --console
Domain f26-devstack started
Connected to domain f26-devstack
Escape character is ^]
[ 1323.605321] ------------[ cut here ]------------
[ 1323.608653] kernel BUG at arch/x86/kvm/x86.c:336!
[ 1323.611661] invalid opcode: 0000 [#1] SMP
[ 1323.614221] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sb_edac edac_core kvm_intel openvswitch nf_conntrack_ipv6 kvm nf_nat_ipv6 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack irqbypass crct10dif_pclmul sunrpc crc32_pclmul ppdev ghash_clmulni_intel parport_pc joydev virtio_net virtio_balloon parport tpm_tis i2c_piix4 tpm_tis_core tpm xfs libcrc32c virtio_blk virtio_console virtio_rng crc32c_intel serio_raw virtio_pci ata_generic virtio_ring virtio pata_acpi qemu_fw_cfg
[ 1323.645674] CPU: 0 PID: 18587 Comm: CPU 0/KVM Not tainted 4.11.10-300.fc26.x86_64 #1
[ 1323.649592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-1.fc27 04/01/2014
[ 1323.653935] task: ffff8b5be13ca580 task.stack: ffffa8b78147c000
[ 1323.656783] RIP: 0010:kvm_spurious_fault+0x9/0x10 [kvm]
[ 1323.659317] RSP: 0018:ffffa8b78147fc78 EFLAGS: 00010246
[ 1323.661808] RAX: 0000000000000000 RBX: ffff8b5be13c0000 RCX: 0000000000000000
[ 1323.665077] RDX: 0000000000006820 RSI: 0000000000000292 RDI: ffff8b5be13c0000
[ 1323.668287] RBP: ffffa8b78147fc78 R08: ffff8b5be13c0090 R09: 0000000000000000
[ 1323.671515] R10: ffffa8b78147fbf8 R11: 0000000000000000 R12: ffff8b5be13c0088
[ 1323.674598] R13: 0000000000000001 R14: 00000131e2372ee6 R15: ffff8b5be1360040
[ 1323.677643] FS:  00007fd602aff700(0000) GS:ffff8b5bffc00000(0000) knlGS:0000000000000000
[ 1323.681130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1323.683628] CR2: 000055d650532c20 CR3: 0000000221260000 CR4: 00000000001426f0
[ 1323.686697] Call Trace:
[ 1323.687817]  intel_pmu_get_msr+0xd23/0x3f44 [kvm_intel]
[ 1323.690151]  ? vmx_interrupt_allowed+0x19/0x40 [kvm_intel]
[ 1323.692583]  kvm_arch_vcpu_runnable+0xa5/0xe0 [kvm]
[ 1323.694767]  kvm_vcpu_check_block+0x12/0x50 [kvm]
[ 1323.696858]  kvm_vcpu_block+0xa3/0x2f0 [kvm]
[ 1323.698762]  kvm_arch_vcpu_ioctl_run+0x165/0x16a0 [kvm]
[ 1323.701079]  ? kvm_arch_vcpu_load+0x6d/0x290 [kvm]
[ 1323.703175]  ? __check_object_size+0xbb/0x1b3
[ 1323.705109]  kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.707021]  ? kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.709006]  do_vfs_ioctl+0xa5/0x600
[ 1323.710570]  SyS_ioctl+0x79/0x90
[ 1323.712011]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 1323.714033] RIP: 0033:0x7fd610fb35e7
[ 1323.715601] RSP: 002b:00007fd602afe7c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1323.718869] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd610fb35e7
[ 1323.721972] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000013
[ 1323.725044] RBP: 0000563dab190300 R08: 0000563dab1ab7d0 R09: 01fc2de3f821e99c
[ 1323.728124] R10: 000000003b9aca00 R11: 0000000000000246 R12: 0000563dadce20a6
[ 1323.731195] R13: 0000000000000000 R14: 00007fd61a84c000 R15: 0000563dadce2000
[ 1323.734268] Code: 8d 00 00 01 c7 05 1c e6 05 00 01 00 00 00 41 bd 01 00 00 00 44 8b 25 2f e6 05 00 e9 db fe ff ff 66 90 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 89 ff 48 89 e5 41 54 53
[ 1323.742385] RIP: kvm_spurious_fault+0x9/0x10 [kvm] RSP: ffffa8b78147fc78
[ 1323.745438] ---[ end trace 92fa23c974db8b7e ]---