On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
> Hi all,
>
> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
> robustness issues.

I would be very interested to learn about any other robustness issues that you
are seeing.

> The most urgent one: With the help of latest QEMU
> master that uses this interface, you can easily crash the host. You just
> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
> The host CPU that ran this will stall, the system will freeze soon.

Just to confirm: you start an L2 guest using QEMU inside an L1 guest and then
hard-reset the L1 guest? Are you running any special workload in L2 or L1 when
you reset? Also, how exactly are you doing this "hard reset"?

(Sorry, I just tried this in my setup and did not see any problem, but my setup
is slightly different, so I am just ruling out the obvious.)

>
> I've also seen a pattern with my Jailhouse test VM where I seem to get
> stuck in a loop between L1 and L2:
>
> qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
> qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
> qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
> qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
> qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
> qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
> qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
> qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
> qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
> qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
> qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
> qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
> qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
> qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
> qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>
> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
> with build fixes) in QEMU.

This is the QEMU that you are using in L0 to launch the L1 guest, right? Or are
you still referring to the QEMU mentioned above, the one started inside L1?

> Host kernels tested: 5.1.16 (distro) and 5.2 (vanilla).
>
> Jan
>

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing directors: Christian Schlaeger, Ralf Herbrich
Registered at the Charlottenburg district court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879