Heya,

This is on Intel Haswell.

First, some version info. L0 and L1 both have the same versions of kernel and qemu:

=====
$ rpm -q kernel --changelog | head -2
* Thu May 09 2013 Josh Boyer - 3.10.0-0.rc0.git23.1
- Linux v3.9-11789-ge0fd9af
=====

=====
$ uname -r ; rpm -q qemu-kvm libvirt-daemon-kvm libguestfs
3.10.0-0.rc0.git23.1.fc20.x86_64
qemu-kvm-1.4.1-1.fc19.x86_64
libvirt-daemon-kvm-1.0.5-2.fc19.x86_64
libguestfs-1.21.35-1.fc19.x86_64
=====

Additionally, neither nmi_watchdog nor hpet is enabled on the L0 and L1 kernels:

=====
$ egrep -i 'nmi|hpet' /etc/grub2.cfg
$
=====

KVM parameters on L0:

=====
$ cat /sys/module/kvm_intel/parameters/nested
Y
$ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
Y
$ cat /sys/module/kvm_intel/parameters/enable_apicv
N
$ cat /sys/module/kvm_intel/parameters/ept
Y
=====

This is the stack trace I see when I start the L2 guest:

------------------------------------------------
.......
[    2.162235] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    2.163080] Pid: 1, comm: swapper/0 Not tainted 3.8.11-200.fc18.x86_64 #1
[    2.163080] Call Trace:
[    2.163080]  [<ffffffff81649c19>] panic+0xc1/0x1d0
[    2.163080]  [<ffffffff81d010e0>] mount_block_root+0x1fa/0x2ac
[    2.163080]  [<ffffffff81d011e9>] mount_root+0x57/0x5b
[    2.163080]  [<ffffffff81d0132a>] prepare_namespace+0x13d/0x176
[    2.163080]  [<ffffffff81d00e1c>] kernel_init_freeable+0x1cf/0x1da
[    2.163080]  [<ffffffff81d00610>] ? do_early_param+0x8c/0x8c
[    2.163080]  [<ffffffff81637ca0>] ? rest_init+0x80/0x80
[    2.163080]  [<ffffffff81637cae>] kernel_init+0xe/0xf0
[    2.163080]  [<ffffffff8165bd6c>] ret_from_fork+0x7c/0xb0
[    2.163080]  [<ffffffff81637ca0>] ? rest_init+0x80/0x80
[    2.163080] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[    2.163080] Do you have a strange power saving mode enabled?
[    2.163080] Dazed and confused, but trying to continue
[    2.163080] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[    2.163080] Do you have a strange power saving mode enabled?
[    2.163080] Dazed and confused, but trying to continue
[    2.163080] Uhhuh. NMI received for unknown reason 30 on CPU 0.
------------------------------------------------

I'm able to reproduce this consistently.

L1 QEMU command-line:
====================

$ ps -ef | grep -i qemu
qemu 4962 1 21 15:41 ? 00:00:41 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name regular-guest -S -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu Haswell,+vmx -m 6144 -smp 4,sockets=4,cores=1,threads=1 -uuid 4ed9ac0b-7f72-dfcf-68b3-e6fe2ac588b2 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test/vmimages/regular-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:c1:34,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

L2 QEMU command-line:
====================

$ qemu 2042 1 0 May09 ? 00:05:03 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name nested-guest -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 02ea8988-1054-b08b-bafe-cfbe9659976c -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/nested-guest.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test/vmimages/nested-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:65:c4:e6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

I attached the vmxcap script output.

Before I debug further, does anyone have hints here? Many thanks in advance.

[1] Notes -- https://github.com/kashyapc/nested-virt-notes-intel-f18

/kashyap

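
For anyone comparing hosts, the four kvm_intel parameters quoted in this report can be collected in one go. This is only a minimal sketch: the helper name dump_kvm_params and its optional directory argument are made up for illustration, and the parameter list is simply the four shown above, not a complete set.

```shell
# Hypothetical helper: print the kvm_intel parameters quoted in this report
# as name=value lines. The optional first argument overrides the sysfs
# directory, which is an illustrative convenience for trying it on a copy.
dump_kvm_params() {
    param_dir="${1:-/sys/module/kvm_intel/parameters}"
    for p in nested enable_shadow_vmcs enable_apicv ept; do
        if [ -r "$param_dir/$p" ]; then
            # Each parameter file holds a single short value such as Y or N.
            printf '%s=%s\n' "$p" "$(cat "$param_dir/$p")"
        else
            # The parameter may be absent on other kernel versions.
            printf '%s=<not available>\n' "$p"
        fi
    done
}
```

Calling dump_kvm_params with no argument on L0 and again on L1 gives two short listings that are easy to diff.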
Basic VMX Information
  Revision                                 18
  VMCS size                                1024
  VMCS restricted to 32 bit addresses      no
  Dual-monitor support                     yes
  VMCS memory type                         6
  INS/OUTS instruction information         yes
  IA32_VMX_TRUE_*_CTLS support             yes
pin-based controls
  External interrupt exiting               yes
  NMI exiting                              yes
  Virtual NMIs                             yes
  Activate VMX-preemption timer            yes
  Process posted interrupts                no
primary processor-based controls
  Interrupt window exiting                 yes
  Use TSC offsetting                       yes
  HLT exiting                              yes
  INVLPG exiting                           yes
  MWAIT exiting                            yes
  RDPMC exiting                            yes
  RDTSC exiting                            yes
  CR3-load exiting                         default
  CR3-store exiting                        default
  CR8-load exiting                         yes
  CR8-store exiting                        yes
  Use TPR shadow                           yes
  NMI-window exiting                       yes
  MOV-DR exiting                           yes
  Unconditional I/O exiting                yes
  Use I/O bitmaps                          yes
  Monitor trap flag                        yes
  Use MSR bitmaps                          yes
  MONITOR exiting                          yes
  PAUSE exiting                            yes
  Activate secondary control               yes
secondary processor-based controls
  Virtualize APIC accesses                 yes
  Enable EPT                               yes
  Descriptor-table exiting                 yes
  Enable RDTSCP                            yes
  Virtualize x2APIC mode                   yes
  Enable VPID                              yes
  WBINVD exiting                           yes
  Unrestricted guest                       yes
  APIC register emulation                  no
  Virtual interrupt delivery               no
  PAUSE-loop exiting                       yes
  RDRAND exiting                           yes
  Enable INVPCID                           yes
  Enable VM functions                      yes
  VMCS shadowing                           yes
  EPT-violation #VE                        no
VM-Exit controls
  Save debug controls                      default
  Host address-space size                  yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Acknowledge interrupt on exit            yes
  Save IA32_PAT                            yes
  Load IA32_PAT                            yes
  Save IA32_EFER                           yes
  Load IA32_EFER                           yes
  Save VMX-preemption timer value          yes
VM-Entry controls
  Load debug controls                      default
  IA-64 mode guest                         yes
  Entry to SMM                             yes
  Deactivate dual-monitor treatment        yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Load IA32_PAT                            yes
  Load IA32_EFER                           yes
Miscellaneous data
  VMX-preemption timer scale (log2)        5
  Store EFER.LMA into IA-32e mode guest control yes
  HLT activity state                       yes
  Shutdown activity state                  yes
  Wait-for-SIPI activity state             yes
  IA32_SMBASE support                      yes
  Number of CR3-target values              4
  MSR-load/store count recommenation       0
  IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
  VMWRITE to VM-exit information fields    yes
  MSEG revision identifier                 0
VPID and EPT capabilities
  Execute-only EPT translations            yes
  Page-walk length 4                       yes
  Paging-structure memory type UC          yes
  Paging-structure memory type WB          yes
  2MB EPT pages                            yes
  1GB EPT pages                            yes
  INVEPT supported                         yes
  EPT accessed and dirty flags             yes
  Single-context INVEPT                    yes
  All-context INVEPT                       yes
  INVVPID supported                        yes
  Individual-address INVVPID               yes
  Single-context INVVPID                   yes
  All-context INVVPID                      yes
  Single-context-retaining-globals INVVPID yes
VM Functions
  EPTP Switching                           yes
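
Since the vmxcap listing is long, a small filter can pull out only the entries reported as "no". The sketch below is illustrative, not part of vmxcap: the function name vmx_no_entries is made up, and it assumes the usual layout where section headers start in the first column and feature lines are indented and end with their value.

```shell
# Hypothetical helper: read vmxcap output on stdin and print
# "section: feature" for every entry whose value is "no".
# Assumes section headers are unindented and feature lines are indented.
vmx_no_entries() {
    awk '
        /^[^ \t]/ { section = $0; next }    # unindented line starts a section
        $NF == "no" {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop the leading indent
            sub(/[ \t]+no$/, "", line)      # drop the trailing value
            print section ": " line
        }
    '
}
```

It could be used as `vmx_no_entries < vmxcap.txt`, where vmxcap.txt is a placeholder name for a saved copy of the output. For the listing above, the "no" entries are VMCS restricted to 32 bit addresses, Process posted interrupts, APIC register emulation, Virtual interrupt delivery, and EPT-violation #VE. Note that a "no" is not always a missing feature: the 32-bit address restriction is an example where "no" is the desirable value.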