On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> I changed to the latest version of kvm kernel but the bug also occurred.
>
> On the startup of L1 VM on the host, the host kern.log will output:
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22 callbacks suppressed
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0 unhandled wrmsr: 0x60 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0 unhandled wrmsr: 0x41 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0 unhandled wrmsr: 0x61 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0 unhandled wrmsr: 0x42 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0 unhandled wrmsr: 0x62 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0 unhandled wrmsr: 0x43 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0 unhandled wrmsr: 0x63 data 0
> Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1 unhandled wrmsr: 0x40 data 0
> Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1 unhandled wrmsr: 0x60 data 0
>
> When L1 VM starts and crashes, its kern.log will output:
> Apr 16 11:28:55 kvm1 kernel: [ 33.590101] device tap0 entered promiscuous mode
> Apr 16 11:28:55 kvm1 kernel: [ 33.590140] br0: port 2(tap0) entered forwarding state
> Apr 16 11:28:55 kvm1 kernel: [ 33.590146] br0: port 2(tap0) entered forwarding state
> Apr 16 11:29:04 kvm1 kernel: [ 42.592103] br0: port 2(tap0) entered forwarding state
> Apr 16 11:29:19 kvm1 kernel: [ 57.752731] kvm [1673]: vcpu0 unhandled rdmsr: 0x345
> Apr 16 11:29:19 kvm1 kernel: [ 57.797261] kvm [1673]: vcpu0 unhandled wrmsr: 0x40 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797315] kvm [1673]: vcpu0 unhandled wrmsr: 0x60 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797366] kvm [1673]: vcpu0 unhandled wrmsr: 0x41 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797416] kvm [1673]: vcpu0 unhandled wrmsr: 0x61 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797466] kvm [1673]: vcpu0 unhandled wrmsr: 0x42 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797516] kvm [1673]: vcpu0 unhandled wrmsr: 0x62 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797566] kvm [1673]: vcpu0 unhandled wrmsr: 0x43 data 0
> Apr 16 11:29:19 kvm1 kernel: [ 57.797616] kvm [1673]: vcpu0 unhandled wrmsr: 0x63 data 0
>
> The host will output simultaneously:
> Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

That's important information. KVM does not implement this feature yet,
but L1 is using it, so the entry is doomed to fail. This feature gap in
nested VMX needs to be closed at some point.

>
> And the callback trace displayed on the console is the same as the previous
> mail.
>
> Besides, the L1 and L2 guest may sometimes crash and output nothing, while
> sometimes it will output as above.
>
>
> So this indicates that the msr controls may fail for core2duo CPU emulator.
>

Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the
likelihood of this scenario with KVM as guest.

>
> For Jan,
> I have traced the code of qemu and KVM and found the relevant code of errno
> "KVM: entry failed, hardware error 0x7". The relevant code is in kernel
> arch/x86/kvm/vmx.c, function vmx_handle_exit():
>
>     if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
>         vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
>         vcpu->run->fail_entry.hardware_entry_failure_reason
>             = exit_reason;
>         return 0;
>     }
>
>     if (unlikely(vmx->fail)) {
>         vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
>         vcpu->run->fail_entry.hardware_entry_failure_reason
>             = vmcs_read32(VM_INSTRUCTION_ERROR);
>         return 0;
>     }
>
> The entry failed hardware error may be caused from these two points, both
> are caused by VMENTRY failed. Because macro VMX_EXIT_REASONS_FAILED_VMENTRY
> is 0x80000000 and the output errno is 0x7, so this error is caused by the
> second branch. I'm not very clear what the result of
> vmcs_read32(VM_INSTRUCTION_ERROR) refers to.

Try looking this up in the Intel manual. It explains what instruction
error 7 means. You will also find it when tracing the error message of
L0 back to its source.

Jan
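
The reasoning in the quoted analysis (bit 31 set means the first branch, otherwise the value is a VM-instruction error number) can be sketched as a small decoder. This is only an illustration, not KVM code: the function and enum names are made up here, and the error strings are a subset of the VM-instruction error table as I recall it from the Intel SDM, so check them against the manual.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same value as the macro quoted from arch/x86/kvm/vmx.c. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u

/* Hypothetical names, used only for this sketch. */
enum entry_failure_kind {
    FAILED_VMENTRY_EXIT,   /* first branch: value is the raw exit reason */
    VM_INSTRUCTION_ERROR   /* second branch: value is a VM-instruction error */
};

/* Subset of the VM-instruction error numbers (verify against the SDM). */
static const char *vm_instruction_error_name(uint32_t err)
{
    switch (err) {
    case 1: return "VMCALL executed in VMX root operation";
    case 2: return "VMCLEAR with invalid physical address";
    case 3: return "VMCLEAR with VMXON pointer";
    case 4: return "VMLAUNCH with non-clear VMCS";
    case 5: return "VMRESUME with non-launched VMCS";
    case 6: return "VMRESUME after VMXOFF";
    case 7: return "VM entry with invalid control field(s)";
    case 8: return "VM entry with invalid host-state field(s)";
    default: return "not in this table, see the Intel SDM";
    }
}

/* Decide which of the two quoted branches produced the reported value. */
static enum entry_failure_kind classify_entry_failure(uint32_t reason)
{
    return (reason & VMX_EXIT_REASONS_FAILED_VMENTRY)
        ? FAILED_VMENTRY_EXIT
        : VM_INSTRUCTION_ERROR;
}

/* Format a human-readable description into buf; returns snprintf's result. */
static int describe_entry_failure(uint32_t reason, char *buf, size_t len)
{
    if (classify_entry_failure(reason) == FAILED_VMENTRY_EXIT)
        return snprintf(buf, len, "failed VM entry, basic exit reason %u",
                        (unsigned)(reason & 0xffffu));
    return snprintf(buf, len, "VM-instruction error %u: %s",
                    (unsigned)reason, vm_instruction_error_name(reason));
}
```

Calling describe_entry_failure(0x7, buf, sizeof buf) yields a description for the value reported in this thread. Error 7 concerns control fields, which is where the VM-entry/VM-exit MSR load and store settings live, so it would be consistent with the nested_vmx_run message on L0.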
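
As a side note on the "unhandled rdmsr/wrmsr" lines quoted from both kern.log files: a rough sketch for pulling the MSR index out of such a line and naming it is given below. The helper names are hypothetical, and the MSR names are my reading of the Intel MSR tables (0x345 as IA32_PERF_CAPABILITIES, 0x40-0x43 and 0x60-0x63 as the Core 2 last-branch-record stack), so treat them as assumptions to verify rather than as a statement about what KVM does with them.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed names for the MSR indices seen in the quoted logs. */
static const char *msr_name(uint32_t index)
{
    if (index == 0x345)
        return "IA32_PERF_CAPABILITIES";
    if (index >= 0x40 && index <= 0x43)
        return "MSR_LASTBRANCH_n_FROM_IP";
    if (index >= 0x60 && index <= 0x63)
        return "MSR_LASTBRANCH_n_TO_IP";
    return "unknown";
}

/*
 * Parse a line such as "kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0".
 * Returns 1 and fills *is_write and *index on a match, 0 otherwise.
 */
static int parse_unhandled_msr(const char *line, int *is_write, uint32_t *index)
{
    const char *p = strstr(line, "unhandled ");
    char op[6];
    unsigned int idx;

    if (p == NULL)
        return 0;
    /* %5[a-z] reads "rdmsr" or "wrmsr"; %x accepts the 0x prefix. */
    if (sscanf(p, "unhandled %5[a-z]: %x", op, &idx) != 2)
        return 0;
    if (strcmp(op, "wrmsr") == 0)
        *is_write = 1;
    else if (strcmp(op, "rdmsr") == 0)
        *is_write = 0;
    else
        return 0;
    *index = idx;
    return 1;
}
```

Feeding each log line through parse_unhandled_msr() and then msr_name() makes it easier to see that these messages refer to performance-monitoring MSRs, separate from the nested_vmx_run message that explains the failed entry.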