Re: [qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization

Jan Kiszka <jan.kiszka@xxxxxx> · Tue, 16 Apr 2013 09:03:07 +0200

On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> I changed to the latest version of kvm kernel but the bug also occured.
> 
> On the startup of L1 VM on the host, the host kern.log will output:
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0
> unhandled rdmsr: 0x345
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22
> callbacks suppressed
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0
> unhandled wrmsr: 0x40 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0
> unhandled wrmsr: 0x60 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0
> unhandled wrmsr: 0x41 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0
> unhandled wrmsr: 0x61 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0
> unhandled wrmsr: 0x42 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0
> unhandled wrmsr: 0x62 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0
> unhandled wrmsr: 0x43 data 0
> Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0
> unhandled wrmsr: 0x63 data 0
> Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1
> unhandled wrmsr: 0x40 data 0
> Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1
> unhandled wrmsr: 0x60 data 0
> 
> When L1 VM starts and crashes, its kern.log will output:
> Apr 16 11:28:55 kvm1 kernel: [   33.590101] device tap0 entered promiscuous
> mode
> Apr 16 11:28:55 kvm1 kernel: [   33.590140] br0: port 2(tap0) entered
> forwarding state
> Apr 16 11:28:55 kvm1 kernel: [   33.590146] br0: port 2(tap0) entered
> forwarding state
> Apr 16 11:29:04 kvm1 kernel: [   42.592103] br0: port 2(tap0) entered
> forwarding state
> Apr 16 11:29:19 kvm1 kernel: [   57.752731] kvm [1673]: vcpu0 unhandled
> rdmsr: 0x345
> Apr 16 11:29:19 kvm1 kernel: [   57.797261] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x40 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797315] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x60 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797366] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x41 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797416] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x61 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797466] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x42 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797516] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x62 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797566] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x43 data 0
> Apr 16 11:29:19 kvm1 kernel: [   57.797616] kvm [1673]: vcpu0 unhandled
> wrmsr: 0x63 data 0
> 
> The host will output simultaneously:
> Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS
> MSR_{LOAD,STORE} unsupported

That's an important information. KVM is not yet implementing this
feature, but L1 is using it - doomed to fail. This feature gap of nested
VMX needs to be closed at some point.

> 
> And the callback trace displayed on the console is the same as the previous
> mail.
> 
> Besides, the L1 and L2 guest may sometimes crash and output nothing, while
> sometimes it will output as above.
> 
> 
> So this indicates that the msr controls may fail for core2duo CPU emulator.
> 

Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the
likeliness of this scenario with KVM as guest.

> 
> For Jan,
> I have traced the code of qemu and KVM and found the relevant code of errno
> "KVM: entry failed, hardware error 0x7". The relevant code is in kernel
> arch/x86/kvm/vmx.c, function vmx_handle_exit():
> 
> if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
> vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> vcpu->run->fail_entry.hardware_entry_failure_reason
> = exit_reason;
> return 0;
> }
> 
> if (unlikely(vmx->fail)) {
> vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> vcpu->run->fail_entry.hardware_entry_failure_reason
> = vmcs_read32(VM_INSTRUCTION_ERROR);
> return 0;
> }
> 
> The entry failed hardware error may be caused from these two points, both
> are caused by VMENTRY failed. Because macro VMX_EXIT_REASONS_FAILED_VMENTRY
> is 0x80000000 and the output errno is 0x7, so this error is caused by the
> second branch. I'm not very clear what the result of
> vmcs_read32(VM_INSTRUCTION_ERROR) refers to.

Try to look this up in the Intel manual. It explains what instruction
error 7 means. You will also find it when tracing down the error message
of L0.

Jan

Attachment:
signature.asc

Description: OpenPGP digital signature