Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7


On 7/26/23 12:17, Sean Christopherson wrote:
>> If so, what fields in the kvm_run struct should I check that could cause such
>> an issue?

> Heh, all of them.  I'm only somewhat joking.  Root causing "invalid control field"
> errors on bare metal is painfully difficult, bordering on impossible if you don't
> have something to give you a hint as to what might be going wrong.

I suppose that's what I was expecting, but I was hoping it could be narrowed down a bit. Could the values of the CPU control registers or other special registers set with KVM_SET_SREGS also cause this error (with hardware_entry_failure_reason = 7)? I'd expect this not to be possible because I don't think the CPU registers are part of the VMCS, but I'm not very familiar with VMX.

I do know that the emulator I'm copying state from likely doesn't consider all bits in the control fields, so it's possible that they're in an invalid state. When I ran the model before with the value for cr0 copied out of the emulator, I also got KVM_EXIT_FAIL_ENTRY, but with a different hardware_entry_failure_reason: 0x80000021. I fixed this by changing the value of cr0 to be (hopefully) valid.
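
For reference, the two hardware_entry_failure_reason values above can be told apart mechanically. The following is a minimal sketch, assuming an Intel (VMX) host where KVM reports either a raw VM-instruction error number or a full exit reason with bit 31 set; the names decode_fail_entry, fail_kind, and fail_entry_info are hypothetical and are not part of the KVM API. Only a handful of codes from the Intel SDM are spelled out.

```c
#include <stdint.h>

/* Bit 31 of a VMX exit reason flags a VM-entry failure (Intel SDM). */
#define VMX_EXIT_REASON_FAILED_VMENTRY (UINT32_C(1) << 31)

/* Hypothetical helper types, for illustration only. */
enum fail_kind {
    FAIL_VM_INSTRUCTION_ERROR, /* VMLAUNCH/VMRESUME itself failed */
    FAIL_ENTRY_EXIT_REASON     /* entry began, then failed a later check */
};

struct fail_entry_info {
    enum fail_kind kind;
    uint32_t code;        /* error number or basic exit reason */
    const char *meaning;  /* short description, "other" if not listed */
};

static struct fail_entry_info decode_fail_entry(uint64_t reason)
{
    struct fail_entry_info info;
    uint32_t r = (uint32_t)reason;

    if (r & VMX_EXIT_REASON_FAILED_VMENTRY) {
        /* Low 16 bits hold the basic exit reason. */
        info.kind = FAIL_ENTRY_EXIT_REASON;
        info.code = r & 0xffff;
        switch (info.code) {
        case 33: info.meaning = "invalid guest state"; break;
        case 34: info.meaning = "MSR loading failed"; break;
        case 41: info.meaning = "machine-check event"; break;
        default: info.meaning = "other"; break;
        }
    } else {
        /* Without bit 31, treat the value as a VM-instruction error. */
        info.kind = FAIL_VM_INSTRUCTION_ERROR;
        info.code = r;
        switch (info.code) {
        case 7: info.meaning = "VM entry with invalid control field(s)"; break;
        case 8: info.meaning = "VM entry with invalid host-state field(s)"; break;
        default: info.meaning = "other"; break;
        }
    }
    return info;
}
```

Under these assumptions, 7 decodes as a VM-instruction error (invalid control fields), while 0x80000021 decodes as a failed VM entry with basic exit reason 33 (invalid guest state), which is consistent with a bad cr0 value producing the second code rather than the first.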

> If you can, try running a nested setup, i.e. run a normal Linux guest as your L1
> VM (L0 is bare metal), and then run your problematic x86 emulator VM within that
> L1 guest (that's your L2).  Then, in L0 (your bare metal host), enable the
> kvm_nested_vmenter_failed tracepoint.

> The kvm_nested_vmenter_failed tracepoint logs all VM-Enter failures that _KVM_
> detects when L1 attempts a nested VM-Enter from L1 to L2.  If you're at all lucky,
> KVM in L0 (acting as the CPU from L1's perspective) will detect the invalid state
> and explicitly log which consistency check failed.

I did this and had an interesting result. Instead of exiting with KVM_EXIT_FAIL_ENTRY, it exited with KVM_EXIT_UNKNOWN and hardware_exit_reason = 0. I also didn't get anything logged from the kvm_nested_vmenter_failed tracepoint. When I checked the value of rip after KVM_RUN, it was the same as the starting value, so it probably failed without executing any instructions.

I then tried setting the kvm_nested_vmexit tracepoint to see if I could get any more information about the vmexit. When the vmexit occurred, I got a line in the log that looked like this:

CPU 3/KVM-9310 [013] .... 6076.453278: kvm_nested_vmexit: vcpu 3 reason EPT_VIOLATION rip 0x103c00 info1 0x0000000000000781 info2 0x000000008000030d intr_info 0x00000000 error_code 0x00000000

It appears this occurred due to an EPT_VIOLATION. I have some questions:
I believe an EPT_VIOLATION is caused by trying to access physical memory that is not mapped. Is that correct? Also, could this be the same error that causes the KVM_EXIT_FAIL_ENTRY when running the VM as L1, or must that be a separate issue?
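
As an aid to reading the trace line above, here is a minimal sketch that decodes an EPT-violation exit qualification, assuming info1 in the kvm_nested_vmexit output carries the VMX exit qualification. The bit layout follows the Intel SDM; the struct and function names are hypothetical, and only bits 0-5, 7, and 8 are decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded bits. */
struct ept_violation {
    bool read, write, fetch;  /* bits 0-2: kind of access attempted */
    bool ept_r, ept_w, ept_x; /* bits 3-5: permissions the EPT entry granted */
    bool gla_valid;           /* bit 7: guest linear address field is valid */
    bool final_translation;   /* bit 8: access was to the translated address,
                                 not to a guest paging-structure entry;
                                 meaningful only when gla_valid is set */
};

static struct ept_violation decode_ept_qual(uint64_t q)
{
    struct ept_violation v;

    v.read  = (q >> 0) & 1;
    v.write = (q >> 1) & 1;
    v.fetch = (q >> 2) & 1;
    v.ept_r = (q >> 3) & 1;
    v.ept_w = (q >> 4) & 1;
    v.ept_x = (q >> 5) & 1;
    v.gla_valid         = (q >> 7) & 1;
    v.final_translation = (q >> 8) & 1;
    return v;
}

/* No read/write/execute permission at all: the guest-physical
 * address has no usable EPT mapping. */
static bool ept_not_present(const struct ept_violation *v)
{
    return !v->ept_r && !v->ept_w && !v->ept_x;
}
```

Applied to 0x781, this reports a data read with no EPT permissions present, a valid guest linear address, and an access to the final translated address rather than to a page-table entry. Under the stated assumption, that fits a read of guest-physical memory that has no mapping.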

I know that the paging code of the emulator the state is from is a little suspect (in fact, one of my reasons to get this VM working in KVM is to help debug the emulator), and it is possible that the page tables of the VM are not set up properly and are mapping linear addresses to unexpected physical addresses, causing an EPT_VIOLATION. I'll have to look into that further.
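
When checking guest page tables by hand, it can help to know which table entries a given linear address selects. The sketch below assumes 4-level paging with 4 KiB pages (x86-64 long mode), which may not match the mode the guest is actually in, and it assumes the address is already linear (for rip, that means a CS base of 0). The names la_indices and split_linear_address are hypothetical.

```c
#include <stdint.h>

/* Table indices selected by a linear address under 4-level paging. */
struct la_indices {
    unsigned pml4;   /* bits 47:39 */
    unsigned pdpt;   /* bits 38:30 */
    unsigned pd;     /* bits 29:21 */
    unsigned pt;     /* bits 20:12 */
    unsigned offset; /* bits 11:0, byte offset within the 4 KiB page */
};

static struct la_indices split_linear_address(uint64_t la)
{
    struct la_indices ix;

    ix.pml4   = (unsigned)((la >> 39) & 0x1ff);
    ix.pdpt   = (unsigned)((la >> 30) & 0x1ff);
    ix.pd     = (unsigned)((la >> 21) & 0x1ff);
    ix.pt     = (unsigned)((la >> 12) & 0x1ff);
    ix.offset = (unsigned)(la & 0xfff);
    return ix;
}
```

For the rip value in the trace, 0x103c00, this gives index 0 at the PML4, PDPT, and PD levels, index 0x103 in the page table, and offset 0xc00. Following those entries from cr3 in a memory dump shows which guest-physical address the tables actually produce.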

Thanks for the help,
Yahya Sohail


