On 5/17/24 13:37, Liang Chen wrote:
The attached cleaned-up reproducer shows that the problem is simply that
EFLAGS.VM is set in 64-bit mode. To fix it, it should be enough to do
a nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0); just like
a few lines below.
Yes, that was the situation we were trying to deal with. However, I am
not quite sure if I fully understand the suggestion, "To fix it, it
should be enough to do a nested_vmx_vmexit(vcpu,
EXIT_REASON_TRIPLE_FAULT, 0, 0); just like a few lines below." From
what I see, KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm)
firing in __vmx_handle_exit can be the result of an invalid VMCS12 from
L1 that somehow escaped the checks done in nested_vmx_run when the
VM entry trapped into L0. It is not easy to tell whether it was caused
by userspace register_set ops, as we are discussing, or by an invalid
VMCS12 supplied by L1.
Right, KVM assumes that it can delegate the "Checks on Guest Segment
Registers" to the processor if a field is copied straight from VMCS12 to
VMCS02. In this case the segments are not set up for virtual-8086 mode;
interestingly, the manual seems to say that EFLAGS.VM takes precedence
over the "IA-32e mode guest" control being 1 for the purpose of checking
guest state. AMD's manual instead says that EFLAGS.VM is completely
ignored in 64-bit mode.
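
To make that reading concrete, here is a rough standalone sketch (not KVM
code) of how the segment check would behave if RFLAGS.VM alone selects the
virtual-8086 rules. The struct and both helper names are made up for
illustration, only the virtual-8086 constraints on base, limit and access
rights are modeled, and all other guest-state checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_VM (1u << 17)	/* RFLAGS.VM, bit 17 */

/* Hypothetical, simplified view of one guest segment in the VMCS. */
struct seg {
	uint16_t selector;
	uint64_t base;
	uint32_t limit;
	uint32_t ar;		/* access rights */
};

/*
 * Hypothetical helper: the virtual-8086 constraints on a segment.
 * The base must be the selector shifted left by 4, the limit must be
 * 0xFFFF and the access rights must be 0xF3.
 */
static bool seg_ok_for_v8086(const struct seg *s)
{
	return s->base == ((uint64_t)s->selector << 4) &&
	       s->limit == 0xFFFF &&
	       s->ar == 0xF3;
}

/*
 * Hypothetical helper, assuming RFLAGS.VM selects the virtual-8086
 * rules no matter what "IA-32e mode guest" says. The checks for the
 * non-virtual-8086 case are omitted and treated as passing.
 */
static bool guest_seg_passes(uint64_t rflags, const struct seg *s)
{
	if (rflags & X86_EFLAGS_VM)
		return seg_ok_for_v8086(s);
	return true;
}
```

Under these assumptions a flat 64-bit code segment (base 0, limit
0xFFFFFFFF, access rights 0xA09B) fails as soon as RFLAGS.VM is set, which
would be consistent with the failed vmentry seen with the reproducer.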
I need to look more at the sequence of VMLAUNCH/RESUME, KVM_SET_MSR and
the failed vmentry to understand exactly what the right fix is.
Paolo
Additionally, nested_vmx_vmexit warns when
vmx->nested.nested_run_pending is true, saying that "trying to
cancel vmlaunch/vmresume is a bug".