> On Thu, Aug 24, 2023, Weijiang Yang wrote:
> > On 8/4/2023 1:11 AM, Sean Christopherson wrote:
> > > On Thu, Aug 03, 2023, Weijiang Yang wrote:
> > > > > This is wrong, no? The consistency check is only skipped for
> > > > > PM, the above CR0.PE modification means the target is RM.
> > > > I think this case is executed with !CPU_URG, so RM is "converted"
> > > > to PM because we have below in KVM:
> > > >
> > > >   bool urg = nested_cpu_has2(vmcs12,
> > > >                              SECONDARY_EXEC_UNRESTRICTED_GUEST);
> > > >   bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
> > > >   ...
> > > >   if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
> > > >       !nested_cpu_has_no_hw_errcode(vcpu)) {
> > > >           /* VM-entry interruption-info field: deliver error code */
> > > >           should_have_error_code =
> > > >                   intr_type == INTR_TYPE_HARD_EXCEPTION &&
> > > >                   prot_mode &&
> > > >                   x86_exception_has_error_code(vector);
> > > >           if (CC(has_error_code != should_have_error_code))
> > > >                   return -EINVAL;
> > > >   }
> > > >
> > > > so on platform with basic.errcode == 1, this case passes.
> > > Huh. I get the logic, but IMO based on the SDM, that's a ucode bug
> > > that got propagated into KVM (or an SDM bug, which is my bet for how
> > > this gets treated).
> > >
> > > I verified HSW at least does indeed generate VM-Fail and not
> > > VM-Exit(INVALID_STATE), so it doesn't appear that KVM is making
> > > stuff (for once). Either that or I'm misreading the SDM (definite
> > > possibility), but the only relevant condition I see is:
> > >
> > >   bit 0 (corresponding to CR0.PE) is set in the CR0 field in the
> > >   guest-state area
> > >
> > > I don't see anything in the SDM that states the CR0.PE is assumed to
> > > be '1' for consistency checks when unrestricted guest is disabled.
> > >
> > > Can you bug a VMX architect again to get clarification, e.g. to get an
> > > SDM update?
> > > Or just point out where I missed something in the SDM, again...
> >
> > Sorry for the delayed response!
> > Also added Gil in cc.
>
> Hey Gil! Thanks for humoring me again.
>
> > I got reply from Gil as below:
> >
> > "I am not sure whether you (or Sean) are referring to guest state or
> > host state.
>
> The question is whether trying to do VMLAUNCH/VMRESUME with this scenario
>
>   1. unrestricted guest disabled
>   2. GUEST_CR0.PE = 0
>   3. #GP injection _without_ an error code
>
> should VM-Fail due injecting a #GP without an error code, or
> VM-Exit(INVALID_STATE) due to CR0.PE=0 without unrestricted guest support.
>
> Hardware (I personally tested on Haswell) signals VM-Fail, which doesn't
> match what's in the SDM:
>
>   The field's deliver-error-code bit (bit 11) is 1 if each of the
>   following holds:
>
>   (1) the interruption type is hardware exception;
>   (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the
>       guest-state area;
>   (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and
>   (4) the vector indicates one of the following exceptions: #DF
>       (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14),
>       or #AC (17).
>
> Specifically #2 doesn't say anything about the check treating
> GUEST_CR0.PE as '1' if unrestricted guest is disabled.

Thanks for clarifying that the case in question includes injection of an
event with error code.

This is a quirky situation, and it happens (coincidentally) that I am
working on similar questions right now internally.

In general, VM entry fails "early" (VM-Fail) if there is a problem with
host state or controls and fails "late" (VM-Exit(INVALID_STATE)) if it
doesn't fail early but there is a problem with guest state. This
distinction exists to allow the CPU to load and check guest state in one
pass: once VM entry starts to load state, any failure will reload
registers with "VM-Exit(INVALID_STATE)".

The original checks on the injection controls (including the error code
bit) were all done early as they were just checks on controls.
At that time, there was no conditioning on CR0.PE because "unrestricted
guest" did not exist yet.

When "unrestricted guest" was added, those checks became conditional on
the guest value of CR0.PE. We considered the possibility of moving those
checks to be "late" (because they depend on guest state), but that would
have required more changes to the VM-entry implementation than seemed
justified. Instead, the checks were left "early". (The guest CR0.PE was
consulted, but it was not actually loaded into the CR0 register, so
VM-Fail was OK.)

That has left this architectural anomaly that we have one early check
that depends on guest state - and it is guest state that may later cause
a late check to fail.

As I said, I am working on this issue inside Intel so that everyone has
a consistent view of how this all should work. I will follow up (or work
through Weijiang) as things develop.

- Gil
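
For anyone following along, the two readings discussed in this thread can
be compared with a small standalone C model. This is only a sketch under
stated assumptions, not KVM code and not a description of real hardware:
the struct, enum, and function names are all hypothetical, the early
check is patterned on the KVM snippet quoted earlier in the thread, and
the only late check modeled is "CR0.PE=0 requires unrestricted guest".

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes named after the terms used in the thread (hypothetical enum). */
enum entry_result { ENTRY_OK, ENTRY_VM_FAIL, ENTRY_VM_EXIT_INVALID_STATE };

/* Hypothetical flattened view of the fields the thread talks about. */
struct entry_inputs {
	bool unrestricted_guest;  /* "unrestricted guest" control enabled */
	bool guest_cr0_pe;        /* GUEST_CR0.PE */
	bool basic_bit56;         /* IA32_VMX_BASIC[56] */
	bool hw_exception;        /* interruption type is hardware exception */
	unsigned vector;          /* injected vector */
	bool deliver_error_code;  /* interruption-info bit 11 */
};

/* Vectors listed in the SDM excerpt: #DF, #TS, #NP, #SS, #GP, #PF, #AC. */
bool vector_has_error_code(unsigned vector)
{
	switch (vector) {
	case 8: case 10: case 11: case 12: case 13: case 14: case 17:
		return true;
	default:
		return false;
	}
}

/*
 * Early check on the deliver-error-code bit, parameterised on the PE
 * value it consults. Same shape as the quoted KVM snippet.
 */
bool errcode_check_passes(const struct entry_inputs *in, bool pe)
{
	bool should_have;

	/* Check is skipped for a hardware exception in PM when bit 56 is set. */
	if (pe && in->hw_exception && in->basic_bit56)
		return true;

	should_have = in->hw_exception && pe &&
		      vector_has_error_code(in->vector);
	return in->deliver_error_code == should_have;
}

/* Late guest-state check: CR0.PE=0 is only valid with unrestricted guest. */
bool guest_cr0_pe_valid(const struct entry_inputs *in)
{
	return in->guest_cr0_pe || in->unrestricted_guest;
}

/*
 * pe_forced_without_urg == false: early check uses the raw GUEST_CR0.PE
 *                                 (the SDM text as quoted).
 * pe_forced_without_urg == true:  PE is treated as 1 when unrestricted
 *                                 guest is disabled (the KVM snippet).
 */
enum entry_result vm_entry(const struct entry_inputs *in,
			   bool pe_forced_without_urg)
{
	bool pe = in->guest_cr0_pe;

	if (pe_forced_without_urg && !in->unrestricted_guest)
		pe = true;

	if (!errcode_check_passes(in, pe))
		return ENTRY_VM_FAIL;               /* early failure */
	if (!guest_cr0_pe_valid(in))
		return ENTRY_VM_EXIT_INVALID_STATE; /* late failure */
	return ENTRY_OK;
}
```

Feeding in the scenario from the thread (unrestricted guest disabled,
GUEST_CR0.PE = 0, #GP without an error code, IA32_VMX_BASIC[56] = 0), the
raw-PE reading returns ENTRY_VM_EXIT_INVALID_STATE in this model, while
the forced-PE reading returns ENTRY_VM_FAIL, which is the behavior
reported on Haswell.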