Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1. One can check that bit before injecting a #CP with error code, but it should not be necessary if CET is enumerated. Of course, if the KVM may run as a guest of another VMM/hypervisor, it may be that the virtual CPU in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC. - Gil -----Original Message----- From: Yang, Weijiang <weijiang.yang@xxxxxxxxx> Sent: Friday, June 30, 2023 05:05 To: Gao, Chao <chao.gao@xxxxxxxxx> Cc: Christopherson,, Sean <seanjc@xxxxxxxxxx>; pbonzini@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; peterz@xxxxxxxxxxxxx; rppt@xxxxxxxxxx; binbin.wu@xxxxxxxxxxxxxxx; Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx>; john.allen@xxxxxxx; Neiger, Gil <gil.neiger@xxxxxxxxx> Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification On 6/30/2023 6:27 PM, Chao Gao wrote: > On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote: >> On 6/17/2023 2:57 AM, Sean Christopherson wrote: >>> On Fri, Jun 16, 2023, Weijiang Yang wrote: >>>> On 6/16/2023 7:58 AM, Sean Christopherson wrote: >>>>> On Thu, Jun 08, 2023, Weijiang Yang wrote: >>>>>> On 6/6/2023 5:08 PM, Chao Gao wrote: >>>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote: >>>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21). >>>>>>>> The new vector is introduced for Intel's Control-Flow >>>>>>>> Enforcement Technology (CET) relevant violation cases. >>>>>>>> >>>>>>>> Although #CP belongs contributory exception class, but the >>>>>>>> actual effect is conditional on CET being exposed to guest. If >>>>>>>> CET is not available to guest, #CP falls back to >>>>>>>> non-contributory and doesn't have an error code. >>>>>>> This sounds weird. is this the hardware behavior? If yes, could >>>>>>> you point us to where this behavior is documented? >>>>>> It's not SDM documented behavior. >>>>> The #CP behavior needs to be documented. Please pester whoever >>>>> you need to in order to make that happen. >>>> Do you mean documentation for #CP as an generic exception or the >>>> behavior in KVM as this patch shows? >>> As I pointed out two *years* ago, this entry in the SDM >>> >>> — The field's deliver-error-code bit (bit 11) is 1 if each of the following >>> holds: (1) the interruption type is hardware exception; (2) bit 0 >>> (corresponding to CR0.PE) is set in the CR0 field in the guest-state area; >>> (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector >>> indicates one of the following exceptions: #DF (vector 8), #TS (10), >>> #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17). >>> >>> needs to read something like >>> >>> — The field's deliver-error-code bit (bit 11) is 1 if each of the following >>> holds: (1) the interruption type is hardware exception; (2) bit 0 >>> (corresponding to CR0.PE) is set in the CR0 field in the guest-state area; >>> (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector >>> indicates one of the following exceptions: #DF (vector 8), #TS (10), >>> #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP >>> (21)[1] >>> >>> [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates >>> support for the 1-setting of CR4.CET. >> Hi, Sean, >> >> I sent above change request to Gil(added in cc), but he shared >> different opinion on this issue: >> >> >> "It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1. >> >> However, there were earlier parts without CET that enumerated >> IA32_VMX_BASIC[56] as 0. >> >> On those parts, an attempt to inject an exception with vector 21 >> (#CP) with an error code would fail. >> >> (Injection of exception 21 with no error code would be allowed.) >> >> It may make things clearer if we document the statement above (all >> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1). >> >> I will see if we can update future revisions of the SDM to clarify this." >> >> >> Then if this is the case, kvm needs to check IA32_VMX_BASIC[56] >> before inject exception to nested VM. > And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0. Yes, this scratch patch didn't cover cross-check with CET enabling, thanks! > >> And this patch could be removed, instead need another patch like below: >> >> diff --git a/arch/x86/include/asm/msr-index.h >> b/arch/x86/include/asm/msr-index.h >> index ad35355ee43e..6b33aacc8587 100644 >> --- a/arch/x86/include/asm/msr-index.h >> +++ b/arch/x86/include/asm/msr-index.h >> @@ -1076,6 +1076,7 @@ >> #define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU >> #define VMX_BASIC_MEM_TYPE_WB 6LLU >> #define VMX_BASIC_INOUT 0x0040000000000000LLU >> +#define VMX_BASIC_CHECK_ERRCODE 0x0140000000000000LLU >> >> /* Resctrl MSRs: */ >> /* - Intel: */ >> diff --git a/arch/x86/kvm/vmx/capabilities.h >> b/arch/x86/kvm/vmx/capabilities.h index 85cffeae7f10..4b1ed4dc03bc >> 100644 >> --- a/arch/x86/kvm/vmx/capabilities.h >> +++ b/arch/x86/kvm/vmx/capabilities.h >> @@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void) >> return (((u64)vmcs_config.basic_cap << 32) & >> VMX_BASIC_INOUT); >> } >> >> +static inline bool cpu_has_vmx_basic_check_errcode(void) >> +{ >> + return (((u64)vmcs_config.basic_cap << 32) & >> VMX_BASIC_CHECK_ERRCODE); >> +} >> + >> static inline bool cpu_has_virtual_nmis(void) >> { >> return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS >> && diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c >> index 78524daa2cb2..92aa4fc3d233 100644 >> --- a/arch/x86/kvm/vmx/nested.c >> +++ b/arch/x86/kvm/vmx/nested.c >> @@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct >> vcpu_vmx *vmx, >> u64 data) >> { >> const u64 feature_and_reserved = >> /* feature (except bit 48; see below) */ >> - BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | >> + BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) | >> /* reserved */ >> - BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56); >> + BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57); >> u64 vmx_basic = vmcs_config.nested.basic; >> >> if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved)) >> @@ -2873,7 +2873,8 @@ static int >> nested_check_vm_entry_controls(struct >> kvm_vcpu *vcpu, >> should_have_error_code = >> intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode && >> x86_exception_has_error_code(vector); >> - if (CC(has_error_code != should_have_error_code)) >> + if (!cpu_has_vmx_basic_check_errcode() && > We can skip computing should_have_error_code. and we should check if > IA32_VMX_BASIC[56] is set for this vCPU (i.e. in > vmx->nested.msrs.basic) rather than host/kvm capability. Oops, I confused myself, yes, need to reshape the code a bit and use msrs.basic to check the bit status, thanks! > >> + CC(has_error_code != should_have_error_code)) >> return -EINVAL; >> >> /* VM-entry exception error code */ @@ -6986,6 +6987,8 @@ >> static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs) >> >> if (cpu_has_vmx_basic_inout()) >> msrs->basic |= VMX_BASIC_INOUT; >> + if (cpu_has_vmx_basic_check_errcode()) >> + msrs->basic |= VMX_BASIC_CHECK_ERRCODE; >> } >> >> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs) >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index >> d70f2e94b187..95c0eab7805c 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config >> *vmcs_conf, >> rdmsrl(MSR_IA32_VMX_MISC, misc_msr); >> >> vmcs_conf->size = vmx_msr_high & 0x1fff; >> - vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff; >> + vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff; >> >> vmcs_conf->revision_id = vmx_msr_low; >> >>