Am 24/10/2022 um 18:52 schrieb Sean Christopherson: > "nVMX" instead of "vmx/nested" > > On Mon, Oct 24, 2022, Emanuele Giuseppe Esposito wrote: >> Currently vmx enables SECONDARY_EXEC_ENCLS_EXITING even when sgx >> is not set in the host MSR. >> This was probably introduced when sgx was not yet fully supported, and >> we wanted to trap guests trying to use the feature. > > Nah, it's just a boneheaded bug. > >> When booting a guest, KVM checks that the cpuid bit is actually set >> in vmx.c, and if not, it does not enable the feature. > > The CPUID thing is a red herring. That's an _additional_ restriction, KVM honors > MSR_IA32_VMX_PROCBASED_CTLS2 when configuring vmcs01. See adjust_vmx_controls() > for secondary controls in setup_vmcs_config(). > >> However, in nesting this control bit is blindly copied, and will be > > It's not "copied", KVM sets the bit in the nVMX MSR irrespective of host support, > which is the problem. > >> propagated to VMCS12 and VMCS02. Therefore, when L1 tries to boot >> the guest, the host will try to execute VMLOAD with VMCS02 containing >> a feature that the hardware does not support, making it fail with >> hardware error 0x7. >> >> According with section A.3.3 of Intel System Programming Guide, >> we should *always* check the value in the actual > > s/we/software > >> MSR_IA32_VMX_PROCBASED_CTLS2 before enabling this bit. >> >> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128 >> > > Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC") > Cc: stable@xxxxxxxxxxxxxxx > >> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@xxxxxxxxxx> >> --- >> arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++-- >> 1 file changed, 20 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c >> index 8f67a9c4a287..f651084010cc 100644 >> --- a/arch/x86/kvm/vmx/nested.c >> +++ b/arch/x86/kvm/vmx/nested.c >> @@ -6678,6 +6678,25 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void) >> return (u64)max_idx << VMCS_FIELD_INDEX_SHIFT; >> } >> >> +/* >> + * According with section A.3.3 of > > Avoid referencing sections/tables by "number" in comments (and changelogs), as the > comment is all but guaranteed to become stale because future versions of the SDM > will shift things around. > > The slightly more robust way to reference a specific SDM/APM section is to use the > title, e.g. According to section "Secondary Processor-Based VM-Execution Controls" > in Intel's SDM ...", as hardware vendors are less likely to arbitrarily rename > sections and tables. It's a bit more work for readers, but any decent PDF viewer > can search these days. > >> Intel System Programming Guide > > KVM typically uses "Intel's SDM" (and "AMD's APM"). Like "VMX" or "SVM", it's ok > to use the SDM acronym without introducing since "SDM" is > >> + * we *can* set the guest MSR control X (in our case > > Avoid pronouns in comments. "we" and "us" are ambiguous, e.g. "we" can mean KVM, > the developer, the user, etc... > >> + * SECONDARY_EXEC_ENCLS_EXITING) *iff* bit 32+X of >> + * MSR_IA32_VMX_PROCBASED_CTLS2 is set to 1. >> + * Otherwise it must remain zero. > > As a general rule, if you find yourself writing a comment and a helper for > something that KVM absolutely needs to get right (honoring VMX MSRs), then odds > are very good that there's a simpler/easier fix, i.e. that you're effectively > re-inventing part of the weel. > >> + */ >> +static void nested_vmx_setup_encls_exiting(struct nested_vmx_msrs *msrs) >> +{ >> + u32 vmx_msr_procb_low, vmx_msr_procb_high; >> + >> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, vmx_msr_procb_low, vmx_msr_procb_high); >> + >> + WARN_ON(vmx_msr_procb_low & SECONDARY_EXEC_ENCLS_EXITING); >> + >> + if (enable_sgx && (vmx_msr_procb_high & SECONDARY_EXEC_ENCLS_EXITING)) >> + msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING; >> +} >> + >> /* >> * nested_vmx_setup_ctls_msrs() sets up variables containing the values to be >> * returned for the various VMX controls MSRs when nested VMX is enabled. >> @@ -6874,8 +6893,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps) >> msrs->secondary_ctls_high |= >> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; >> >> - if (enable_sgx) >> - msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING; > > The issue here is that I, who wrote this code long, long ago, copied the pattern > used for enable_unrestricted_guest, flexpriority_enabled, etc... without fully > appreciating the logic. Unlike those module params, enable_sgx doesn't track > hardware support, i.e. enable_sgx isn't cleared when SGX can't be enabled due to > lack of hardware support. As a result, KVM effectively enumerates to L1 that the > control is always available, i.e. that KVM emulates ENCLS-exiting for L1, but KVM > obviously doesn't actually emulating the behavior. > > Not updating enable_sgx is responsible for a second bug: vmx_set_cpu_caps() doesn't > clear the SGX bits when hardware support is unavailable. This is a much less > problematic bug as as it only pops up if SGX is soft-disabled (the case being > handled by cpu_has_sgx()) or if SGX is supported for bare metal but not in the > VMCS (will never happen when running on bare metal, but can theoertically happen > when running in a VM). > > Last but not least, KVM should ideally have module params reflect KVM's actual > configuration. > > Killing all birds with one stone, simply clear enable_sgx when ENCLS-exiting isn't > supported. The #ifdef is a little gross, but I think it's marginally less ugly > than having vmx.c define a dummy boolean. > > Compile tested only... > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 9dba04b6b019..65f092e4a81b 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -8263,6 +8263,11 @@ static __init int hardware_setup(void) > if (!cpu_has_virtual_nmis()) > enable_vnmi = 0; > > +#ifdef CONFIG_X86_SGX_KVM > + if (!cpu_has_vmx_encls_vmexit()) > + enable_sgx = false; > +#endif > + > /* > * set_apic_access_page_addr() is used to reload apic access > * page upon invalidation. No need to do anything if not > Thanks for all the suggestions and explanations, I am going to apply your changes and send v2! Emanuele