On Tue, Mar 19, 2019 at 12:26:55PM +0800, Xiaoyao Li wrote: > Hi, Sean, > > On Mon, 2019-03-18 at 09:23 -0700, Sean Christopherson wrote: > > On Mon, Mar 18, 2019 at 07:43:23PM +0800, Xiaoyao Li wrote: > > > cpuid faulting is a feature about CPUID instruction. When cpuif faulting > > > > ^^^^^ > > cpuid > > > is enabled, all execution of the CPUID instruction outside system-management > > > mode (SMM) cause a general-protection (#GP) if the CPL > 0. > > > > > > About this feature, detailed information can be found at > > > > https://www.intel.com/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf > > > > > > Current KVM provides software emulation of this feature for guest. > > > However, because cpuid faulting takes higher priority over CPUID vm exit > > > (Intel > > > SDM vol3.25.1.1), there is a risk of leaking cpuid faulting to guest when > > > host > > > enables it. If host enables cpuid faulting by setting the bit 0 of > > > MSR_MISC_FEATURES_ENABLES, it will pass to guest since there is no handling > > > of > > > MSR_MISC_FEATURES_ENABLES yet. As a result, when guest calls CPUID > > > instruction > > > in CPL > 0, it will generate a #GP instead of CPUID vm eixt. > > > > > > This issue will cause guest boot failure when guest uses *modprobe* > > > to load modules. *modprobe* calls CPUID instruction, thus causing #GP in > > > guest. Since there is no handling of cpuid faulting in #GP handler, guest > > > fails boot. > > > > > > To fix this issue, we should switch cpuid faulting bit between host and > > > guest. > > > Since MSR_MISC_FEATURES_ENABLES is intel-specific, this patch implement the > > > switching only in vmx. It clears the cpuid faulting bit and save host's > > > value before switching to guest, and restores the cpuid faulting settings of > > > host before switching to host. > > > > > > Because kvm provides the software emulation of cpuid faulting, we can > > > just clear the cpuid faulting bit in hardware MSR when switching to > > > guest. > > > > > > Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxxxxxxxx> > > > --- > > > Changes in v2: > > > - move the save/restore of cpuid faulting bit to > > > vmx_prepare_swich_to_guest/vmx_prepare_swich_to_host to avoid every > > > vmentry RDMSR, based on Paolo's comment. > > > > > > --- > > > arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++++++++++++++++ > > > arch/x86/kvm/vmx/vmx.h | 2 ++ > > > 2 files changed, 36 insertions(+) > > > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 541d442edd4e..2c59e0209e36 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -1035,6 +1035,23 @@ static void pt_guest_exit(struct vcpu_vmx *vmx) > > > wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); > > > } > > > > > > +static void vmx_save_host_cpuid_fault(struct vcpu_vmx *vmx) > > > > Maybe vmx_load_guest_misc_features_enables()? See below for reasoning. > > Alternatively you can drop the helpers altogether. > > > > > +{ > > > + u64 host_val; > > > + > > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT)) > > > + return; > > > + > > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, host_val); > > > + vmx->host_msr_misc_features_enables = host_val; > > > > There's no need to cache the host value in struct vcpu_vmx, just use the > > kernel's shadow value. > > you're right, thanks for pointing out this. > > > > + > > > + /* clear cpuid fault bit to avoid it leak to guest */ > > > > Personally I find the comment unnecessary and somewhat misleading. > > > > > + if (host_val & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT) { > > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, > > > + host_val & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT); > > > > I think we can also skip WRMSR if CPUID faulting is also enabled in the > > Right, the initial purpose of this patch is to clear hardware cpuid faulting bit > for guest and just let guest use the kvm emulation of cpuid faulting. > Using hardware cpuid faulting instead of kvm emulation is what the next patch > doing. > > It's Paolo's suggestion that splits it into 2 patches. Agreed, two patches is better. > > guest. It probably doesn't make sense to install the guest's value if > > CPUID faulting is enabled in the guest and not host since intercepting > > CPUID likely provides better overall performance than switching the MSR > > on entry and exit. And last but not least, since there are other bits > > Actually, this MSR switching is not happening on every entry and exit, it clears > host cpuid faulting only when vmx->loaded_cpu_state == NULL, and load host cpuid > faulting only when vmx->loaded_cpu_state != NULL. Usually, in the vcpu_run loop, > it switchs for once. Whoops, yeah, every save/restore cycle. > However, there indeed is a tradeoff between switching the MSR and intercepting > CPUID. Using hardware cpuid faulting for guest, it needs to switch related MSR, > which incurs wrmsr overhead. But using emulation, it has to suffer the overhead > of vmexit with intercepting CPUID instruction. > And I don't know which is better. > > At this point, I think it makes sense to use 2 patches. One for fixing potential > leaking issue that just clears cpuid faulting bit for guest, and the other > writes guest cpuid faulting value to hardware bit to optimize performance (maybe > it doesn't). > > > in the MSR that we don't want to expose to the guest, e.g. RING3MWAIT, > > checking for a non-zero host value is probably better than checking for > > individual feature bits. Same reasoning for writing the guest's value > > instead of clearing just the CPUID faulting bit. > > About RING3MWAIT, I just let it go as without this patch. > Since you pointed out *we* don't want to expose other bits in the MSR to the > guest, I will clear the both bits in next version. Clear all bits, i.e. write 0. That way KVM doesn't need to be updated if/when hardware introduces another bit. > > > > So something like: > > > > u64 msrval = this_cpu_read(msr_misc_features_shadow); > > > > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables) > > return; > > > > wrmsrl(MSR_MISC_FEATURES_ENABLES, vcpu->arch.msr_misc_features_enables); > > > > > > or if you drop the helpers: > > > > msrval = this_cpu_read(msr_misc_features_shadow); > > if (msrval && msrval != vcpu->arch.msr_misc_features_enables) > > wrmsrl(MSR_MISC_FEATURES_ENABLES, > > vcpu->arch.msr_misc_features_enables); > > > > > + } > > > +} > > > + > > > void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) > > > { > > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > > @@ -1068,6 +1085,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu > > > *vcpu) > > > vmx->loaded_cpu_state = vmx->loaded_vmcs; > > > host_state = &vmx->loaded_cpu_state->host_state; > > > > > > + vmx_save_host_cpuid_fault(vmx); > > > + > > > /* > > > * Set host fs and gs selectors. Unfortunately, 22.2.3 does not > > > * allow segment selectors with cpl > 0 or ti == 1. > > > @@ -1124,6 +1143,19 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu > > > *vcpu) > > > } > > > } > > > > > > +static void vmx_restore_host_cpuid_fault(struct vcpu_vmx *vmx) > > > > If you keep the helpers, maybe vmx_load_host_misc_features_enables() > > to pair with the new function name for loading guest state? > > > > > +{ > > > + u64 msrval; > > > + > > > + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT)) > > > + return; > > > + > > > + rdmsrl(MSR_MISC_FEATURES_ENABLES, msrval); > > > + msrval |= vmx->host_msr_misc_features_enables & > > > + MSR_MISC_FEATURES_ENABLES_CPUID_FAULT; > > > > Again, there's no need for RDMSR, the host's value can be pulled from > > msr_misc_features_shadow, and the WRMSR can be skipped if the host and > > guest have the same value, i.e. we didn't install the guest's value. > > > > u64 msrval = this_cpu_read(msr_misc_features_shadow); > > > > if (!msrval || msrval == vcpu->arch.msr_misc_features_enables) > > return; > > > > wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval); > > > > > + wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval); > > > +} > > > + > > > static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx) > > > { > > > struct vmcs_host_state *host_state; > > > @@ -1137,6 +1169,8 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx > > > *vmx) > > > ++vmx->vcpu.stat.host_state_reload; > > > vmx->loaded_cpu_state = NULL; > > > > > > + vmx_restore_host_cpuid_fault(vmx); > > > + > > > #ifdef CONFIG_X86_64 > > > rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); > > > #endif > > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > > > index 5df73b36fa49..ba867bbc5676 100644 > > > --- a/arch/x86/kvm/vmx/vmx.h > > > +++ b/arch/x86/kvm/vmx/vmx.h > > > @@ -268,6 +268,8 @@ struct vcpu_vmx { > > > u64 msr_ia32_feature_control_valid_bits; > > > u64 ept_pointer; > > > > > > + u64 host_msr_misc_features_enables; > > > > As mentioned above, this isn't needed. > > Sure, I will remove this and use msr_misc_features_shadow. > > > > + > > > struct pt_desc pt_desc; > > > }; > > > > > > -- > > > 2.19.1 > > > >