On Mon, 2018-01-29 at 01:58 +0100, KarimAllah Ahmed wrote:
> Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
> guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
> be using a retpoline+IBPB based approach.
> 
> To avoid the overhead of atomically saving and restoring the
> MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only
> add_atomic_switch_msr when a non-zero is written to it.
> 
> Cc: Asit Mallick <asit.k.mallick@xxxxxxxxx>
> Cc: Arjan Van De Ven <arjan.van.de.ven@xxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
> Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> 
> ---
> v2:
> - remove 'host_spec_ctrl' in favor of only a comment (dwmw@).
> - special case writing '0' in SPEC_CTRL to avoid confusing live-migration
>   when the instance never used the MSR (dwmw@).

Possibly wants a comment in the code explaining this in slightly more
detail. The point being that if we migrate a guest which has never used
the MSR, we don't want the act of setting it to zero on resume to flip
it into the auto-saved mode.

> - depend on X86_FEATURE_IBRS instead of X86_FEATURE_SPEC_CTRL (dwmw@).
> - add MSR_IA32_SPEC_CTRL to the list of MSRs to save (dropped it by accident).
> ---
>  arch/x86/kvm/cpuid.c |  4 +++-
>  arch/x86/kvm/vmx.c   | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c   |  1 +
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0099e10..32c0c14 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
>  /* These are scattered features in cpufeatures.h. */
>  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
>  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
> +#define KVM_CPUID_BIT_IBRS 26
>  #define KF(x) bit(KVM_CPUID_BIT_##x)
>  
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -392,7 +393,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  
>  	/* cpuid 7.0.edx*/
>  	const u32 kvm_cpuid_7_0_edx_x86_features =
> -		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
> +		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS) | \
> +		(boot_cpu_has(X86_FEATURE_IBRS) ? KF(IBRS) : 0);
>  
>  	/* all calls to cpuid_count() should be made on the same cpu */
>  	get_cpu();

I think we need to expose more feature bits than that. See
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/pti&id=2961298efe1ea1b6fc0d7ee8b76018fa6c0bcef2

There are three AMD bits for IBRS, IBPB & STIBP which are the
user-visible ones in /proc/cpuinfo, and the ones we use within the
kernel to indicate the hardware availability (there are separate
feature bits for when we're *using* IBPB etc., but that's only because
feature bits are the only thing that ALTERNATIVEs can work from).

In addition to those bits, Intel has its own. The Intel SPEC_CTRL CPUID
bit (which you're setting above) indicates *both* IBRS and IBPB
capability. The kernel sets the corresponding AMD bits when it sees
SPEC_CTRL.
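Roughly, the host side of that mapping looks like this (quoting from
memory of the commit linked above, so the helper name and the exact
feature-flag names may not match; the link is authoritative):

	static void init_speculation_control(struct cpuinfo_x86 *c)
	{
		/*
		 * The Intel SPEC_CTRL CPUID bit advertises both IBRS
		 * (via MSR_IA32_SPEC_CTRL) and IBPB (via MSR_IA32_PRED_CMD),
		 * so set the AMD-style feature bits for both whenever it
		 * is present.
		 */
		if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
			set_cpu_cap(c, X86_FEATURE_IBRS);
			set_cpu_cap(c, X86_FEATURE_IBPB);
		}
	}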
Likewise Intel has a different bit for STIBP.

You could construct a set of CPUID bits for the guest based on what the
host has. So all three of the AMD IBRS/IBPB/STIBP bits in 80000008/EBX
should just be passed through, and you could set the Intel SPEC_CTRL
bit (7/EDX bit 26 that you're looking at above) only when you have
X86_FEATURE_IBPB && X86_FEATURE_IBRS. And the Intel STIBP when you have
X86_FEATURE_STIBP.

The Intel ARCH_CAPABILITIES CPUID bit is separate. Pass that through if
you have it, and expose the corresponding MSR read-only.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index aa8638a..dac564d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -920,6 +920,8 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
>  static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
>  					    u16 error_code);
>  static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> +							   u32 msr, int type);
>  
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);

Perhaps move that whole function further up instead of adding a forward
declaration?

>  static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  {
>  	u64 guest_efer = vmx->vcpu.arch.efer;

> @@ -3203,7 +3227,9 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>   */
>  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  {
> +	u64 spec_ctrl = 0;

Could you ditch this additional variable and...

>  	struct shared_msr_entry *msr;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
>  	switch (msr_info->index) {
>  #ifdef CONFIG_X86_64

> @@ -3223,6 +3249,20 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC:
>  		msr_info->data = guest_read_tsc(vcpu);
>  		break;
> +	case MSR_IA32_SPEC_CTRL:
> +		if (!msr_info->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +			return 1;
> +
> +		/*
> +		 * If the MSR is not in the atomic list yet, then the guest
> +		 * never wrote a non-zero value to it yet i.e. the MSR value is
> +		 * '0'.
> +		 */

... just do something like this here?

	if (read_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, &msr_info->data, NULL))
		msr_info->data = 0;
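To be clear, read_atomic_switch_msr() doesn't exist yet; it would be a
small lookup counterpart to add_atomic_switch_msr() that walks the
VM-entry/exit MSR autoload list. An untested sketch, assuming the
current vcpu_vmx::msr_autoload layout (field names may need adjusting):

	/*
	 * Hypothetical counterpart to add_atomic_switch_msr(): find @msr in
	 * the autoload list and report its guest (and optionally host)
	 * value.  Returns non-zero when the MSR is not on the list, which
	 * for SPEC_CTRL means the guest never wrote a non-zero value to it.
	 */
	static int read_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
					  u64 *guest_val, u64 *host_val)
	{
		struct msr_autoload *m = &vmx->msr_autoload;
		unsigned i;

		for (i = 0; i < m->nr; i++) {
			if (m->guest[i].index != msr)
				continue;
			if (guest_val)
				*guest_val = m->guest[i].value;
			if (host_val)
				*host_val = m->host[i].value;
			return 0;
		}
		return 1;
	}

That keeps vmx_get_msr() free of extra per-vCPU state: the set side
still calls add_atomic_switch_msr() only on the first non-zero write,
and a guest (or migration target) that never touched the MSR keeps
reading back zero.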