[PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
attack surface is already covered by switch_mm_irqs_off() ->
cond_mitigation().

The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
wrong in its guest-to-guest design intention. There are three scenarios
at play here:

1. If the vCPUs belong to the same VM, they are in the same security
domain and do not need an IPBP.
2. If the vCPUs belong to different VMs, and each VM is in its own
mm_struct, switch_mm_irqs_off() will handle IBPB as an mm switch is
guaranteed to occur prior to loading a vCPU belonging to a different VMs.
3. If the vCPUs belong to different VMs, but multiple VMs share an
mm_struct, then the security benefits of an IBPB when switching vCPUs
are dubious, at best.

Issuing IBPB from KVM vCPU load would only cover #3, but there are no
known real world, tangible use cases for running multiple VMs belonging
to different security domains in a shared address space. Running multiple
VMs in a single address space is plausible and sane, _if_ they are all
in the same security domain or security is not a concern. If multiple VMs
are sharing an mm_structs, prediction attacks are the least of their
security worries.

Update documentation to reflect what we're protecting against and what
we're not and also remove "buddy" from vmx_vcpu_load_vmcs() as it is
now unused.

Fixes: 15d45071523d ("KVM/x86: Add IBPB support")
Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
---
v1
 - https://lore.kernel.org/all/20220411164636.74866-1-jon@xxxxxxxxxxx/
v1 -> v2:
 - https://lore.kernel.org/all/20220419020011.65995-1-jon@xxxxxxxxxxx/
 - Addressed comments on approach from Sean.
v2 -> v3:
 - https://lore.kernel.org/all/20220422162103.32736-1-jon@xxxxxxxxxxx/
 - Updated spec-ctrl.h comments and commit msg to incorporate
   additional feedback from Sean.
v3 -> v4:
 - Addressed comments from Boris and Sean.
 - Narrowed change to removing KVM's IBPB only + docs update.
 - Removed "buddy" on vmx_vcpu_load_vmcs() as it is now unused.

 Documentation/admin-guide/hw-vuln/spectre.rst | 33 +++++++++++++++++--
 arch/x86/kvm/svm/svm.c                        |  1 -
 arch/x86/kvm/vmx/nested.c                     |  6 ++--
 arch/x86/kvm/vmx/vmx.c                        | 13 ++------
 arch/x86/kvm/vmx/vmx.h                        |  3 +-
 5 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 9e9556826450..1705f7a0b15d 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -314,7 +314,23 @@ Spectre variant 2

    Linux kernel mitigates attacks to other guests running in the same
    CPU hardware thread by flushing the return stack buffer on VM exit,
-   and clearing the branch target buffer before switching to a new guest.
+   and clearing the branch target buffer before switching to a new guest
+   using IBPB.
+
+   When using IBPB to clear branch target buffer, there are three
+   scenarios at play for VM switching, as follows:
+   If switching between vCPUs belonging to the same guest VM on the same CPU
+   hardware thread, they are considered to be in the same security domain and
+   do not need an IPBP.
+   If switching between vCPUs that belong to different VMs, and each VM has
+   its own memory address space, then IBPB will clear during the context
+   switch.
+   If the Virtual Machine Monitor (VMM) configures multiple VMs to share
+   a single address space, they are all considered to be one single security
+   domain, such that switching in between vCPUs that belong to different VMs
+   in this address space will not use IBPB. Sharing an address space between
+   multiple VMs is uncommon and should be discouraged for security minded
+   workloads.

    If SMT is used, Spectre variant 2 attacks from an untrusted guest
    in the sibling hyperthread can be mitigated by the administrator,
@@ -534,7 +550,9 @@ Spectre variant 2

    To mitigate guest-to-guest attacks in the same CPU hardware thread,
    the branch target buffer is sanitized by flushing before switching
-   to a new guest on a CPU.
+   to a new guest on a CPU. Note that this will not occur if guests are
+   configured to use a single shared memory address space, as that is
+   considered a single security domain.

    The above mitigations are turned on by default on vulnerable CPUs.

@@ -660,6 +678,17 @@ Mitigation selection guide
    against variant 2 attacks originating from programs running on
    sibling threads.

+   Note that in a virtualized environment using KVM, this flush will be
+   done when switching in between VMs. Each VM is considered its own
+   security domain and IBPB will not flush when switching in between
+   vCPUs from a single guest running on the same pCPU. Virtual machine
+   monitors (VMMs), such as qemu-kvm, traditionally configure each VM to
+   be a separate process with separate memory address space; however,
+   it should be explicitly noted that IBPB will not flush between vCPU
+   changes if a VMM configures multiple VMs in a shared address space.
+   While such a configuration is plausible, it is not practical from a
+   security perspective and should be avoided for security minded workloads.
+
    Alternatively, STIBP can be used only when running programs
    whose indirect branch speculation is explicitly disabled,
    while IBPB is still used all the time when switching to a new
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7e45d03cd018..051955145c29 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1302,7 +1302,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

 	if (sd->current_vmcb != svm->vmcb) {
 		sd->current_vmcb = svm->vmcb;
-		indirect_branch_prediction_barrier();
 	}
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_load(vcpu, cpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 856c87563883..e2271d935d5c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -266,7 +266,7 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
 	cpu = get_cpu();
 	prev = vmx->loaded_vmcs;
 	vmx->loaded_vmcs = vmcs;
-	vmx_vcpu_load_vmcs(vcpu, cpu, prev);
+	vmx_vcpu_load_vmcs(vcpu, cpu);
 	vmx_sync_vmcs_host_state(vmx, prev);
 	put_cpu();

@@ -4097,12 +4097,12 @@ static void copy_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,

 	cpu = get_cpu();
 	vmx->loaded_vmcs = &vmx->nested.vmcs02;
-	vmx_vcpu_load_vmcs(vcpu, cpu, &vmx->vmcs01);
+	vmx_vcpu_load_vmcs(vcpu, cpu);

 	sync_vmcs02_to_vmcs12_rare(vcpu, vmcs12);

 	vmx->loaded_vmcs = &vmx->vmcs01;
-	vmx_vcpu_load_vmcs(vcpu, cpu, &vmx->nested.vmcs02);
+	vmx_vcpu_load_vmcs(vcpu, cpu);
 	put_cpu();
 }

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 610355b9ccce..13f720686ad1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1235,8 +1235,7 @@ static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
 }
 #endif

-void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
-			struct loaded_vmcs *buddy)
+void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
@@ -1263,14 +1262,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 	if (prev != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
-
-		/*
-		 * No indirect branch prediction barrier needed when switching
-		 * the active VMCS within a guest, e.g. on nested VM-Enter.
-		 * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
-		 */
-		if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
-			indirect_branch_prediction_barrier();
 	}

 	if (!already_loaded) {
@@ -1308,7 +1299,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

-	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
+	vmx_vcpu_load_vmcs(vcpu, cpu);

 	vmx_vcpu_pi_load(vcpu, cpu);

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b98c7e96697a..d5f6d8d0b7ca 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -369,8 +369,7 @@ struct kvm_vmx {
 };

 bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
-void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
-			struct loaded_vmcs *buddy);
+void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu);
 int allocate_vpid(void);
 void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
--
2.30.1 (Apple Git-130)




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux