Re: [RFC PATCH v2 3/5] KVM: SVM: Enable Bus lock threshold exit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/1/24 01:34, Manali Shukla wrote:
> From: Nikunj A Dadhania <nikunj@xxxxxxx>
> 
> Virtual machines can exploit bus locks to degrade the performance of
> the system. Bus locks can be caused by Non-WB(Write back) and
> misaligned locked RMW (Read-modify-Write) instructions and require
> systemwide synchronization among all processors which can result into
> significant performance penalties.
> 
> To address this issue, the Bus Lock Threshold feature is introduced to
> provide ability to hypervisor to restrict guests' capability of
> initiating mulitple buslocks, thereby preventing system slowdowns.
> 
> Support for the buslock threshold is indicated via CPUID function
> 0x8000000A_EDX[29].
> 
> On the processors that support the Bus Lock Threshold feature, the
> VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit
> Bus Lock threshold count.
> 
> VMCB intercept bit
> VMCB Offset	Bits	Function
> 14h	        5	Intercept bus lock operations
>                         (occurs after guest instruction finishes)
> 
> Bus lock threshold count
> VMCB Offset	Bits	Function
> 120h	        15:0	Bus lock counter
> 
> When a VMRUN instruction is executed, the bus lock threshold count is
> loaded into an internal count register. Before the processor executes
> a bus lock in the guest, it checks the value of this register:
> 
>  - If the value is greater than '0', the processor successfully
>    executes the bus lock and decrements the count.
> 
>  - If the value is '0', the bus lock is not executed, and a #VMEXIT to
>    the VMM is taken.
> 
> The bus lock threshold #VMEXIT is reported to the VMM with the VMEXIT
> code A5h, SVM_EXIT_BUS_LOCK.
> 
> When SVM_EXIT_BUS_LOCK occurs, the value of the current Bus Lock
> Threshold counter, which is '0', is loaded into the VMCB. This value
> must be reset to a default greater than '0' in bus_lock_exit handler
> in hypervisor, because the behavior of SVM_EXIT_BUS_LOCK is fault
> like, so the bus lock exit to userspace  happens with the RIP pointing
> to the offending instruction (which generates buslocks).
> 
> So, if the value of the Bus Lock Threshold counter remains '0', the
> guest continuously exits with SVM_EXIT_BUS_LOCK.
> 
> The generated SVM_EXIT_BUS_LOCK in kvm will exit to user space by
> setting the KVM_RUN_BUS_LOCK flag in vcpu->run->flags, indicating to
> the user space that a bus lock has been detected in the guest.
> 
> Use the already available KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to
> enable the feature. This feature is disabled by default, it can be
> enabled explicitly from user space.
> 
> More details about the Bus Lock Threshold feature can be found in AMD
> APM [1].

You should mention in the commit message that this implementation is set
up to intercept every guest bus lock. The initial value of the threshold
is 0 in the VMCB, so the very first bus lock will exit, set the
threshold to 1 (so that the offending instruction can make forward
progress) but then the value is at 0 again so the next bus lock will exit.

> 
> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
>      Vol 2, 15.14.5 Bus Lock Threshold.
>      https://bugzilla.kernel.org/attachment.cgi?id=306250
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
> Co-developed-by: Manali Shukla <manali.shukla@xxxxxxx>
> Signed-off-by: Manali Shukla <manali.shukla@xxxxxxx>
> ---
>  arch/x86/include/asm/svm.h      |  5 ++++-
>  arch/x86/include/uapi/asm/svm.h |  2 ++
>  arch/x86/kvm/svm/nested.c       | 12 ++++++++++++
>  arch/x86/kvm/svm/svm.c          | 29 +++++++++++++++++++++++++++++
>  4 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f0dea3750ca9..bad9723f40e1 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -116,6 +116,7 @@ enum {
>  	INTERCEPT_INVPCID,
>  	INTERCEPT_MCOMMIT,
>  	INTERCEPT_TLBSYNC,
> +	INTERCEPT_BUSLOCK,
>  };
>  
>  
> @@ -158,7 +159,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  	u64 avic_physical_id;	/* Offset 0xf8 */
>  	u8 reserved_7[8];
>  	u64 vmsa_pa;		/* Used for an SEV-ES guest */
> -	u8 reserved_8[720];
> +	u8 reserved_8[16];
> +	u16 bus_lock_counter;	/* Offset 0x120 */
> +	u8 reserved_9[702];
>  	/*
>  	 * Offset 0x3e0, 32 bytes reserved
>  	 * for use by hypervisor/software.
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 1814b413fd57..abf6aed88cee 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -95,6 +95,7 @@
>  #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
>  #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
>  #define SVM_EXIT_INVPCID       0x0a2
> +#define SVM_EXIT_BUS_LOCK			0x0a5
>  #define SVM_EXIT_NPF           0x400
>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
> @@ -224,6 +225,7 @@
>  	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
>  	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
>  	{ SVM_EXIT_INVPCID,     "invpcid" }, \
> +	{ SVM_EXIT_BUS_LOCK,     "buslock" }, \
>  	{ SVM_EXIT_NPF,         "npf" }, \
>  	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
>  	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 6f704c1037e5..670092d31f77 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -669,6 +669,12 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
>  	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
>  
> +	/*
> +	 * The bus lock threshold count should keep running across nested
> +	 * transitions.
> +	 * Copied the buslock threshold count from vmcb01 to vmcb02.

No need to start this part of the comment on a separate line. Also,
s/Copied/Copy/ (ditto below).

> +	 */
> +	vmcb02->control.bus_lock_counter = vmcb01->control.bus_lock_counter;
>  	/* Done at vmrun: asid.  */
>  
>  	/* Also overwritten later if necessary.  */
> @@ -1035,6 +1041,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  
>  	}
>  
> +	/*
> +	 * The bus lock threshold count should keep running across nested
> +	 * transitions.
> +	 * Copied the buslock threshold count from vmcb02 to vmcb01.
> +	 */
> +	vmcb01->control.bus_lock_counter = vmcb02->control.bus_lock_counter;
>  	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
>  
>  	svm_switch_vmcb(svm, &svm->vmcb01);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5ab2c92c7331..41c773a40f99 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1372,6 +1372,11 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  		svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
>  	}
>  
> +	if (vcpu->kvm->arch.bus_lock_detection_enabled)
> +		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
> +	else
> +		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);

Is the else path really needed?

Thanks,
Tom

> +
>  	if (sev_guest(vcpu->kvm))
>  		sev_init_vmcb(svm);
>  
> @@ -3283,6 +3288,24 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
>  	return kvm_handle_invpcid(vcpu, type, gva);
>  }
>  
> +static int bus_lock_exit(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
> +	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
> +
> +	/*
> +	 * Reload the counter with value greater than '0'.
> +	 * The bus lock exit on SVM happens with RIP pointing to the guilty
> +	 * instruction. So, reloading the value of bus_lock_counter to '0'
> +	 * results into generating continuous bus lock exits.
> +	 */
> +	svm->vmcb->control.bus_lock_counter = 1;
> +
> +	return 0;
> +}
> +
>  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_READ_CR0]			= cr_interception,
>  	[SVM_EXIT_READ_CR3]			= cr_interception,
> @@ -3350,6 +3373,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_INVPCID]                      = invpcid_interception,
> +	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
>  	[SVM_EXIT_NPF]				= npf_interception,
>  	[SVM_EXIT_RSM]                          = rsm_interception,
>  	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
> @@ -5212,6 +5236,11 @@ static __init void svm_set_cpu_caps(void)
>  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>  	}
>  
> +	if (cpu_feature_enabled(X86_VIRT_FEATURE_BUS_LOCK_THRESHOLD)) {
> +		pr_info("Bus Lock Threashold supported\n");
> +		kvm_caps.has_bus_lock_exit = true;
> +	}
> +
>  	/* CPUID 0x80000008 */
>  	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))




[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux