Hi Tom, Thank you for reviewing my patches. On 10/1/2024 7:13 PM, Tom Lendacky wrote: > On 10/1/24 01:34, Manali Shukla wrote: >> From: Nikunj A Dadhania <nikunj@xxxxxxx> >> >> Virtual machines can exploit bus locks to degrade the performance of >> the system. Bus locks can be caused by Non-WB(Write back) and >> misaligned locked RMW (Read-modify-Write) instructions and require >> systemwide synchronization among all processors which can result into >> significant performance penalties. >> >> To address this issue, the Bus Lock Threshold feature is introduced to >> provide ability to hypervisor to restrict guests' capability of >> initiating mulitple buslocks, thereby preventing system slowdowns. >> >> Support for the buslock threshold is indicated via CPUID function >> 0x8000000A_EDX[29]. >> >> On the processors that support the Bus Lock Threshold feature, the >> VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit >> Bus Lock threshold count. >> >> VMCB intercept bit >> VMCB Offset Bits Function >> 14h 5 Intercept bus lock operations >> (occurs after guest instruction finishes) >> >> Bus lock threshold count >> VMCB Offset Bits Function >> 120h 15:0 Bus lock counter >> >> When a VMRUN instruction is executed, the bus lock threshold count is >> loaded into an internal count register. Before the processor executes >> a bus lock in the guest, it checks the value of this register: >> >> - If the value is greater than '0', the processor successfully >> executes the bus lock and decrements the count. >> >> - If the value is '0', the bus lock is not executed, and a #VMEXIT to >> the VMM is taken. >> >> The bus lock threshold #VMEXIT is reported to the VMM with the VMEXIT >> code A5h, SVM_EXIT_BUS_LOCK. >> >> When SVM_EXIT_BUS_LOCK occurs, the value of the current Bus Lock >> Threshold counter, which is '0', is loaded into the VMCB. This value >> must be reset to a default greater than '0' in bus_lock_exit handler >> in hypervisor, because the behavior of SVM_EXIT_BUS_LOCK is fault >> like, so the bus lock exit to userspace happens with the RIP pointing >> to the offending instruction (which generates buslocks). >> >> So, if the value of the Bus Lock Threshold counter remains '0', the >> guest continuously exits with SVM_EXIT_BUS_LOCK. >> >> The generated SVM_EXIT_BUS_LOCK in kvm will exit to user space by >> setting the KVM_RUN_BUS_LOCK flag in vcpu->run->flags, indicating to >> the user space that a bus lock has been detected in the guest. >> >> Use the already available KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to >> enable the feature. This feature is disabled by default, it can be >> enabled explicitly from user space. >> >> More details about the Bus Lock Threshold feature can be found in AMD >> APM [1]. > > You should mention in the commit message that this implementation is set > up to intercept every guest bus lock. The initial value of the threshold > is 0 in the VMCB, so the very first bus lock will exit, set the > threshold to 1 (so that the offending instruction can make forward > progress) but then the value is at 0 again so the next bus lock will exit. Sure. I will add to V3. > >> >> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, >> Vol 2, 15.14.5 Bus Lock Threshold. >> https://bugzilla.kernel.org/attachment.cgi?id=306250 >> >> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx> >> Co-developed-by: Manali Shukla <manali.shukla@xxxxxxx> >> Signed-off-by: Manali Shukla <manali.shukla@xxxxxxx> >> --- >> arch/x86/include/asm/svm.h | 5 ++++- >> arch/x86/include/uapi/asm/svm.h | 2 ++ >> arch/x86/kvm/svm/nested.c | 12 ++++++++++++ >> arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++++++++++++ >> 4 files changed, 47 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h >> index f0dea3750ca9..bad9723f40e1 100644 >> --- a/arch/x86/include/asm/svm.h >> +++ b/arch/x86/include/asm/svm.h >> @@ -116,6 +116,7 @@ enum { >> INTERCEPT_INVPCID, >> INTERCEPT_MCOMMIT, >> INTERCEPT_TLBSYNC, >> + INTERCEPT_BUSLOCK, >> }; >> >> >> @@ -158,7 +159,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area { >> u64 avic_physical_id; /* Offset 0xf8 */ >> u8 reserved_7[8]; >> u64 vmsa_pa; /* Used for an SEV-ES guest */ >> - u8 reserved_8[720]; >> + u8 reserved_8[16]; >> + u16 bus_lock_counter; /* Offset 0x120 */ >> + u8 reserved_9[702]; >> /* >> * Offset 0x3e0, 32 bytes reserved >> * for use by hypervisor/software. >> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h >> index 1814b413fd57..abf6aed88cee 100644 >> --- a/arch/x86/include/uapi/asm/svm.h >> +++ b/arch/x86/include/uapi/asm/svm.h >> @@ -95,6 +95,7 @@ >> #define SVM_EXIT_CR14_WRITE_TRAP 0x09e >> #define SVM_EXIT_CR15_WRITE_TRAP 0x09f >> #define SVM_EXIT_INVPCID 0x0a2 >> +#define SVM_EXIT_BUS_LOCK 0x0a5 >> #define SVM_EXIT_NPF 0x400 >> #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 >> #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 >> @@ -224,6 +225,7 @@ >> { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \ >> { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \ >> { SVM_EXIT_INVPCID, "invpcid" }, \ >> + { SVM_EXIT_BUS_LOCK, "buslock" }, \ >> { SVM_EXIT_NPF, "npf" }, \ >> { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ >> { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ >> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c >> index 6f704c1037e5..670092d31f77 100644 >> --- a/arch/x86/kvm/svm/nested.c >> +++ b/arch/x86/kvm/svm/nested.c >> @@ -669,6 +669,12 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, >> vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa; >> vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa; >> >> + /* >> + * The bus lock threshold count should keep running across nested >> + * transitions. >> + * Copied the buslock threshold count from vmcb01 to vmcb02. > > No need to start this part of the comment on a separate line. Also, > s/Copied/Copy/ (ditto below). > Ack. >> + */ >> + vmcb02->control.bus_lock_counter = vmcb01->control.bus_lock_counter; >> /* Done at vmrun: asid. */ >> >> /* Also overwritten later if necessary. */ >> @@ -1035,6 +1041,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm) >> >> } >> >> + /* >> + * The bus lock threshold count should keep running across nested >> + * transitions. >> + * Copied the buslock threshold count from vmcb02 to vmcb01. >> + */ >> + vmcb01->control.bus_lock_counter = vmcb02->control.bus_lock_counter; >> nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr); >> >> svm_switch_vmcb(svm, &svm->vmcb01); >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c >> index 5ab2c92c7331..41c773a40f99 100644 >> --- a/arch/x86/kvm/svm/svm.c >> +++ b/arch/x86/kvm/svm/svm.c >> @@ -1372,6 +1372,11 @@ static void init_vmcb(struct kvm_vcpu *vcpu) >> svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK; >> } >> >> + if (vcpu->kvm->arch.bus_lock_detection_enabled) >> + svm_set_intercept(svm, INTERCEPT_BUSLOCK); >> + else >> + svm_clr_intercept(svm, INTERCEPT_BUSLOCK); > > Is the else path really needed? > > Thanks, > Tom Correct. It is not needed. I will remove from V3. > >> + >> if (sev_guest(vcpu->kvm)) >> sev_init_vmcb(svm); >> >> @@ -3283,6 +3288,24 @@ static int invpcid_interception(struct kvm_vcpu *vcpu) >> return kvm_handle_invpcid(vcpu, type, gva); >> } >> >> +static int bus_lock_exit(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_svm *svm = to_svm(vcpu); >> + >> + vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK; >> + vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK; >> + >> + /* >> + * Reload the counter with value greater than '0'. >> + * The bus lock exit on SVM happens with RIP pointing to the guilty >> + * instruction. So, reloading the value of bus_lock_counter to '0' >> + * results into generating continuous bus lock exits. >> + */ >> + svm->vmcb->control.bus_lock_counter = 1; >> + >> + return 0; >> +} >> + >> static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { >> [SVM_EXIT_READ_CR0] = cr_interception, >> [SVM_EXIT_READ_CR3] = cr_interception, >> @@ -3350,6 +3373,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { >> [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, >> [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, >> [SVM_EXIT_INVPCID] = invpcid_interception, >> + [SVM_EXIT_BUS_LOCK] = bus_lock_exit, >> [SVM_EXIT_NPF] = npf_interception, >> [SVM_EXIT_RSM] = rsm_interception, >> [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, >> @@ -5212,6 +5236,11 @@ static __init void svm_set_cpu_caps(void) >> kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); >> } >> >> + if (cpu_feature_enabled(X86_VIRT_FEATURE_BUS_LOCK_THRESHOLD)) { >> + pr_info("Bus Lock Threashold supported\n"); >> + kvm_caps.has_bus_lock_exit = true; >> + } >> + >> /* CPUID 0x80000008 */ >> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || >> boot_cpu_has(X86_FEATURE_AMD_SSBD)) -Manali