From: Manali Shukla <Manali.Shukla@xxxxxxx> The hypervisor can intercept the HLT instruction by setting the HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if there are pending V_INTR and V_NMI events, as the hypervisor must then initiate a VMRUN to handle them. If the HLT-Intercept Bit is cleared and the vCPU executes HLT while there are pending V_INTR and V_NMI events, the hypervisor won’t detect them, potentially causing indefinite suspension of the vCPU. This poses a problem for enlightened guests who wish to securely handle the events. For Secure AVIC scenarios, if a guest does a HLT while an interrupt is pending (in IRR), the hypervisor does not have a way to figure out whether the guest needs to be re-entered, as it cannot read the guest backing page. The Idle HLT intercept feature allows the hypervisor to intercept HLT execution only if there are no pending V_INTR and V_NMI events. There are two use cases for the Idle HLT intercept feature: - Secure VMs that wish to handle pending events securely without exiting to the hypervisor on HLT (Secure AVIC). - Optimization for all the VMs to avoid a wasteful VMEXIT during HLT when there are pending events. On discovering the Idle HLT Intercept, the KVM hypervisor, Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB. When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0), offset 0xFh) should be cleared. Before entering the HLT state, the HLT instruction performs checks in following order: - The HLT intercept check, if set, it unconditionally triggers SVM_EXIT_HLT (0x78). - The Idle HLT intercept check, if set and there are no pending V_INTR or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6). Details about the Idle HLT intercept feature can be found in AMD APM [1]. [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250 Signed-off-by: Manali Shukla <Manali.Shukla@xxxxxxx> --- arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 13 ++++++++++--- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 2b59b9951c90..992050cb83d0 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -116,6 +116,7 @@ enum { INTERCEPT_INVPCID, INTERCEPT_MCOMMIT, INTERCEPT_TLBSYNC, + INTERCEPT_IDLE_HLT = 166, }; diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 1814b413fd57..ec1321248dac 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -95,6 +95,7 @@ #define SVM_EXIT_CR14_WRITE_TRAP 0x09e #define SVM_EXIT_CR15_WRITE_TRAP 0x09f #define SVM_EXIT_INVPCID 0x0a2 +#define SVM_EXIT_IDLE_HLT 0x0a6 #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 @@ -224,6 +225,7 @@ { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \ { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \ { SVM_EXIT_INVPCID, "invpcid" }, \ + { SVM_EXIT_IDLE_HLT, "idle-halt" }, \ { SVM_EXIT_NPF, "npf" }, \ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 78daedf6697b..36f307e71d5d 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1296,8 +1296,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu) svm_set_intercept(svm, INTERCEPT_MWAIT); } - if (!kvm_hlt_in_guest(vcpu->kvm)) - svm_set_intercept(svm, INTERCEPT_HLT); + if (!kvm_hlt_in_guest(vcpu->kvm)) { + if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT)) + svm_set_intercept(svm, INTERCEPT_IDLE_HLT); + else + svm_set_intercept(svm, INTERCEPT_HLT); + } control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __sme_set(__pa(svm->msrpm)); @@ -3341,6 +3345,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, [SVM_EXIT_INVPCID] = invpcid_interception, + [SVM_EXIT_IDLE_HLT] = kvm_emulate_halt, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, @@ -3503,7 +3508,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code) return interrupt_window_interception(vcpu); else if (exit_code == SVM_EXIT_INTR) return intr_interception(vcpu); - else if (exit_code == SVM_EXIT_HLT) + else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT) return kvm_emulate_halt(vcpu); else if (exit_code == SVM_EXIT_NPF) return npf_interception(vcpu); @@ -5224,6 +5229,8 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI); + kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); } -- 2.34.1