[PATCH] KVM: SVM: Add Idle HLT intercept support

Add support for "Idle HLT" interception on AMD CPUs, and enable Idle HLT
interception instead of "normal" HLT interception for all VMs for which
HLT-exiting is enabled.  Idle HLT provides a mild performance boost for
all VM types, by avoiding a VM-Exit in the scenario where KVM would
immediately "wake" and resume the vCPU.

Idle HLT makes HLT-exiting conditional on the vCPU not having a valid,
unmasked interrupt.  Specifically, a VM-Exit occurs on execution of HLT
if and only if there are no pending V_IRQ or V_NMI events.  Note, Idle HLT
is a replacement for full HLT interception, i.e. also enabling HLT
interception would result in all HLT instructions causing unconditional
VM-Exits.  Per the APM:

 When both HLT and Idle HLT intercepts are active at the same time, the
 HLT intercept takes priority. This intercept occurs only if a virtual
 interrupt is not pending (V_INTR or V_NMI).
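
As a rough illustration of the rule above, the exit decision for a guest
HLT can be modeled as below.  This is a toy model, not KVM or hardware
code; model_hlt() and the hlt_outcome names are hypothetical and exist
only for this sketch.

```c
#include <stdbool.h>

/* Toy model of the APM rule; not KVM code, all names are illustrative. */
enum hlt_outcome {
	HLT_NO_EXIT,	/* no #VMEXIT, HLT is handled entirely in the guest */
	HLT_EXIT,	/* unconditional exit, cf. SVM_EXIT_HLT */
	HLT_IDLE_EXIT,	/* conditional exit, cf. SVM_EXIT_IDLE_HLT */
};

static enum hlt_outcome model_hlt(bool hlt_intercept, bool idle_hlt_intercept,
				  bool v_irq_pending, bool v_nmi_pending)
{
	/* Full HLT interception takes priority and ignores pending events. */
	if (hlt_intercept)
		return HLT_EXIT;

	/* Idle HLT exits only if no virtual interrupt or NMI is pending. */
	if (idle_hlt_intercept && !v_irq_pending && !v_nmi_pending)
		return HLT_IDLE_EXIT;

	return HLT_NO_EXIT;
}
```

With only the Idle HLT intercept set, a pending V_IRQ or V_NMI yields
HLT_NO_EXIT, which is the case where KVM would otherwise immediately wake
and resume the vCPU.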

For KVM's use of V_IRQ (also called V_INTR in the APM) to detect interrupt
windows, the net effect of enabling Idle HLT is that, if a virtual
interrupt is pending and unmasked at the time of HLT, the vCPU will take
a V_IRQ intercept instead of a HLT intercept.

When AVIC is enabled, Idle HLT works as intended: the vCPU continues
unimpeded and services the pending virtual interrupt.

Note, the APM's description of V_IRQ interaction with AVIC is quite
confusing, and requires piecing together implied behavior.  Per the APM,
when AVIC is enabled, V_IRQ *from the VMCB* is ignored:

  When AVIC mode is enabled for a virtual processor, the V_IRQ, V_INTR_PRIO,
  V_INTR_VECTOR, and V_IGN_TPR fields in the VMCB are ignored.

Which seems to contradict the behavior of Idle HLT:

  This intercept occurs only if a virtual interrupt is not pending (V_INTR
  or V_NMI).

What's not explicitly stated is that hardware's internal copies of V_IRQ
(and related fields) *are* still active, i.e. are presumably used to cache
information from the virtual APIC.

Handle Idle HLT exits as if they were normal HLT exits, e.g. don't try to
optimize the handling under the assumption that there isn't a pending IRQ.
Irrespective of AVIC, Idle HLT is inherently racy with respect to the vIRR,
as KVM can set vIRR bits asynchronously.

No changes are required to support KVM's use of Idle HLT while running
L2.  In fact, supporting Idle HLT is actually a bug fix to some extent.
If L1 wants to intercept HLT, recalc_intercepts() will enable HLT
interception in vmcb02 and forward the intercept to L1 as normal.

But if L1 does not want to intercept HLT, then KVM will run L2 with Idle
HLT enabled and HLT interception disabled.  If a V_IRQ or V_NMI for L2
becomes pending and L2 executes HLT, then use of Idle HLT will do the
right thing, i.e. not #VMEXIT and instead deliver the virtual event.  KVM
currently doesn't handle this scenario correctly, e.g. doesn't check V_IRQ
or V_NMI in vmcb02 as part of kvm_vcpu_has_events().
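
The two L2 cases can be sketched as follows, assuming the effective
intercepts for L2 are simply the union of what KVM and L1 request.  This
is a toy model, not recalc_intercepts(); the MODEL_* bits and both helpers
are hypothetical, and the bit positions are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not KVM code; bit positions do not match the VMCB layout. */
#define MODEL_INTERCEPT_HLT		(1u << 0)
#define MODEL_INTERCEPT_IDLE_HLT	(1u << 1)

/* Assumption: intercepts used to run L2 are the union of L0's and L1's. */
static uint32_t model_vmcb02_intercepts(uint32_t l0_wants, uint32_t l1_wants)
{
	return l0_wants | l1_wants;
}

/* Returns true if an HLT executed by L2 causes a #VMEXIT. */
static bool model_l2_hlt_exits(uint32_t intercepts, bool virt_event_pending)
{
	/* HLT interception takes priority and is unconditional. */
	if (intercepts & MODEL_INTERCEPT_HLT)
		return true;

	/* Idle HLT exits only when no V_IRQ or V_NMI is pending. */
	if (intercepts & MODEL_INTERCEPT_IDLE_HLT)
		return !virt_event_pending;

	return false;
}
```

If L1 requests MODEL_INTERCEPT_HLT, every HLT in L2 exits and can be
forwarded to L1.  If L1 requests nothing, only KVM's Idle HLT bit is set,
and an HLT with a pending virtual event does not exit.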

Do not expose Idle HLT to L1 at this time, as supporting nested Idle HLT is
more complex than just enumerating the feature, e.g. requires KVM to handle
the aforementioned scenarios of V_IRQ and V_NMI at the time of exit.

Signed-off-by: Manali Shukla <Manali.Shukla@xxxxxxx>
Reviewed-by: Nikunj A Dadhania <nikunj@xxxxxxx>
Link: https://bugzilla.kernel.org/attachment.cgi?id=306250
Link: https://lore.kernel.org/r/20250128124812.7324-3-manali.shukla@xxxxxxx
[sean: rewrite changelog, drop nested "support"]
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/include/asm/svm.h      |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 11 ++++++++---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e2fac21471f5..12a9dde1e842 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@ enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_IDLE_HLT = 166,
 };
 
 
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 1814b413fd57..ec1321248dac 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID       0x0a2
+#define SVM_EXIT_IDLE_HLT      0x0a6
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -224,6 +225,7 @@
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,     "invpcid" }, \
+	{ SVM_EXIT_IDLE_HLT,     "idle-halt" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a..37e83bde8f9f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}
 
-	if (!kvm_hlt_in_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_HLT);
+	if (!kvm_hlt_in_guest(vcpu->kvm)) {
+		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
+			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
+		else
+			svm_set_intercept(svm, INTERCEPT_HLT);
+	}
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
@@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]                      = invpcid_interception,
+	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 		return interrupt_window_interception(vcpu);
 	else if (exit_code == SVM_EXIT_INTR)
 		return intr_interception(vcpu);
-	else if (exit_code == SVM_EXIT_HLT)
+	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
 		return kvm_emulate_halt(vcpu);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(vcpu);

base-commit: b9cd96a7ff9cc9ddf95de59d69afb174a9e90c6e
-- 