Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 13 Nov 2024 18:59:53 +0100

On 11/13/24 17:24, Doug Covelli wrote:
No worries, you're not hijacking :) The only reason is that it would
be more code for a seldom used feature and anyway with worse performance.
(To be clear, CR8 based accesses are allowed, but stores cause an exit
in order to check the new TPR against IRR. That's because KVM's API
does not have an equivalent of the TPR threshold as you point out below).

I have not really looked at the code but it seems like it could also
simplify things as CR8 would be handled more uniformly regardless of
who is virtualizing the local APIC.

Not much because CR8 basically does not exist at all (it's just a byte
in memory) with userspace APIC.  So it's not easy to make it simpler, even
though it's less uniform.

That said, there is an optimization: you only get KVM_EXIT_SET_TPR if
CR8 decreases.

Also I could not find these documented anywhere but with MSFT's APIC our monitor
relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
and SVR writes:

UINT64 X64ApicInitSipiExitTrap    : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
UINT64 X64ApicWriteLint0ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
UINT64 X64ApicWriteLint1ExitTrap  : 1; // WHvRunVpExitReasonX64ApicWriteTrap
UINT64 X64ApicWriteSvrExitTrap    : 1; // WHvRunVpExitReasonX64ApicWriteTrap

There's no need for this in KVM's in-kernel APIC model. INIT and
SIPI are handled in the hypervisor and you can get the current
state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected
with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR
and NMI blocking respectively, plus the interrupt shadow; so
there's no need for userspace to know when LINT0/LINT1 themselves
change. The spurious interrupt vector register is also handled
completely in kernel.

I realize that KVM can handle LINT0/SVR updates themselves but our
interrupt subsystem relies on knowing the current values of these
registers even when not virtualizing the local APIC.  I suppose we
could use KVM_GET_LAPIC to sync things up on demand but that seems
like it might nor be great from a performance point of view.

Ah no, you're right---you want to track the CPU that has ExtINT enabled
and send KVM_INTERRUPT to that one, I guess?  And you need the spurious
vector registers because writes can set the mask bit in LINTx, but
essentially you want to trap LINT0 changes.

Something like this (missing the KVM_ENABLE_CAP and KVM_CHECK_EXTENSION
code) is good, feel free to include it in your v2 (Co-developed-by
and Signed-off-by me):

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5fb29ca3263b..b7dd89c99613 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -122,6 +122,7 @@
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
+#define KVM_REQ_REPORT_LINT0_ACCESS	KVM_ARCH_REQ(35)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -775,6 +776,7 @@ struct kvm_vcpu_arch {
 	u64 smi_count;
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
+	bool lint0_access_reporting;
 	bool xfd_no_write_intercept;
 	u64 ia32_xss;
 	u64 microcode_version;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 88dc43660d23..0e070f447aa2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1561,6 +1561,21 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 			      apic->divide_count));
 }
 
+static void __report_lint0_access(struct kvm_lapic *apic, u32 value)
+{
+	struct kvm_vcpu *vcpu = apic->vcpu;
+	struct kvm_run *run = vcpu->run;
+
+	kvm_make_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu);
+	run->lint0_access.value = value;
+}
+
+static inline void report_lint0_access(struct kvm_lapic *apic, u32 value)
+{
+	if (apic->vcpu->arch.lint0_access_reporting)
+		__report_lint0_access(apic, value);
+}
+
 static void __report_tpr_access(struct kvm_lapic *apic, bool write)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
@@ -2312,8 +2327,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			int i;
 
 			for (i = 0; i < apic->nr_lvt_entries; i++) {
-				kvm_lapic_set_reg(apic, APIC_LVTx(i),
-					kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED);
+				u32 old = kvm_lapic_get_reg(apic, APIC_LVTx(i));
+				kvm_lapic_set_reg(apic, APIC_LVTx(i), old | APIC_LVT_MASKED);
+				if (i == 0 && !(old & APIC_LVT_MASKED))
+					report_lint0_access(apic, old | APIC_LVT_MASKED);
 			}
 			apic_update_lvtt(apic);
 			atomic_set(&apic->lapic_timer.pending, 0);
@@ -2352,6 +2369,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		if (!kvm_apic_sw_enabled(apic))
 			val |= APIC_LVT_MASKED;
 		val &= apic_lvt_mask[index];
+		if (index == 0 && val != kvm_lapic_get_reg(apic, reg))
+			report_lint0_access(apic, val);
 		kvm_lapic_set_reg(apic, reg, val);
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d0d3dc3b7ef6..2b039b372c3f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10879,6 +10879,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_flush_tlb_guest(vcpu);
 #endif
 
+		if (kvm_check_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_LINT0_ACCESS;
+			r = 0;
+			goto out;
+		}
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..ec97727f9de4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -178,6 +178,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_LINT0_ACCESS     40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -283,6 +284,10 @@ struct kvm_run {
 				__u64 flags;
 			};
 		} hypercall;
+		/* KVM_EXIT_LINT0_ACCESS */
+		struct {
+			__u32 value;
+		} lint0_access;
 		/* KVM_EXIT_TPR_ACCESS */
 		struct {
 			__u64 rip;


For LINT1, it should be less performance critical; if it's possible
to just go through all vCPUs, and do KVM_GET_LAPIC to check who you
should send a KVM_NMI to, then I'd do that.  I'd also accept a patch
that adds a VM-wide KVM_NMI ioctl that does the same in the hypervisor
if it's useful for you.

And since I've been proven wrong already, what do you need INIT/SIPI for?

Paolo