On Thu, Nov 14, 2024 at 10:45 AM Doug Covelli <doug.covelli@xxxxxxxxxxxx> wrote: > > On Wed, Nov 13, 2024 at 12:59 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > > > On 11/13/24 17:24, Doug Covelli wrote: > > >> No worries, you're not hijacking :) The only reason is that it would > > >> be more code for a seldom used feature and anyway with worse performance. > > >> (To be clear, CR8 based accesses are allowed, but stores cause an exit > > >> in order to check the new TPR against IRR. That's because KVM's API > > >> does not have an equivalent of the TPR threshold as you point out below). > > > > > > I have not really looked at the code but it seems like it could also > > > simplify things as CR8 would be handled more uniformly regardless of > > > who is virtualizing the local APIC. > > > > Not much because CR8 basically does not exist at all (it's just a byte > > in memory) with userspace APIC. So it's not easy to make it simpler, even > > though it's less uniform. > > > > That said, there is an optimization: you only get KVM_EXIT_SET_TPR if > > CR8 decreases. > > > > >>> Also I could not find these documented anywhere but with MSFT's APIC our monitor > > >>> relies on extensions for trapping certain events such as INIT/SIPI plus LINT0 > > >>> and SVR writes: > > >>> > > >>> UINT64 X64ApicInitSipiExitTrap : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap > > >>> UINT64 X64ApicWriteLint0ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap > > >>> UINT64 X64ApicWriteLint1ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap > > >>> UINT64 X64ApicWriteSvrExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap > > >> > > >> There's no need for this in KVM's in-kernel APIC model. INIT and > > >> SIPI are handled in the hypervisor and you can get the current > > >> state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected > > >> with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR > > >> and NMI blocking respectively, plus the interrupt shadow; so > > >> there's no need for userspace to know when LINT0/LINT1 themselves > > >> change. The spurious interrupt vector register is also handled > > >> completely in kernel. > > > > > > I realize that KVM can handle LINT0/SVR updates themselves but our > > > interrupt subsystem relies on knowing the current values of these > > > registers even when not virtualizing the local APIC. I suppose we > > > could use KVM_GET_LAPIC to sync things up on demand but that seems > > > like it might nor be great from a performance point of view. > > > > Ah no, you're right---you want to track the CPU that has ExtINT enabled > > and send KVM_INTERRUPT to that one, I guess? And you need the spurious > > vector registers because writes can set the mask bit in LINTx, but > > essentially you want to trap LINT0 changes. > > > > Something like this (missing the KVM_ENABLE_CAP and KVM_CHECK_EXTENSION > > code) is good, feel free to include it in your v2 (Co-developed-by > > and Signed-off-by me): > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index 5fb29ca3263b..b7dd89c99613 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -122,6 +122,7 @@ > > #define KVM_REQ_HV_TLB_FLUSH \ > > KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) > > #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) > > +#define KVM_REQ_REPORT_LINT0_ACCESS KVM_ARCH_REQ(35) > > > > #define CR0_RESERVED_BITS \ > > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ > > @@ -775,6 +776,7 @@ struct kvm_vcpu_arch { > > u64 smi_count; > > bool at_instruction_boundary; > > bool tpr_access_reporting; > > + bool lint0_access_reporting; > > bool xfd_no_write_intercept; > > u64 ia32_xss; > > u64 microcode_version; > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index 88dc43660d23..0e070f447aa2 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -1561,6 +1561,21 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic) > > apic->divide_count)); > > } > > > > +static void __report_lint0_access(struct kvm_lapic *apic, u32 value) > > +{ > > + struct kvm_vcpu *vcpu = apic->vcpu; > > + struct kvm_run *run = vcpu->run; > > + > > + kvm_make_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu); > > + run->lint0_access.value = value; > > +} > > + > > +static inline void report_lint0_access(struct kvm_lapic *apic, u32 value) > > +{ > > + if (apic->vcpu->arch.lint0_access_reporting) > > + __report_lint0_access(apic, value); > > +} > > + > > static void __report_tpr_access(struct kvm_lapic *apic, bool write) > > { > > struct kvm_vcpu *vcpu = apic->vcpu; > > @@ -2312,8 +2327,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) > > int i; > > > > for (i = 0; i < apic->nr_lvt_entries; i++) { > > - kvm_lapic_set_reg(apic, APIC_LVTx(i), > > - kvm_lapic_get_reg(apic, APIC_LVTx(i)) | APIC_LVT_MASKED); > > + u32 old = kvm_lapic_get_reg(apic, APIC_LVTx(i)); > > + kvm_lapic_set_reg(apic, APIC_LVTx(i), old | APIC_LVT_MASKED); > > + if (i == 0 && !(old & APIC_LVT_MASKED)) > > + report_lint0_access(apic, old | APIC_LVT_MASKED); > > } > > apic_update_lvtt(apic); > > atomic_set(&apic->lapic_timer.pending, 0); > > @@ -2352,6 +2369,8 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) > > if (!kvm_apic_sw_enabled(apic)) > > val |= APIC_LVT_MASKED; > > val &= apic_lvt_mask[index]; > > + if (index == 0 && val != kvm_lapic_get_reg(apic, reg)) > > + report_lint0_access(apic, val); > > kvm_lapic_set_reg(apic, reg, val); > > break; > > } > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index d0d3dc3b7ef6..2b039b372c3f 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -10879,6 +10879,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > > kvm_vcpu_flush_tlb_guest(vcpu); > > #endif > > > > + if (kvm_check_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu)) { > > + vcpu->run->exit_reason = KVM_EXIT_LINT0_ACCESS; > > + r = 0; > > + goto out; > > + } > > if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) { > > vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS; > > r = 0; > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index 637efc055145..ec97727f9de4 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -178,6 +178,7 @@ struct kvm_xen_exit { > > #define KVM_EXIT_NOTIFY 37 > > #define KVM_EXIT_LOONGARCH_IOCSR 38 > > #define KVM_EXIT_MEMORY_FAULT 39 > > +#define KVM_EXIT_LINT0_ACCESS 40 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -283,6 +284,10 @@ struct kvm_run { > > __u64 flags; > > }; > > } hypercall; > > + /* KVM_EXIT_LINT0_ACCESS */ > > + struct { > > + __u32 value; > > + } lint0_access; > > /* KVM_EXIT_TPR_ACCESS */ > > struct { > > __u64 rip; > > > > > > For LINT1, it should be less performance critical; if it's possible > > to just go through all vCPUs, and do KVM_GET_LAPIC to check who you > > should send a KVM_NMI to, then I'd do that. I'd also accept a patch > > that adds a VM-wide KVM_NMI ioctl that does the same in the hypervisor > > if it's useful for you. > > Thanks for the patch - I'll get it a try but it might not be right away. > > > And since I've been proven wrong already, what do you need INIT/SIPI for? > > I don't think this one is as critical. I believe the reason it was > added was so that we can synchronize startup of the APs with execution > of the BSP for guests that do not do a good job of that (Windows). > > Doug We were able to get the in-kernel APIC working with our code using the split IRQ chip option with our virtual EFI FW even w/o the traps for SVR and LVT0 writes. Performance of Windows VMs is greatly improved as expected. Unfortunately our ancient legacy BIOS will not work with > 1 VCPU due to lack of support for IPIs with an archaic delivery mode of remote read which it uses to discover APs by attempting to read their APIC ID register. MSFT WHP supports this functionality via an option, WHvPartitionPropertyCodeApicRemoteReadSupport. Changing our legacy BIOS is not an option so in order to support Windows VMs with the legacy BIOS with decent performance we would either need to add support for remote reads of the APIC ID register to KVM or support CR8 accesses w/o exiting w/o the in-kernel APIC in order. Do you have a preference? Thanks, Doug -- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.