On Tue, 2025-03-04 at 15:46 +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Tue, Mar 04, 2025, Vitaly Kuznetsov wrote:
> > > Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> > >
> > > > FYI, QEMU's Hyper-V emulation of HV_X64_MSR_EOM has been broken since QEMU commit
> > > > c82d9d43ed ("KVM: Kick resamplefd for split kernel irqchip"), as nothing in KVM
> > > > will forward the EOM notification to userspace.  I have no idea if anything in
> > > > QEMU besides hyperv_testdev.c cares.
> > >
> > > The only VMBus device in QEMU besides the testdev seems to be the Hyper-V
> > > ballooning driver, Cc: Maciej to check whether it's a real problem for
> > > it or not.
> > >
> > > > The bug is reproducible by running the hyperv_connections KVM-Unit-Test with a
> > > > split IRQCHIP.
> > >
> > > Thanks, I can reproduce the problem too.
> > >
> > > > Hacking QEMU and KVM (see KVM commit 654f1f13ea56 ("kvm: Check irqchip mode before
> > > > assign irqfd")) as below gets the test to pass.  Assuming that's not a palatable
> > > > solution, the other options I can think of would be for QEMU to intercept
> > > > HV_X64_MSR_EOM when using a split IRQCHIP, or to modify KVM to do KVM_EXIT_HYPERV_SYNIC
> > > > on writes to HV_X64_MSR_EOM with a split IRQCHIP.
> > >
> > > AFAIR, Hyper-V message interface is a fairly generic communication
> > > mechanism which in theory can be used without interrupts at all: the
> > > corresponding SINT can be masked and the guest can be polling for
> > > messages, processing them and then writing to HV_X64_MSR_EOM to trigger
> > > delivery of the next queued message. To support this scenario on the
> > > backend, we need to receive HV_X64_MSR_EOM writes regardless of whether
> > > irqchip is split or not. (In theory, we can get away without this by
> > > just checking if pending messages can be delivered upon each vCPU entry
> > > but this can take an undefined amount of time in some scenarios so I
> > > guess we're better off with notifications).
> >
> > Before c82d9d43ed ("KVM: Kick resamplefd for split kernel irqchip"), and without
> > a split IRQCHIP, QEMU gets notified via eventfd.  On writes to HV_X64_MSR_EOM, KVM
> > invokes irq_acked(), i.e. irqfd_resampler_ack(), for all SINT routes.  The eventfd
> > signal gets back to sint_ack_handler(), which invokes msg_retry() to re-post the
> > message.
> >
> > I.e. trapping HV_X64_MSR_EOM would be a slow path relative to what's there for
> > in-kernel IRQCHIP.
>
> My understanding is that the only type of message which requires fast
> processing is STIMER messages but we don't do stimers in userspace. I
> guess it is possible to have a competing 'noisy neighbour' in userspace
> draining message slots but then we are slow anyway.
>

Hi,

AFAIK, HV_X64_MSR_EOM is only one of the ways for the guest to signal
that it has processed the SYNIC message.

The guest can also signal that it finished processing a SYNIC message
using HV_X64_MSR_EOI or even by writing to the EOI register of the local
APIC, and I actually think that the latter is what is used by at least
recent Windows.

Now KVM does intercept EOI and it even "happens" to work with both APICv
and AVIC:

APICv has an EOI 'exiting bitmap' and SYNIC interrupts are set there
(see vcpu_load_eoi_exitmap).

AVIC intercepts the EOI write iff the interrupt was level-triggered, and
SYNIC interrupts happen to be indeed level-triggered:

static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
	...
	irq.shorthand = APIC_DEST_SELF;
	irq.dest_mode = APIC_DEST_PHYSICAL;
	irq.delivery_mode = APIC_DM_FIXED;
	irq.vector = vector;
	irq.level = 1;
	...

Best regards,
	Maxim Levitsky
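
P.S. For readers less familiar with the interrupt-free polling scenario
quoted above, here is a minimal, self-contained sketch of it. It is a toy
model, not KVM or QEMU code: struct msg_slot, struct host_model and every
function name below are hypothetical, the slot layout is simplified, and
the host is assumed to flag msg_pending whenever more messages are queued
behind a busy slot. It only illustrates why the backend has to observe the
EOM write: without it, the messages queued after the first one are never
delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HVMSG_NONE 0u   /* a free slot; real message types are non-zero */
#define QUEUE_LEN  8

/* Simplified stand-in for one SynIC message slot (not the real layout). */
struct msg_slot {
	uint32_t message_type;  /* HVMSG_NONE means the slot is free */
	bool     msg_pending;   /* host has more messages waiting for this slot */
};

/* Hypothetical model of the backend for a single (masked) SINT. */
struct host_model {
	struct msg_slot slot;
	uint32_t queue[QUEUE_LEN];
	int head, tail;
	int eom_writes;         /* how many EOM writes the host has observed */
};

/* Move the next queued message into the slot if the slot is free. */
static void host_try_deliver(struct host_model *h)
{
	if (h->slot.message_type != HVMSG_NONE || h->head == h->tail)
		return;
	h->slot.message_type = h->queue[h->head++];
	/* Assumption: more queued messages means a retry is already pending. */
	h->slot.msg_pending = (h->head != h->tail);
}

/* Host queues a message; it is delivered at once only if the slot is free. */
static void host_post(struct host_model *h, uint32_t type)
{
	assert(h->tail < QUEUE_LEN);
	h->queue[h->tail++] = type;
	if (h->slot.message_type != HVMSG_NONE)
		h->slot.msg_pending = true;
	else
		host_try_deliver(h);
}

/* Stand-in for the guest writing HV_X64_MSR_EOM: the host must see this. */
static void guest_write_eom(struct host_model *h)
{
	h->eom_writes++;
	host_try_deliver(h);
}

/* Guest polls the slot with the SINT masked; returns messages consumed. */
static int guest_poll(struct host_model *h)
{
	int handled = 0;

	while (h->slot.message_type != HVMSG_NONE) {
		handled++;                            /* "process" the message */
		h->slot.message_type = HVMSG_NONE;    /* release the slot */
		if (h->slot.msg_pending)
			guest_write_eom(h);           /* ask for the next one */
	}
	return handled;
}
```

In this model, posting three messages and then polling consumes all three
with two EOM writes; if guest_write_eom() never reached the host, polling
would stop after the first message.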