Re: QEMU's Hyper-V HV_X64_MSR_EOM is broken with split IRQCHIP

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Tue, 04 Mar 2025 16:31:31 -0500

On Tue, 2025-03-04 at 15:46 +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> 
> > On Tue, Mar 04, 2025, Vitaly Kuznetsov wrote:
> > > Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> > > 
> > > > FYI, QEMU's Hyper-V emulation of HV_X64_MSR_EOM has been broken since QEMU commit
> > > > c82d9d43ed ("KVM: Kick resamplefd for split kernel irqchip"), as nothing in KVM
> > > > will forward the EOM notification to userspace.  I have no idea if anything in
> > > > QEMU besides hyperv_testdev.c cares.
> > > 
> > > The only VMBus device in QEMU besides the testdev seems to be Hyper-V
> > > ballooning driver, Cc: Maciej to check whether it's a real problem for
> > > it or not.
> > > 
> > > > The bug is reproducible by running the hyperv_connections KVM-Unit-Test with a
> > > > split IRQCHIP.
> > > 
> > > Thanks, I can reproduce the problem too.
> > > 
> > > > Hacking QEMU and KVM (see KVM commit 654f1f13ea56 ("kvm: Check irqchip mode before
> > > > assign irqfd")) as below gets the test to pass.  Assuming that's not a palatable
> > > > solution, the other options I can think of would be for QEMU to intercept
> > > > HV_X64_MSR_EOM when using a split IRQCHIP, or to modify KVM to do KVM_EXIT_HYPERV_SYNIC
> > > > on writes to HV_X64_MSR_EOM with a split IRQCHIP.
> > > 
> > > AFAIR, Hyper-V message interface is a fairly generic communication
> > > mechanism which in theory can be used without interrupts at all: the
> > > corresponding SINT can be masked and the guest can be polling for
> > > messages, processing them and then writing to HV_X64_MSR_EOM to trigger
> > > delivery of the next queued message. To support this scenario on the
> > > backend, we need to receive HV_X64_MSR_EOM writes regardless of whether
> > > irqchip is split or not. (In theory, we can get away without this by
> > > just checking if pending messages can be delivered upon each vCPU entry
> > > but this can take an undefined amount of time in some scenarios so I
> > > guess we're better off with notifications).
> > 
> > Before c82d9d43ed ("KVM: Kick resamplefd for split kernel irqchip"), and without
> > a split IRQCHIP, QEMU gets notified via eventfd.  On writes to HV_X64_MSR_EOM, KVM
> > invokes irq_acked(), i.e. irqfd_resampler_ack(), for all SINT routes.  The eventfd
> > signal gets back to sint_ack_handler(), which invokes msg_retry() to re-post the
> > message.
> > 
> > I.e. trapping HV_X64_MSR_EOM would be a slow path relative to what's there for
> > in-kernel IRQCHIP.
> 
> My understanding is that the only type of message which requires fast
> processing is STIMER messages but we don't do stimers in userspace. I
> guess it is possible to have a competing 'noisy neighbour' in userspace
> draining message slots but then we are slow anyway.
> 

Hi,

AFAIK, HV_X64_MSR_EOM is only one of the ways for the guest to signal that it has processed a SYNIC message.

The guest can also signal that it has finished processing a SYNIC message using HV_X64_MSR_EOI, or even by writing
to the local APIC EOI register, and I actually think that the latter is what at least recent Windows uses.
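
To make the three paths concrete, here is a rough sketch that classifies a guest write as one of them.
This is not KVM or QEMU code: the toy_* names, X2APIC_MSR_EOI and XAPIC_MMIO_EOI are made up for
illustration. Only the MSR numbers and the APIC EOI register offset are the architectural values.

```c
#include <stdint.h>

#define HV_X64_MSR_EOI  0x40000070u  /* Hyper-V synthetic EOI MSR */
#define HV_X64_MSR_EOM  0x40000084u  /* Hyper-V SynIC end-of-message MSR */
#define X2APIC_MSR_EOI  0x80bu       /* hypothetical name: 0x800 + (0xb0 >> 4) */
#define XAPIC_MMIO_EOI  0xb0u        /* hypothetical name: EOI offset in the APIC page */

/* Hypothetical enum: which path the guest used to signal completion. */
enum toy_ack_path {
	TOY_ACK_NONE,
	TOY_ACK_HV_EOM,
	TOY_ACK_HV_EOI,
	TOY_ACK_APIC_EOI,
};

/* Classify a WRMSR by MSR index. */
static enum toy_ack_path toy_classify_msr_write(uint32_t msr)
{
	switch (msr) {
	case HV_X64_MSR_EOM:
		return TOY_ACK_HV_EOM;
	case HV_X64_MSR_EOI:
		return TOY_ACK_HV_EOI;
	case X2APIC_MSR_EOI:
		return TOY_ACK_APIC_EOI;
	default:
		return TOY_ACK_NONE;
	}
}

/* Classify an xAPIC MMIO write by its offset into the APIC page. */
static enum toy_ack_path toy_classify_xapic_write(uint32_t offset)
{
	return offset == XAPIC_MMIO_EOI ? TOY_ACK_APIC_EOI : TOY_ACK_NONE;
}
```

The point is just that a fix which only looks at HV_X64_MSR_EOM covers one of these cases.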

Now KVM does intercept EOI and it even "happens" to work with both APICv and AVIC:

APICv has an EOI-exit bitmap, and SYNIC interrupt vectors are set in it (see vcpu_load_eoi_exitmap).

AVIC intercepts the EOI write iff the interrupt was level-triggered, and SYNIC interrupts happen
to indeed be level-triggered:

static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
...
	irq.shorthand = APIC_DEST_SELF;
	irq.dest_mode = APIC_DEST_PHYSICAL;
	irq.delivery_mode = APIC_DM_FIXED;
	irq.vector = vector;
	irq.level = 1;
...
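
As a toy model of the two rules above (again not KVM code, all toy_* names are hypothetical), the
"is this EOI intercepted?" decision could be sketched like this, assuming a 256-bit per-vCPU bitmap
for APICv and a per-interrupt trigger-mode flag for AVIC:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: which APIC virtualization scheme the vCPU uses. */
enum toy_apic_virt { TOY_APICV, TOY_AVIC };

struct toy_vcpu {
	enum toy_apic_virt mode;
	uint64_t eoi_exit_bitmap[4];	/* APICv only: one bit per vector (256 vectors) */
};

/* APICv: request an exit on EOI for this vector. */
static void toy_set_eoi_exit(struct toy_vcpu *v, uint8_t vector)
{
	v->eoi_exit_bitmap[vector / 64] |= 1ULL << (vector % 64);
}

/*
 * APICv: the EOI exits iff the vector's bit is set in the bitmap.
 * AVIC:  the EOI write is intercepted iff the interrupt was level-triggered.
 */
static bool toy_eoi_is_intercepted(const struct toy_vcpu *v, uint8_t vector,
				   bool level_triggered)
{
	if (v->mode == TOY_APICV)
		return (v->eoi_exit_bitmap[vector / 64] & (1ULL << (vector % 64))) != 0;

	return level_triggered;
}
```

With irq.level = 1 as in synic_set_irq(), the AVIC branch returns true for SYNIC vectors. The APICv
branch only does so once the SINT vector has been set in the bitmap.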

Best regards,
	Maxim Levitsky