On 2/8/22 16:17, Woodhouse, David wrote:
> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>> Enable virq offload to the hypervisor. The primary user for this is
>> the timer virq.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>
> ...
>
>> @@ -636,8 +654,11 @@ static int kvm_xen_eventfd_assign(struct kvm *kvm, struct idr *port_to_evt,
>>                         GFP_KERNEL);
>>         mutex_unlock(port_lock);
>>
>> -       if (ret >= 0)
>> +       if (ret >= 0) {
>> +               if (evtchnfd->type == XEN_EVTCHN_TYPE_VIRQ)
>> +                       kvm_xen_set_virq(kvm, evtchnfd);
>>                 return 0;
>> +       }
>>
>>         if (ret == -ENOSPC)
>>                 ret = -EEXIST;
>>
>
> So, I've spent a while vacillating about how best we should do this.
>
> Since event channels are bidirectional, we essentially have *two*
> number spaces for them.
>
> We have the inbound events, in the KVM IRQ routing table (which 5.17
> already supports for delivering PIRQs, based on my mangling of your
> earlier patches).
>

/me nods

> And then we have the *outbound* events, which the guest can invoke with
> the EVTCHNOP_send hypercall. Those are either:
>  • IPI, raising the same port# on the guest
>  • Interdomain looped back to a different port# on the guest
>  • Interdomain triggering an eventfd.
>

/me nods

I am forgetting why one would do this on Xen:

* Interdomain looped back to a different port# on the guest

> In the last case, that eventfd can be set up with IRQFD for direct
> event channel delivery to a different KVM/Xen guest.
>
> I've used your implementation, with an idr for the outbound port# space
> intercepting EVTCHNOP_send for known ports and only letting userspace
> see the hypercall if it's for a port# the kernel doesn't know. Looks a
> bit like
> https://git.infradead.org/users/dwmw2/linux.git/commitdiff/b4fbc49218a
>
>
> But I *don't* want to do the VIRQ part shown above, "spotting" the VIRQ
> in that outbound port# space and squirreling the information away into
> the kvm_vcpu for when we need to deliver a timer event.
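
For reference, the outbound EVTCHNOP_send handling described above can be sketched roughly as below. This is a standalone illustration only, not the code in the linked commit: a fixed array stands in for the kernel idr, and every name here (out_table, out_port, evtchn_send, the enum values, MAX_OUT_PORTS) is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUT_PORTS 64 /* hypothetical bound; the kernel side uses an idr */

/* The three outbound cases from the thread, plus "port not known". */
enum out_kind { OUT_UNKNOWN = 0, OUT_IPI, OUT_LOOPBACK, OUT_EVENTFD };

struct out_port {
    enum out_kind kind;
    uint32_t deliver_port; /* OUT_LOOPBACK: the guest port# to raise */
    int eventfd;           /* OUT_EVENTFD: the fd to signal */
};

struct out_table {
    struct out_port ports[MAX_OUT_PORTS];
};

enum send_action {
    SEND_PUNT_TO_USERSPACE, /* kernel does not know the port */
    SEND_RAISE_GUEST_PORT,  /* IPI or looped-back interdomain */
    SEND_SIGNAL_EVENTFD     /* interdomain via eventfd */
};

struct send_result {
    enum send_action action;
    uint32_t guest_port; /* valid for SEND_RAISE_GUEST_PORT */
    int eventfd;         /* valid for SEND_SIGNAL_EVENTFD */
};

/* Decide what an EVTCHNOP_send on 'port' should do. */
struct send_result evtchn_send(const struct out_table *t, uint32_t port)
{
    struct send_result r = { SEND_PUNT_TO_USERSPACE, 0, -1 };

    assert(t != NULL);
    if (port >= MAX_OUT_PORTS)
        return r; /* out of range: let userspace see the hypercall */

    const struct out_port *p = &t->ports[port];
    switch (p->kind) {
    case OUT_IPI:
        r.action = SEND_RAISE_GUEST_PORT;
        r.guest_port = port; /* same port# on the guest */
        break;
    case OUT_LOOPBACK:
        r.action = SEND_RAISE_GUEST_PORT;
        r.guest_port = p->deliver_port; /* different port# on the guest */
        break;
    case OUT_EVENTFD:
        r.action = SEND_SIGNAL_EVENTFD;
        r.eventfd = p->eventfd;
        break;
    case OUT_UNKNOWN:
    default:
        break; /* unknown port: punt to userspace */
    }
    return r;
}
```

Note that a VIRQ has no entry in this table at all, since the guest cannot target it with EVTCHNOP_send.
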
>
> The VIRQ isn't part of the *outbound* port# space; it isn't a port to
> which a Xen guest can use EVTCHNOP_send to send an event.

But it is still an event channel whose port is unique regardless of port
type/space, hence (...)

> If anything,
> it would be part of the *inbound* port# space, in the KVM IRQ routing
> table. So perhaps we could have a similar snippet in
> kvm_xen_setup_evtchn() which spots a VIRQ and says "aha, now I know
> where to deliver timer events for this vCPU".
>

(...) The thinking at the time was mainly simplicity, so our way of saying
'offload the evtchn to KVM' was through the machinery that offloads the
outbound part (using your terminology). Even using XEN_EVENTFD as proposed
here, I don't think one could send a VIRQ via EVTCHNOP_send (I could be
wrong, as it has been a long time).

Regardless, I think you have a good point to split the semantics and (...)

> But... the IRQ routing table isn't really set up for that, and doesn't
> have explicit *deletion*. The kvm_xen_setup_evtchn() function might get
> called to translate into an updated table which is subsequently
> *abandoned*, and it would never know. I suppose we could stash the GSI#
> and then when we want to deliver it we look up that GSI# in the current
> table and see if it's *stale* but that's getting nasty.
>
> I suppose that's not insurmountable, but the other problem with
> inferring it from *either* the inbound or outbound port# tables is that
> the vCPU might not even *exist* at the time the table is set up (after
> live update or live migration, as the vCPU threads all go off and do
> their thing and *eventually* create their vCPUs, while the machine
> itself is being restored on the main VMM thread.)
>
> So I think I'm going to make the timer VIRQ (port#, priority) into an
> explicit KVM_XEN_VCPU_ATTR_TYPE.

(...) thus this makes sense.

Do you particularly care about VIRQ_DEBUG?

> Along with the *actual* timer expiry,
> which we need to extract/restore for LU/LM too, don't we?
>

/me nods

I haven't thought that one through for Live Update / Live Migration, but I
wonder if it wouldn't be better to have a general 'xen state' attr type
instead, should you need more than just the pending timer expiry.

Albeit, considering that the VMM has everything it needs (?), the Xen PV
timer looks to be the oddball that is missing, and perhaps we don't need to
go to that extent.
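
To make the explicit-attribute idea concrete, here is a rough standalone sketch of per-vCPU timer state that the VMM could read out before LU/LM and write back once the vCPU exists. It is an illustration under assumptions, not a proposed ABI: the thread only names (port#, priority) and the timer expiry, so the struct layout, the field names, the "0 means unbound / not armed" convention and the mock_vcpu type are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; field names and widths are assumptions. */
struct xen_timer_state {
    uint32_t port;       /* port bound to the timer VIRQ; 0 = unbound (assumed) */
    uint32_t priority;   /* delivery priority */
    uint64_t expires_ns; /* pending expiry; 0 = no timer armed (assumed) */
};

/* Stand-in for per-vCPU state; not the kernel's struct kvm_vcpu. */
struct mock_vcpu {
    struct xen_timer_state timer;
};

/* Read the timer state out, e.g. before live update or live migration. */
void vcpu_get_timer(const struct mock_vcpu *v, struct xen_timer_state *out)
{
    assert(v && out);
    *out = v->timer;
}

/*
 * Write the timer state back. This is called on the vCPU itself, so it
 * cannot run before the vCPU exists, unlike inferring the VIRQ from a
 * port table that is set up earlier. Returns 0 on success, or -1 if a
 * timer is armed but no port is bound to deliver it.
 */
int vcpu_set_timer(struct mock_vcpu *v, const struct xen_timer_state *in)
{
    assert(v && in);
    if (in->expires_ns != 0 && in->port == 0)
        return -1;
    v->timer = *in;
    return 0;
}
```

The point of the sketch is only that port, priority and expiry travel together as one piece of vCPU state, independent of both the inbound and outbound port# tables.
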