So... what we have in the kernel already is entirely sufficient to run real Xen guests so they don't know they're running under KVM — with PV netback and blkback, Xen timers, etc. We can even run the PV Xen shim itself, to host PV guests. There are some pieces we really want to move into the kernel though, IPIs being top of the list — we really don't want them bouncing out to the userspace VMM. In-kernel timers would be good, too. I've finally got the basic 2l event channel delivery working in the kernel, with a minor detour into fixing nesting UAF bugs. The first simple use case for that is routing MSIs of assigned PCI devices to Xen PIRQs. So I'm ready to stare at the patch in $SUBJECT again... On Mon, 2020-11-30 at 15:08 +0000, Joao Martins wrote: > On 11/30/20 12:55 PM, David Woodhouse wrote: > > On Mon, 2020-11-30 at 12:17 +0000, Joao Martins wrote: > > > But still, eventfd is probably unnecessary complexity when another @type > > > (XEN_EVTCHN_TYPE_USER) would serve, and then just exiting to userspace > > > and let it route its evtchn port handling to the its own I/O handling thread. > > > > Hmm... so the benefit of the eventfd is that we can wake the I/O thread > > directly instead of bouncing out to userspace on the vCPU thread only > > for it to send a signal and return to the guest? Did you ever use that, > > and it is worth the additional in-kernel code? > > > This was my own preemptive optimization to the interface -- it's not worth > the added code for vIRQ and IPI at this point which is what *for sure* the > kernel will handle. > > > Is there any reason we'd want that for IPI or VIRQ event channels, or > > can it be only for INTERDOM/UNBOUND event channels which come later? > > > /me nods. > > No reason to keep that for IPI/vIRQ. > > > I'm tempted to leave it out of the first patch, and *maybe* add it back > > in a later patch, putting it in the union alongside .virq.type. > > > > > > struct kvm_xen_eventfd { > > > > #define XEN_EVTCHN_TYPE_VIRQ 0 > > #define XEN_EVTCHN_TYPE_IPI 1 > > __u32 type; > > __u32 port; > > __u32 vcpu; > > - __s32 fd; > > > > #define KVM_XEN_EVENTFD_DEASSIGN (1 << 0) > > #define KVM_XEN_EVENTFD_UPDATE (1 << 1) > > eentfd_ __u32 flags; > > union { > > struct { > > __u8 type; > > } virq; > > + struct { > > + __s32 eventfd; > > + } interdom; /* including unbound */ > > __u32 padding[2]; > > }; > > } evtchn; > > > > Does that make sense to you? > > > Yeap! :) It turns out there are actually two cases for interdom event channels. As well as signalling "other domains" (i.e. userspace) they can also be set up as loopback, so sending an event on one port actually raises an event on another of that guest's same ports, a bit like an IPI. So it isn't quite as simple as "if we handle interdom in the kernel it has an eventfd". It also means we need to provide both *source* and *target* ports. Which brings me to the KVM_XEN_EVENTFD_DEASSIGN/KVM_XEN_EVENTFD_UPDATE flags that you used. I'm not sure I like that API; given that we now need separate source and target ports can we handle 'deassign' just by setting the target to zero? (and eventfd to -1) ? I think your original patch leaked refcounts on the eventfd in kvm_xen_eventfd_update() because it would call eventfd_ctx_fdget() then neither used nor released the result? So I'm thinking of making it just a 'set'. And in fact I'm thinking of modelling it on the irq routing table by having a single call to set/update all of them at once? Any objections?
Attachment:
smime.p7s
Description: S/MIME cryptographic signature