Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 03, 2020 at 11:10:36AM +0000, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2020 at 01:06:28PM -0500, Peter Xu wrote:
> > On Wed, Nov 25, 2020 at 12:44:07PM -0800, Elena Afanasova wrote:
> > 
> > [...]
> > 
> > > Wire protocol
> > > -------------
> > > The protocol spoken over the file descriptor is as follows. The device reads
> > > commands from the file descriptor with the following layout::
> > > 
> > >   struct ioregionfd_cmd {
> > >       __u32 info;
> > >       __u32 padding;
> > >       __u64 user_data;
> > >       __u64 offset;
> > >       __u64 data;
> > >   };
> > 
> > I'm thinking whether it would be nice to have a handshake on the wire protocol
> > before starting the cmd/resp sequence.
> > 
> > I was thinking about migration - we have had a hard time trying to be
> > compatible between old/new qemus.  Now we fixed those by applying the same
> > migration capabilities on both sides always so we do the handshake "manually"
> > from libvirt, but it really should be done with a real handshake on the
> > channel, imho..  That's another story, for sure.
> > 
> > My understanding is that the wire protocol is kind of a standalone (but tiny)
> > protocol between kvm and the emulation process.  So I'm thinking the handshake
> > could also help when e.g. kvm can fallback to an old version of wire protocol
> > if it knows the emulation binary is old.  Ideally, I think this could even
> > happen without VMM's awareness.
> > 
> > [...]
> 
> I imagined that would happen in the control plane (KVM ioctls) instead
> of the data plane (the fd). There is a flags field in
> ioctl(KVM_SET_IOREGION):
> 
>   struct kvm_ioregion {
>       __u64 guest_paddr; /* guest physical address */
>       __u64 memory_size; /* bytes */
>       __u64 user_data;
>       __s32 fd; /* previously created with KVM_CREATE_IOREGIONFD */
>       __u32 flags;
>       __u8  pad[32];
>   };
> 
> When userspace sets up the ioregionfd it can tell the kernel which
> features to enable.
> 
> Feature availability can be checked through ioctl(KVM_CHECK_EXTENSION).
> 
> Do you think this existing mechanism is enough? It's not clear to me
> what kind of additional negotiation would be necessary between the
> device emulation process and KVM after the ioregionfd has been
> registered?
> 
> > > Ordering
> > > --------
> > > Guest accesses are delivered in order, including posted writes.
> > 
> > I'm wondering whether it should prepare for out-of-order commands assuming if
> > there's no handshake so harder to extend, just in case there could be some slow
> > commands so we still have chance to reply to a very trivial command during
> > handling the slow one (then each command may require a command ID, too).  But
> > it won't be a problem at all if we can easily extend the wire protocol so the
> > ordering constraint can be extended too when really needed, and we can always
> > start with in-order-only requests.
> 
> Elena and I brainstormed out-of-order commands but didn't include them
> in the proposal because it's not clear that they are needed. For
> multi-queue devices the per-queue registers can be assigned different
> ioregionfds that are handled by dedicated threads.

The difficulty is I think the reverse: reading
any register from a PCI device is normally enough to flush any
writes and interrupts in progress.



> Out-of-order commands are only necessary if a device needs to
> concurrently process register accesses to the *same* set of registers. I
> think it's rare for hardware register interfaces to be designed like
> that.
> 
> This could be a mistake, of course. If someone knows a device that needs
> multiple in-flight register accesses, please let us know.
> 
> Stefan





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux