Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 03, 2020 at 06:34:00AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 11:10:36AM +0000, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2020 at 01:06:28PM -0500, Peter Xu wrote:
> > > On Wed, Nov 25, 2020 at 12:44:07PM -0800, Elena Afanasova wrote:
> > > 
> > > [...]
> > > 
> > > > Wire protocol
> > > > -------------
> > > > The protocol spoken over the file descriptor is as follows. The device reads
> > > > commands from the file descriptor with the following layout::
> > > > 
> > > >   struct ioregionfd_cmd {
> > > >       __u32 info;
> > > >       __u32 padding;
> > > >       __u64 user_data;
> > > >       __u64 offset;
> > > >       __u64 data;
> > > >   };
> > > 
> > > I'm thinking whether it would be nice to have a handshake on the wire protocol
> > > before starting the cmd/resp sequence.
> > > 
> > > I was thinking about migration - we have had a hard time trying to be
> > > compatible between old/new qemus.  Now we fixed those by applying the same
> > > migration capabilities on both sides always so we do the handshake "manually"
> > > from libvirt, but it really should be done with a real handshake on the
> > > channel, imho..  That's another story, for sure.
> > > 
> > > My understanding is that the wire protocol is kind of a standalone (but tiny)
> > > protocol between kvm and the emulation process.  So I'm thinking the handshake
> > > could also help when e.g. kvm can fallback to an old version of wire protocol
> > > if it knows the emulation binary is old.  Ideally, I think this could even
> > > happen without VMM's awareness.
> > > 
> > > [...]
> > 
> > I imagined that would happen in the control plane (KVM ioctls) instead
> > of the data plane (the fd). There is a flags field in
> > ioctl(KVM_SET_IOREGION):
> > 
> >   struct kvm_ioregion {
> >       __u64 guest_paddr; /* guest physical address */
> >       __u64 memory_size; /* bytes */
> >       __u64 user_data;
> >       __s32 fd; /* previously created with KVM_CREATE_IOREGIONFD */
> >       __u32 flags;
> >       __u8  pad[32];
> >   };
> > 
> > When userspace sets up the ioregionfd it can tell the kernel which
> > features to enable.
> > 
> > Feature availability can be checked through ioctl(KVM_CHECK_EXTENSION).
> > 
> > Do you think this existing mechanism is enough? It's not clear to me
> > what kind of additional negotiation would be necessary between the
> > device emulation process and KVM after the ioregionfd has been
> > registered?
> > 
> > > > Ordering
> > > > --------
> > > > Guest accesses are delivered in order, including posted writes.
> > > 
> > > I'm wondering whether it should prepare for out-of-order commands assuming if
> > > there's no handshake so harder to extend, just in case there could be some slow
> > > commands so we still have chance to reply to a very trivial command during
> > > handling the slow one (then each command may require a command ID, too).  But
> > > it won't be a problem at all if we can easily extend the wire protocol so the
> > > ordering constraint can be extended too when really needed, and we can always
> > > start with in-order-only requests.
> > 
> > Elena and I brainstormed out-of-order commands but didn't include them
> > in the proposal because it's not clear that they are needed. For
> > multi-queue devices the per-queue registers can be assigned different
> > ioregionfds that are handled by dedicated threads.
> 
> The difficulty is I think the reverse: reading
> any register from a PCI device is normally enough to flush any
> writes and interrupts in progress.

Accesses dispatched on the same ioregionfd are ordered. If a device
requires ordering then it needs to use one ioregionfd.

I'm not aware of issues with ioeventfd where the vhost worker thread or
QEMU's IOThread process the doorbell while the vCPU thread processes
other virtio-pci register accesses. The situation is similar with
ioregionfd.

Stefan

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux