On Thu, Dec 03, 2020 at 11:10:36AM +0000, Stefan Hajnoczi wrote: > On Wed, Dec 02, 2020 at 01:06:28PM -0500, Peter Xu wrote: > > On Wed, Nov 25, 2020 at 12:44:07PM -0800, Elena Afanasova wrote: > > > > [...] > > > > > Wire protocol > > > ------------- > > > The protocol spoken over the file descriptor is as follows. The device reads > > > commands from the file descriptor with the following layout:: > > > > > > struct ioregionfd_cmd { > > > __u32 info; > > > __u32 padding; > > > __u64 user_data; > > > __u64 offset; > > > __u64 data; > > > }; > > > > I'm thinking whether it would be nice to have a handshake on the wire protocol > > before starting the cmd/resp sequence. > > > > I was thinking about migration - we have had a hard time trying to be > > compatible between old/new qemus. Now we fixed those by applying the same > > migration capabilities on both sides always so we do the handshake "manually" > > from libvirt, but it really should be done with a real handshake on the > > channel, imho.. That's another story, for sure. > > > > My understanding is that the wire protocol is kind of a standalone (but tiny) > > protocol between kvm and the emulation process. So I'm thinking the handshake > > could also help when e.g. kvm can fallback to an old version of wire protocol > > if it knows the emulation binary is old. Ideally, I think this could even > > happen without VMM's awareness. > > > > [...] > > I imagined that would happen in the control plane (KVM ioctls) instead > of the data plane (the fd). There is a flags field in > ioctl(KVM_SET_IOREGION): > > struct kvm_ioregion { > __u64 guest_paddr; /* guest physical address */ > __u64 memory_size; /* bytes */ > __u64 user_data; > __s32 fd; /* previously created with KVM_CREATE_IOREGIONFD */ > __u32 flags; > __u8 pad[32]; > }; > > When userspace sets up the ioregionfd it can tell the kernel which > features to enable. > > Feature availability can be checked through ioctl(KVM_CHECK_EXTENSION). > > Do you think this existing mechanism is enough? It's not clear to me > what kind of additional negotiation would be necessary between the > device emulation process and KVM after the ioregionfd has been > registered? > > > > Ordering > > > -------- > > > Guest accesses are delivered in order, including posted writes. > > > > I'm wondering whether it should prepare for out-of-order commands assuming if > > there's no handshake so harder to extend, just in case there could be some slow > > commands so we still have chance to reply to a very trivial command during > > handling the slow one (then each command may require a command ID, too). But > > it won't be a problem at all if we can easily extend the wire protocol so the > > ordering constraint can be extended too when really needed, and we can always > > start with in-order-only requests. > > Elena and I brainstormed out-of-order commands but didn't include them > in the proposal because it's not clear that they are needed. For > multi-queue devices the per-queue registers can be assigned different > ioregionfds that are handled by dedicated threads. The difficulty is I think the reverse: reading any register from a PCI device is normally enough to flush any writes and interrupts in progress. > Out-of-order commands are only necessary if a device needs to > concurrently process register accesses to the *same* set of registers. I > think it's rare for hardware register interfaces to be designed like > that. > > This could be a mistake, of course. If someone knows a device that needs > multiple in-flight register accesses, please let us know. > > Stefan