I got some feedback from John Ferlan on a different forum about missing handling of migration and qemu capabilities. I'm adding this to my patch, but I'd appreciate any additional feedback, particularly on the xml format and the managed=yes/node-device questions below. On Mon, 2020-08-24 at 16:33 -0500, Jonathon Jongsma wrote: > On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote: > > On 8/18/20 2:37 PM, Jonathon Jongsma wrote: > > > vDPA network devices allow high-performance networking in a > > > virtual > > > machine by providing a wire-speed data path. These devices > > > require > > > a > > > vendor-specific host driver but the data path follows the virtio > > > specification. > > > > > > The support for vDPA devices was recently added to qemu. This > > > allows > > > libvirt to support these devices. It requires that the device is > > > configured on the host with the appropriate vendor-specific > > > driver. > > > This will create a chardev on the host at e.g. /dev/vhost-vdpa-0. > > > That > > > chardev path can then be used to define a new interface with > > > type='vdpa'. > > > --- > > [snip] > > > > > > > + > > > +:: > > > + > > > + ... > > > + <devices> > > > + <interface type='vdpa'> > > > + <source dev='/dev/vhost-vdpa-0'/> > > > > (The above device is created (I just learned this from you in IRC!) > > by > > unbinding a VF from its NIC driver on the host, and re-binding it > > to > > a > > special VDPA-VF driver.) > > > > > > As we were just discussing online, on one hand it could be nice if > > libvirt could automatically handle rebinding the VF to the vdpa > > host > > driver (given the PCI address of the VF), to make it easier to use > > (because no advance setup would be needed), similar to what's > > already > > done with hostdev devices (and <interface type='hostdev'>) when > > managed='yes' (which is the default setting). > > > > > > On the other hand, it is exactly that managed='yes' functionality > > that > > has created more "libvirt-but-not-really-libvirt" bug reports than > > any > > other aspect of vfio device assignment, because the process of > > unbinding > > and rebinding drivers is timing-sensitive and causes code that's > > usually > > run only once at host boot-time to be run hundreds of times thus > > making > > it more likely to expose infrequently-hit bugs. > > > > > > I just bring this up in advance of someone suggesting the addition > > of > > managed='yes' here to put in my vote for *not* doing it, and > > instead > > using that same effort to provide some sort of API in the node- > > device > > driver for easily creating one or more VDPA devices from VFs, > > which > > could be done once at host boot time, and thus avoid the level of > > "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe > > even > > an API to allocate a device from that pool to be used by a guest). > > But > > that's for later. > > I'd really love to hear some additional opinions on this topic from > some more of the libvirt "old-timers". I intentionally kept the > initial > vdpa support simple by requiring that the device is already setup and > bound to the appropriate driver. But I want to make sure that we can > add additional capabilities later (as Laine suggested) with this same > API/schema. > > Anybody else have thoughts on this topic? > > > > > > > + </interface> > > > + </devices> > > > + ... > > > + > > > :anchor:`<a id="elementsTeaming"/>` > > > > > > Teaming a virtio/hostdev NIC pair > > > diff --git a/docs/schemas/domaincommon.rng > > > b/docs/schemas/domaincommon.rng > > > index 0d0dcbc5ce..17f74490f4 100644 > > > --- a/docs/schemas/domaincommon.rng > > > +++ b/docs/schemas/domaincommon.rng > > > @@ -3108,6 +3108,21 @@ > > > <ref name="interface-options"/> > > > </interleave> > > > </group> > > > + > > > + <group> > > > + <attribute name="type"> > > > + <value>vdpa</value> > > > + </attribute> > > > + <interleave> > > > + <element name="source"> > > > + <attribute name="dev"> > > > + <ref name="deviceName"/> > > > + </attribute> > > > + </element> > > > + <ref name="interface-options"/> > > > + </interleave> > > > + </group> > > > + > > > </choice> > > > <optional> > > > <attribute name="trustGuestRxFilters"> > > > > > [snip] > > > > diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > > b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > > new file mode 100644 > > > index 0000000000..8e76ac7794 > > > --- /dev/null > > > +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > > @@ -0,0 +1,37 @@ > > > +LC_ALL=C \ > > > +PATH=/bin \ > > > +HOME=/tmp/lib/domain--1-QEMUGuest1 \ > > > +USER=test \ > > > +LOGNAME=test \ > > > +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ > > > +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ > > > +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ > > > +QEMU_AUDIO_DRV=none \ > > > +/usr/bin/qemu-system-i386 \ > > > +-name guest=QEMUGuest1,debug-threads=on \ > > > +-S \ > > > +-object secret,id=masterKey0,format=raw,\ > > > +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ > > > +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ > > > +-cpu qemu64 \ > > > +-m 214 \ > > > +-overcommit mem-lock=off \ > > > +-smp 1,sockets=1,cores=1,threads=1 \ > > > +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ > > > +-display none \ > > > +-no-user-config \ > > > +-nodefaults \ > > > +-chardev socket,id=charmonitor,fd=1729,server,nowait \ > > > +-mon chardev=charmonitor,id=monitor,mode=control \ > > > +-rtc base=utc \ > > > +-no-shutdown \ > > > +-no-acpi \ > > > +-boot strict=on \ > > > +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ > > > +-add-fd set=0,fd=1732 \ > > > +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \ > > > > Okay, I'm feeling too lazy to parse through the code above an see > > how > > you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look > > right. > > Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."? > > The > > document I have shows that syntax is supported, so there shouldn't > > be > > any need to do the add-fd stuff in this case. > > The initial proposal for vdpa in qemu supported both vhostdev= and > fd= > parameters, but the final implementation does not actually support an > fd= parameter here. The recommended way to pass an pre-opened fd to > qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method > as > shown above. > > As far as I understand from reading qemu code, /dev/fdset/N is not > actually a file that exists in the filesystem. Instead, when you call > 'add-fd', qemu adds that fd to an internal mapping that maps from the > fdset specified by set=N to the passed fd. Whenever qemu tries to > open > a file, qemu_open() has special handling for filenames that start > with > "/dev/fdset/": it looks up the fd associated with that fdset id and > returns that instead of attempting to open the file path. > > So I think the above should be correct. > > > > > I think the next step should be to find some hardware and give this > > a > > smoke test! :-) > > Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa > support) to try to get this tested properly. Unfortunately I've been > unable to get vdpa_sim working and I don't have access to hardware at > the moment. > > Thanks for the review! > > Jonathon >