On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote: > On 8/18/20 2:37 PM, Jonathon Jongsma wrote: > > vDPA network devices allow high-performance networking in a virtual > > machine by providing a wire-speed data path. These devices require > > a > > vendor-specific host driver but the data path follows the virtio > > specification. > > > > The support for vDPA devices was recently added to qemu. This > > allows > > libvirt to support these devices. It requires that the device is > > configured on the host with the appropriate vendor-specific driver. > > This will create a chardev on the host at e.g. /dev/vhost-vdpa-0. > > That > > chardev path can then be used to define a new interface with > > type='vdpa'. > > --- [snip] > > > > + > > +:: > > + > > + ... > > + <devices> > > + <interface type='vdpa'> > > + <source dev='/dev/vhost-vdpa-0'/> > > (The above device is created (I just learned this from you in IRC!) > by > unbinding a VF from its NIC driver on the host, and re-binding it to > a > special VDPA-VF driver.) > > > As we were just discussing online, on one hand it could be nice if > libvirt could automatically handle rebinding the VF to the vdpa host > driver (given the PCI address of the VF), to make it easier to use > (because no advance setup would be needed), similar to what's > already > done with hostdev devices (and <interface type='hostdev'>) when > managed='yes' (which is the default setting). > > > On the other hand, it is exactly that managed='yes' functionality > that > has created more "libvirt-but-not-really-libvirt" bug reports than > any > other aspect of vfio device assignment, because the process of > unbinding > and rebinding drivers is timing-sensitive and causes code that's > usually > run only once at host boot-time to be run hundreds of times thus > making > it more likely to expose infrequently-hit bugs. > > > I just bring this up in advance of someone suggesting the addition > of > managed='yes' here to put in my vote for *not* doing it, and instead > using that same effort to provide some sort of API in the node- > device > driver for easily creating one or more VDPA devices from VFs, which > could be done once at host boot time, and thus avoid the level of > "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe > even > an API to allocate a device from that pool to be used by a guest). > But > that's for later. I'd really love to hear some additional opinions on this topic from some more of the libvirt "old-timers". I intentionally kept the initial vdpa support simple by requiring that the device is already setup and bound to the appropriate driver. But I want to make sure that we can add additional capabilities later (as Laine suggested) with this same API/schema. Anybody else have thoughts on this topic? > > > > + </interface> > > + </devices> > > + ... > > + > > :anchor:`<a id="elementsTeaming"/>` > > > > Teaming a virtio/hostdev NIC pair > > diff --git a/docs/schemas/domaincommon.rng > > b/docs/schemas/domaincommon.rng > > index 0d0dcbc5ce..17f74490f4 100644 > > --- a/docs/schemas/domaincommon.rng > > +++ b/docs/schemas/domaincommon.rng > > @@ -3108,6 +3108,21 @@ > > <ref name="interface-options"/> > > </interleave> > > </group> > > + > > + <group> > > + <attribute name="type"> > > + <value>vdpa</value> > > + </attribute> > > + <interleave> > > + <element name="source"> > > + <attribute name="dev"> > > + <ref name="deviceName"/> > > + </attribute> > > + </element> > > + <ref name="interface-options"/> > > + </interleave> > > + </group> > > + > > </choice> > > <optional> > > <attribute name="trustGuestRxFilters"> > > [snip] > > diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > new file mode 100644 > > index 0000000000..8e76ac7794 > > --- /dev/null > > +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args > > @@ -0,0 +1,37 @@ > > +LC_ALL=C \ > > +PATH=/bin \ > > +HOME=/tmp/lib/domain--1-QEMUGuest1 \ > > +USER=test \ > > +LOGNAME=test \ > > +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ > > +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ > > +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ > > +QEMU_AUDIO_DRV=none \ > > +/usr/bin/qemu-system-i386 \ > > +-name guest=QEMUGuest1,debug-threads=on \ > > +-S \ > > +-object secret,id=masterKey0,format=raw,\ > > +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ > > +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ > > +-cpu qemu64 \ > > +-m 214 \ > > +-overcommit mem-lock=off \ > > +-smp 1,sockets=1,cores=1,threads=1 \ > > +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ > > +-display none \ > > +-no-user-config \ > > +-nodefaults \ > > +-chardev socket,id=charmonitor,fd=1729,server,nowait \ > > +-mon chardev=charmonitor,id=monitor,mode=control \ > > +-rtc base=utc \ > > +-no-shutdown \ > > +-no-acpi \ > > +-boot strict=on \ > > +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ > > +-add-fd set=0,fd=1732 \ > > +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \ > > Okay, I'm feeling too lazy to parse through the code above an see > how > you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look > right. > Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."? The > document I have shows that syntax is supported, so there shouldn't > be > any need to do the add-fd stuff in this case. The initial proposal for vdpa in qemu supported both vhostdev= and fd= parameters, but the final implementation does not actually support an fd= parameter here. The recommended way to pass an pre-opened fd to qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method as shown above. As far as I understand from reading qemu code, /dev/fdset/N is not actually a file that exists in the filesystem. Instead, when you call 'add-fd', qemu adds that fd to an internal mapping that maps from the fdset specified by set=N to the passed fd. Whenever qemu tries to open a file, qemu_open() has special handling for filenames that start with "/dev/fdset/": it looks up the fd associated with that fdset id and returns that instead of attempting to open the file path. So I think the above should be correct. > > > I think the next step should be to find some hardware and give this > a > smoke test! :-) Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa support) to try to get this tested properly. Unfortunately I've been unable to get vdpa_sim working and I don't have access to hardware at the moment. Thanks for the review! Jonathon