On Tue, Dec 15, 2020 at 10:47:36AM -0800, Alexander Duyck wrote:

> > Jason and Saeed explained this in great detail a few weeks back in the
> > v0 version of the patchset at [1], [2] and [3]. I'd better not repeat
> > all of it here again; please go through it. If you want to read the
> > precursor to it, the RFC from Jiri at [4] also explains this in great
> > detail.
>
> I think I have a pretty good idea of how the feature works. My concern
> is more the use of marketing speak versus actual functionality. The way
> this is being set up, it sounds like it is useful for virtualization and
> it is not, at least in its current state. It may be at some point in the
> future, but I worry that it is really going to muddy the waters as we
> end up with yet another way to partition devices.

If we do a virtualization version then it will take a SF and, instead of
loading mlx5_core on the SF aux device, we will load some vfio_mdev_mlx5
driver which will convert the SF aux device into a /dev/vfio/*.

This is essentially the same as how you'd take a PCI VF and replace
mlx5_core with vfio-pci to get /dev/vfio/*. It has to be a special mdev
driver because it sits on the SF aux device, not on the VF PCI device.

The vfio_mdev_mlx5 driver will create what Intel calls an SIOV ADI from
the SF; in other words, the SF is already a superset of what a SIOV ADI
should be.

This matches the Linux driver model very nicely, and I don't think it
becomes more muddied as we go along. If anything it is becoming more
clear and sane as things progress.

> I agree with you on that. My thought was more the fact that the two can
> be easily confused. If we are going to do this we need to define that,
> for networking devices, using the mdev interface would be deprecated and
> we would need to go through devlink. However, before we do that we need
> to make sure we have this completely standardized.

mdev is for creating /dev/vfio/* interfaces in userspace. Using it for
anything else is a bad abuse of the driver model.

We had this debate endlessly already. AFAIK there is nothing to
deprecate: there are no mdev_drivers in drivers/net, and none should ever
be added. The only mdev_driver that should ever exist is in vfio_mdev.c.

If someone is using a mdev_driver in drivers/net out of tree then they
will need to convert to an aux driver for in-tree.
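To make the flavor-driver idea concrete, here is a rough, illustrative
skeleton of what an alternate driver sitting on the SF auxiliary device
could look like. The "sf_flavor" name and the probe/remove bodies are
assumptions for illustration only, not the actual vfio_mdev_mlx5 or
vdpa_mlx5 code; only the auxiliary bus API and the "mlx5_core.sf.<N>"
device naming mirror what mlx5 documents in-tree.

    /*
     * Illustrative skeleton of an alternate "flavor" driver for the mlx5
     * SF auxiliary device.  Names and probe contents are made up for the
     * example; only the auxiliary bus API and the "mlx5_core.sf" match
     * string mirror the in-tree SF aux device naming.
     */
    #include <linux/auxiliary_bus.h>
    #include <linux/module.h>

    static int sf_flavor_probe(struct auxiliary_device *adev,
                               const struct auxiliary_device_id *id)
    {
            /*
             * A vfio flavor would build the /dev/vfio/* plumbing here, a
             * vdpa flavor the virtio-net device, instead of the
             * netdev/RDMA stack that mlx5_core instantiates on the same
             * aux device.
             */
            dev_info(&adev->dev, "bound alternate flavor to %s\n",
                     dev_name(&adev->dev));
            return 0;
    }

    static void sf_flavor_remove(struct auxiliary_device *adev)
    {
            /* Tear down whatever probe exposed. */
    }

    static const struct auxiliary_device_id sf_flavor_id_table[] = {
            { .name = "mlx5_core.sf" },     /* match the SF aux devices */
            {}
    };
    MODULE_DEVICE_TABLE(auxiliary, sf_flavor_id_table);

    static struct auxiliary_driver sf_flavor_driver = {
            .name     = "sf_flavor",
            .probe    = sf_flavor_probe,
            .remove   = sf_flavor_remove,
            .id_table = sf_flavor_id_table,
    };
    module_auxiliary_driver(sf_flavor_driver);

    MODULE_LICENSE("GPL");

The point of the sketch is only that the SF is an ordinary device on the
auxiliary bus, so swapping the bound driver changes the flavor without
touching the PF or the other SFs.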
> Yeah, I recall that. However, I feel like it is being oversold. It isn't
> "SR-IOV done right"; it seems more like "VMDq done better". The fact
> that interrupts are shared between the subfunctions is telling.

The interrupt sharing is a consequence of having an ADI-like model
without relying on IMS. Once IMS works, shared interrupts won't be
necessary. Until then there is no choice but to share the MSI table of
the function.

> That is exactly how things work for Intel parts when they do VMDq as
> well. The queues are split up into pools and a block of queues belongs
> to a specific pool. From what I can tell, the only difference is that
> there is isolation of the pool into specific pages in the BAR, which is
> essentially a requirement for mediated devices so that they can be
> directly assigned.

No, as I said to Jakub, mlx5 SFs have very little to do with queues.
There is no 'queue' HW element that needs partitioning.

The SF is a hardware security boundary that wraps every operation a mlx5
device can do. This is why it is an ADI. It is not a crappy ADI that
relies on hypervisor emulation; it is the real thing, just like a SRIOV
VF. You stick it in the VM and the guest can directly talk to the HW.
The HW provides the security.

I can't emphasize this enough: a mlx5 SF can run a *full RDMA stack*.
This means the driver can create all the RDMA HW objects and resources
under the SF. This is *not* just steering some ethernet traffic to a few
different ethernet queues like VMDq is.

The Intel analog to a SF is a *full virtual function* on one of the
Intel iWarp capable NICs, not VMDq.

> Assuming at some point one of the flavours is a virtio-net style
> interface, you could eventually get to the point of something similar to
> what seems to have been the goal of mdev, which was meant to address
> these two points.

mlx5 already supports VDPA virtio-net on PF/VF, and with this series SF
too.

i.e. you can take a SF, bind the vdpa_mlx5 driver, and get a fully HW
accelerated "ADI" that does virtio-net. This can be assigned to a guest
and shows up as a PCI virtio-net netdev. With VT-d, guest packet tx/rx
on this netdev never uses the hypervisor CPU.

> The point is that we should probably define some sort of standard and/or
> expectations on what should happen when you spawn a new interface. Would
> it be acceptable for the PF and existing subfunctions to have to reset
> if you need to rebalance the IRQ distribution, or should they not be
> disrupted when you spawn a new interface?

It is best to think of the SF as an ADI, so if you change something in
the PF and that causes the driver attached to the ADI in a VM to reset,
is that OK? I'd say no.

Jason
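As a companion to the kernel-side skeleton above, here is a minimal
userspace sketch of the "take a SF, bind a different flavor driver" step
mentioned for vdpa_mlx5, using the generic sysfs bind/unbind files that
every Linux bus driver exposes. The device and driver names are passed
as arguments because the real names (e.g. mlx5_core.sf.<N> for the
device) vary per system and the target driver must actually carry a
matching auxiliary id table entry; this illustrates the mechanism and is
not a tested tool.

    /*
     * Illustrative only: move one SF auxiliary device between flavor
     * drivers through sysfs bind/unbind.  The target driver must match
     * the device, or the bind write will fail.
     *
     * Usage: sf_rebind <sf-aux-dev> <current-driver> <new-driver>
     *   e.g. sf_rebind mlx5_core.sf.1 <old> <new>
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    static int write_attr(const char *drv, const char *attr, const char *dev)
    {
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/bus/auxiliary/drivers/%s/%s", drv, attr);
            f = fopen(path, "w");
            if (!f) {
                    fprintf(stderr, "%s: %s\n", path, strerror(errno));
                    return -1;
            }
            fputs(dev, f);
            return fclose(f);
    }

    int main(int argc, char **argv)
    {
            if (argc != 4) {
                    fprintf(stderr,
                            "usage: %s <sf-aux-dev> <old-driver> <new-driver>\n",
                            argv[0]);
                    return 1;
            }

            /* Detach whatever flavor currently owns the SF aux device ... */
            write_attr(argv[2], "unbind", argv[1]);
            /* ... then hand it to the requested flavor (netdev, vdpa, vfio). */
            return write_attr(argv[3], "bind", argv[1]) ? 1 : 0;
    }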