On Wed, Nov 18, 2020 at 10:22:51PM -0800, Saeed Mahameed wrote:
> > I think the biggest missing piece in my understanding is what's the
> > technical difference between an SF and a VDPA device.
>
> Same difference as between a VF and netdev.
> SF == VF, so a full HW function.
> VDPA/RDMA/netdev/SCSI/nvme/etc.. are just interfaces (ULPs) sharing the
> same functions as always been, nothing new about this.

All the implementation details are very different, but this white paper
from Intel goes into some detail on the basic elements and rationale for
the SF concept:

https://software.intel.com/content/dam/develop/public/us/en/documents/intel-scalable-io-virtualization-technical-specification.pdf

What we are calling a sub-function here is a close cousin to what Intel
calls an Assignable Device Interface. I expect to see other drivers
following this general pattern eventually.

A SF will eventually be assignable to a VM, and the VM won't be able to
tell whether a VF or a SF is providing the assignable PCI resources.

VDPA is also assignable to a guest, but the key difference between
mlx5's SF and VDPA is which guest driver binds to the virtual PCI
function. For a SF the guest will bind mlx5_core, for VDPA the guest
will bind virtio-net.

So, the driver stack for a VM using VDPA might be:

  Physical device [pci] -> mlx5_core -> [aux] -> SF -> [aux] ->
  mlx5_core -> [aux] -> mlx5_vdpa -> QEMU -> |VM| -> [pci] -> virtio_net

When Parav is talking about creating VDPA devices he means attaching the
VDPA accelerator subsystem to a mlx5_core, wherever that mlx5_core
happens to be attached.

To your other remark:

> > What are you NAK'ing?
>
> Spawning multiple netdevs from one device by slicing up its queues.

This is a bit vague. In SRIOV a device spawns multiple netdevs for a
physical port by "slicing up its physical queues" - where do you see the
crossover between VMDq (bad) and SRIOV (ok)?

I thought the issue with VMDq was more about the horrid management
needed to configure the traffic splitting, not the actual splitting
itself?

In classic SRIOV the traffic is split by a simple, non-configurable HW
switch based on the MAC address of the VF.

mlx5 already has the extended version of that idea: we can run in
switchdev mode and use switchdev to configure the HW switch, so
configurable switchdev rules split the traffic for VFs.

This SF step replaces the VF in the above, but everything else is the
same. The switchdev still splits the traffic, it still ends up in the
same nested netdev queue structure & RSS a VF/PF would use, etc, etc. No
queues are "stolen" to create the nested netdev.

From the driver perspective there is no significant difference between
sticking a netdev on a mlx5 VF and sticking a netdev on a mlx5 SF. A SF
netdev is not going in and doing deep surgery on the PF netdev to steal
queues or something.

Both VF and SF will eventually be assignable to guests, both can support
all the accelerator subsystems - VDPA, RDMA, etc. Both can support
netdev.

Compared to VMDq, I think there is really no comparison. SF/ADI is an
evolution of a SRIOV VF from something PCI-SIG controlled to something
device-specific and lighter weight. SF/ADI come with an architectural
security boundary suitable for assignment to an untrusted guest. It is
not just a jumble of queues. VMDq is .. not that.

Actually this has been one of the open debates in the virtualization
userspace world. The approach of using switchdev to control the traffic
splitting to VMs is elegant, but many drivers are not following this
design. :(
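Going back to the driver stack above, to make the "[aux]" hops a bit
more concrete: a ULP (netdev, RDMA, VDPA, ...) is just an auxiliary-bus
driver that binds to whatever auxiliary device the core driver
registers, and it is the same code whether that core instance sits on a
PF, a VF or a SF. A minimal, untested sketch - the "mlx5_core.sf" match
string and the my_sf_* names are placeholders, not the real mlx5 code:

#include <linux/auxiliary_bus.h>
#include <linux/module.h>

/* Probed once for each matching auxiliary device the core driver
 * registers; the flow is the same no matter what kind of function the
 * core instance is running on.
 */
static int my_sf_probe(struct auxiliary_device *adev,
		       const struct auxiliary_device_id *id)
{
	/* Look up the core device behind adev and register a netdev
	 * (or an RDMA/VDPA device) on top of it.
	 */
	return 0;
}

static void my_sf_remove(struct auxiliary_device *adev)
{
	/* Tear down whatever probe registered. */
}

static const struct auxiliary_device_id my_sf_id_table[] = {
	{ .name = "mlx5_core.sf" },	/* placeholder match string */
	{}
};
MODULE_DEVICE_TABLE(auxiliary, my_sf_id_table);

static struct auxiliary_driver my_sf_driver = {
	.name = "my_sf_ulp",
	.probe = my_sf_probe,
	.remove = my_sf_remove,
	.id_table = my_sf_id_table,
};
module_auxiliary_driver(my_sf_driver);

MODULE_LICENSE("GPL");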
Finally, in the mlx5 model VDPA is just an "application". It asks the
device to create a 'RDMA' raw ethernet packet QP whose rings are laid
out according to the virtio-net specification. We can create it in the
kernel using mlx5_vdpa, and we can create it in userspace through the
RDMA subsystem.

Like any "RDMA" application it is contained by the security boundary of
the PF/VF/SF the mlx5_core is running on.
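For the userspace path, the generic verbs API already exposes raw
ethernet packet QPs; a rough sketch of just that piece (the
device-specific bits mlx5_vdpa layers on top are not shown, error
handling is trimmed, and CAP_NET_RAW is required):

#include <infiniband/verbs.h>

/* Create a raw ethernet packet QP: the application sends and receives
 * complete ethernet frames and owns their contents.  The real VDPA
 * object additionally asks the device, via interfaces not shown here,
 * to consume rings formed to the virtio-net spec.
 */
static struct ibv_qp *create_raw_eth_qp(struct ibv_pd *pd,
					struct ibv_cq *cq)
{
	struct ibv_qp_init_attr attr = {
		.send_cq = cq,
		.recv_cq = cq,
		.cap = {
			.max_send_wr = 256,
			.max_recv_wr = 256,
			.max_send_sge = 1,
			.max_recv_sge = 1,
		},
		.qp_type = IBV_QPT_RAW_PACKET,
	};

	return ibv_create_qp(pd, &attr);
}

Jason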