On Wed, 16 Dec 2020 20:44:21 -0800 Saeed Mahameed wrote:
> On Wed, 2020-12-16 at 15:59 -0800, Jakub Kicinski wrote:
> > On Wed, 16 Dec 2020 03:42:51 +0000 Parav Pandit wrote:
> > > > From: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > > So subfunctions don't have a VF id but they may have a
> > > > controller?
> > > Right. SF can be on an external controller.
> > > > Can you tell us more about the use cases and deployment models
> > > > you're intending to support? Let's not add attributes and info
> > > > which will go unused.
> > > External will be used the same way it is used for PF and VF.
> > > > How are SFs supposed to be used with SmartNICs? Are you assuming
> > > > a single domain of control?
> > > No, it is not assumed. SF can be deployed from the smartnic to an
> > > external host.
> > > A user has to pass the appropriate controller number and pf number
> > > attributes at creation time.
> >
> > My problem with this series is that I've gotten some real life
> > application exposure over the last year, and still I have no idea
> > who is going to find this feature useful and why.
> >
> > That's the point of my questions in the previous email - what
> > are the use cases, how are they going to operate.
>
> The main focus of this feature is scalability: we want to run
> thousands of containers/VMs. This is useful for both the smartnic and
> baremetal hypervisor worlds, where security and control are exclusive
> to the eswitch manager, be it the smartnic embedded CPU or the x86
> hypervisor.
>
> The deployment model is identical to SRIOV; the only difference is
> the instantiation model of the SF, which is the main discussion point
> of this series (I hope), and which to my taste is very modest and
> minimal. After the SF is instantiated, from that point on nothing is
> new: the SF exposes standard linux interfaces (netdev/rdma) identical
> to what a VF does. Most likely you will assign them a namespace and
> pass them through to a container, or assign them (not direct
> assignment) to a VM via the virt stack, or create a vdpa instance and
> pass it to a virtio interface.
>
> There are endless use cases for the netdev stack, for customers who want

"endless" :)

> high scale virtualized/containerized environments, with thousands of
> network functions that can deliver high speed and full offload
> accelerators, Native XDP, Crypto, encap/decap, and HW filtering and
> processing pipeline capabilities.
>
> I have a long list of customers with various and different
> applications, and I am not even talking about the rdma and vdpa
> customers! Those customers just can't wait to leave sriov behind and
> scale up!
>
> This feature has a lot of value to netdev users precisely because of
> its minimal footprint in the netdev stack (to be honest there is no
> change in netdev, only a thin API layer in devlink) and the immediate
> and effortless benefits of deploying multiple (accelerated) netdevs
> at scale.

The acceleration can hopefully be plumbed through the software devices.
I think your HW is capable of doing large queue sets so I'm curious how
this actually performs. We're probably talking 1000+ queues here - the
CPU will have a hard time serving so many queues. In my experiments,
basically, the more queues the more cache thrashing, the more
interrupts, etc. and the lower the performance.

> > It's hard to review an API without knowing the use of it. iproute2
> > is low level plumbing.
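FWIW, the "assign them a namespace and pass them through to a
container" step described above would presumably be the usual netns
move; a minimal sketch, assuming the SF shows up as an ordinary netdev
(the interface and netns names here are made up, a container runtime
would normally own the netns):

    # hypothetical names: eth0sf1 is the SF netdev, ctr1 the container netns
    ip link set eth0sf1 netns ctr1
    ip -n ctr1 link set eth0sf1 up
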
>
> I don't know how to put this, let me try:
> A) SRIOV model
> echo 128 > /sys/class/net/eth0/device/sriov_numvfs
> unbind the VF
>
> ip link set <pf> vf <x> <attribute>
> configure the representor ..
> deploy the VF netdev/rdma interface into the container

No, no, my point is that for SR-IOV it's OpenStack, libvirt etc. which
do this. I understand the manual steps. Often problems pop up when real
systems try to string the HW objects together, allocate them, learn
their capabilities, etc.

> B) SF model
> you do (everything under the devlink umbrella/switchdev):
> for i in {1..1024} ; do
>     devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum $i
>     devlink port sf $i set <attribute>
> done
>
> # from here on, it is identical to a VF
> configure the representor
> deploy the SF netdev/rdma interfaces into a container
>
> B is more scalable and gives the user more visibility and control;
> after you create the SFs, deployment and use cases are identical to
> the SRIOV VF use cases.
>
> See the improvement? :)
>
> > Here the patch is adding the ability to apparently create a SF on
> > a remote controller. If you haven't thought that use case through
> > just don't allow it until you know how it will work.
>
> We have thought the use case through; it is not any different from
> the local controller use case. The code is uniform, we would need to
> work hard to block a remote controller :) ..

So the SF is always created from the eswitch controller side? How does
the host side look? I really think that for ease of merging this we
should leave the remote controller out at the beginning - only allow
local creation.

> > > > It seems that the way the industry is moving the major
> > > > use case for SmartNICs is bare metal.
> > > >
> > > > I always assumed nested eswitches when thinking about SmartNICs,
> > > > what are you intending to do?
> > > Mlx5 doesn't support nested eswitches. SF can be deployed on the
> > > external controller PCI function.
> > > But this interface is neither limited to nor enforcing a nested
> > > or flat eswitch.
> > > > What are your plans for enabling this feature in user space
> > > > projects?
> > > Do you mean a K8s plugin or iproute2? Can you please tell us
> > > which user space project?
> >
> > That's my question. For SR-IOV it'd be all the virt stacks out
> > there. But this can't do virt. So what can it do?
>
> You are thinking of VF direct assignment, but don't forget
> virt handles netdev assignment to a VM perfectly fine, and the SF has
> a netdev.
>
> And don't get me started on the weird virt handling of SRIOV VFs; the
> whole thing is a big mess :) It shouldn't be a de facto standard that
> we need to follow..
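To make sure we are talking about the same flow, here is my reading of
model B end to end; a sketch only - the attribute-setting and
activation steps and the port index are my assumptions about how the
proposed devlink interface would behave, not something spelled out in
this thread:

    # create a subfunction on the local PF (proposed interface)
    devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 1
    # assumed: 32768 stands in for whatever port index the add returns,
    # and hw_addr/state are guesses at the configurable attributes
    devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:11:11
    devlink port function set pci/0000:06:00.0/32768 state active
    # from here on it is identical to a VF: configure the representor,
    # then hand the SF netdev/rdma interfaces to a container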