> From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
> Sent: Saturday, April 6, 2024 2:36 PM
>
> On 2024/4/6 3:05, Parav Pandit wrote:
> > Currently, PCI SFs and VFs use IO event queues to deliver netdev
> > per-channel events. The number of netdev channels is a function of
> > IO event queues. Similarly, for an RDMA device, the completion
> > vectors are also a function of IO event queues. Currently, an
> > administrator on the hypervisor has no means to provision the number
> > of IO event queues for the SF device or the VF device. Device/firmware
> > determines some arbitrary value for these IO event queues. Due to
> > this, the number of SF netdev channels is unpredictable, and
> > consequently, so is the performance.
> >
> > This short series introduces a new port function attribute: max_io_eqs.
> > The goal is to provide administrators at the hypervisor level with the
> > ability to provision the maximum number of IO event queues for a
> > function. This gives the administrator the control to provision the
> > right number of IO event queues and get predictable performance.
> >
> > Example of an administrator provisioning (setting) the maximum number
> > of IO event queues when using switchdev mode:
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
> >
> > $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
> >
> > This sets the corresponding maximum IO event queues of the function
> > before it is enumerated. Thus, when the VF/SF driver reads the
> > capability from the device, it sees the value provisioned by the
> > hypervisor. The driver is then able to configure the number of
> > channels for the net device, as well as the number of completion
> > vectors for the RDMA device. The device/firmware also honors the
> > provisioned value, hence any VF/SF driver attempting to create IO EQs
> > beyond the provisioned value results in an error.
> >
> > With above setting now, the administrator is able to achieve the 2x
> > performance on SFs with 20 channels. In second example when SF was
> > provisioned for a container with 2 cpus, the administrator provisioned
> > only 2 IO event queues, thereby saving device resources.
> >
>
> The following paragraph is the same as the above paragraph?

Ah, yes. I forgot to remove one of them while doing minor grammar changes.

> > With the above settings now in place, the administrator achieved 2x
> > performance with the SF device with 20 channels. In the second
> > example, when the SF was provisioned for a container with 2 CPUs, the
> > administrator provisioned only 2 IO event queues, thereby saving
> > device resources.
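For context, here is a minimal sketch of the driver-side hook the cover
letter describes, assuming the attribute is exposed through a pair of
get/set callbacks in struct devlink_port_ops. The callback member names,
the my_* identifiers, and the exact signatures below are illustrative
assumptions, not taken verbatim from the patches:

#include <linux/container_of.h>
#include <linux/netlink.h>
#include <linux/types.h>
#include <net/devlink.h>

/* Illustrative vport state kept by a hypothetical eswitch driver. */
struct my_esw_vport {
	struct devlink_port dl_port;
	u32 max_io_eqs;		/* value provisioned by the hypervisor admin */
};

static int my_port_fn_max_io_eqs_get(struct devlink_port *port,
				     u32 *max_io_eqs,
				     struct netlink_ext_ack *extack)
{
	struct my_esw_vport *vport =
		container_of(port, struct my_esw_vport, dl_port);

	/* Report the currently provisioned maximum, e.g. for
	 * 'devlink port show'.
	 */
	*max_io_eqs = vport->max_io_eqs;
	return 0;
}

static int my_port_fn_max_io_eqs_set(struct devlink_port *port,
				     u32 max_io_eqs,
				     struct netlink_ext_ack *extack)
{
	struct my_esw_vport *vport =
		container_of(port, struct my_esw_vport, dl_port);

	/* A real driver would push this limit to device/firmware here so
	 * the VF/SF sees it as its capability once enumerated, and would
	 * return an extack error if the device rejects the value.
	 */
	vport->max_io_eqs = max_io_eqs;
	return 0;
}

/* Registered with the port, e.g. via devl_port_register_with_ops(). */
static const struct devlink_port_ops my_esw_port_ops = {
	.port_fn_max_io_eqs_get = my_port_fn_max_io_eqs_get,
	.port_fn_max_io_eqs_set = my_port_fn_max_io_eqs_set,
};

With something like this wired up, 'devlink port show' reports the
provisioned value and 'devlink port function set ... max_io_eqs <N>'
updates it before the function is enumerated, as in the examples above.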
> >
> > changelog:
> > v2->v3:
> > - limited to 80 chars per line in devlink
> > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
> >   on error path
> > v1->v2:
> > - limited comment to 80 chars per line in header file
> > - fixed set function variables for reverse christmas tree
> > - fixed comments from Kalesh
> > - fixed missing kfree in get call
> > - returning error code for get cmd failure
> > - fixed error msg copy paste error in set on cmd failure
> >
> > Parav Pandit (2):
> >   devlink: Support setting max_io_eqs
> >   mlx5/core: Support max_io_eqs for a function
> >
> >  .../networking/devlink/devlink-port.rst       | 33 +++++++
> >  .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
> >  .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
> >  .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
> >  include/net/devlink.h                         | 14 +++
> >  include/uapi/linux/devlink.h                  |  1 +
> >  net/devlink/port.c                            | 53 ++++++++++
> >  7 files changed, 209 insertions(+)
> >
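On the consumer side, the cover letter's point about netdev channels and
RDMA completion vectors amounts to the VF/SF driver capping its queue
count by the provisioned capability. A rough, generic sketch (the struct
and helper names are hypothetical and not from the mlx5 patches):

#include <linux/cpumask.h>
#include <linux/minmax.h>
#include <linux/types.h>

/* Hypothetical VF/SF driver state; max_io_eqs mirrors the IO EQ
 * capability the device reports after the hypervisor provisioned it.
 */
struct my_vf_dev {
	u32 max_io_eqs;
};

/* Number of netdev channels (or RDMA completion vectors) to create:
 * bounded by online CPUs and by the provisioned IO EQ capability, since
 * device/firmware rejects IO EQ creation beyond that value.
 */
static u32 my_vf_num_channels(const struct my_vf_dev *dev)
{
	return min_t(u32, num_online_cpus(), dev->max_io_eqs);
}

Bounding the channel count this way keeps the VF/SF driver within the
budget the administrator provisioned, which is what makes the resulting
performance predictable.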