Re: [PATCH net-next 03/13] devlink: Support add and delete devlink port

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 18 Nov 2020 14:38:30 -0400

On Wed, Nov 18, 2020 at 11:03:24AM -0700, David Ahern wrote:

> With Connectx-4 Lx for example the netdev can have at most 63 queues

What netdev calls a queue is really a "can the device deliver
interrupts and packets to a given per-CPU queue" and covers a whole
spectrum of smaller limits like RSS scheme, # of available interrupts,
ability of the device to create queues, etc.

CX4Lx can create a huge number of queues, but hits one of these limits
that mean netdev's specific usage can't scale up. Other stuff like
RDMA doesn't have the same limits, and has tonnes of queues.

What seems to be needed is a resource controller concept like cgroup
has for processes. The system is really organized into a tree:

           physical device
              mlx5_core
        /      |      \      \                        (aux bus)
     netdev   rdma    vdpa   SF  etc
                             |                        (aux bus)
                           mlx5_core
                          /      \                    (aux bus)
                       netdev   vdpa

And it does make a lot of sense to start to talk about limits at each
tree level.

eg the top of the tree may have 128 physical interrupts. With 128 CPU
cores that isn't enough interrupts to support all of those things
concurrently.

So the user may want to configure:
 - The first level netdev only gets 64,
 - 3rd level mlx5_core gets 32 
 - Final level vdpa gets 8

Other stuff has to fight it out with the remaining shared interrupts.

In netdev land # of interrupts governs # of queues

For RDMA # of interrupts limits the CPU affinities for queues

VPDA limits the # of VMs that can use VT-d

The same story repeats for other less general resources, mlx5 also
has consumption of limited BAR space, and consumption of some limited
memory elements. These numbers are much bigger and may not need
explicit governing, but the general concept holds.

It would be very nice if the limit could be injected when the aux
device is created but before the driver is bound. I'm not sure how to
manage that though..

I assume other devices will be different, maybe some devices have a
limit on the number of total queues, or a limit on the number of
VDPA or RDMA devices.

Jason