Re: [RFC PATCH v2 net-next] docs: net: add an explanation of VF (and other) Representors

On 18/08/2022 17:44, Parav Pandit wrote:
> A _whole_ network function is represented today using 
> a. netdevice represents representee's network port
> 
> b. devlink port function for function management

So, at the moment I'm just trying to document the current consensus,
 but where I plan to go _after_ this doc is building the case that
 devlink port as it exists today mixes in too much networking
 configuration that really belongs in the representor.  The example
 that motivated this for me is that setting the MAC address of the
 representee is currently a devlink port function operation, but
 this has nothing to do with the PCIe function and everything to do
 with the network port, so logically it should be an operation on
 the representor.  (I intend to develop a patch making it such, once
 we're all on the same page.)

I think a general rule is — would this operation make sense on a non-
 networking SR-IOV device?  If not, then it shouldn't be in devlink
 port.  E.g. why is port splitting a devlink port operation and not
 an operation on the port representor netdev?

> s/master PF/switchdev function

switchdev function might actually be the best name suggestion yet.
I like it.

> Please add text that,
> Packets transmitted by the representee and when they are not offloaded, such packets are delivered to the port representor netdevice.

That's exactly what
>> +   packets transmitted to the representee which fail to match any
>> +   switching rule should be received on the representor netdevice.
says.  (Although my choice of preposition — 'to', rather than 'by'
 — was less than clear.)

>> +What functions should have a representor?
>> +-----------------------------------------
>> +
>> +Essentially, for each virtual port on the device's internal switch,
>                                                                             ^^^^
> You probably wanted to say master PF internal switch here.
> 
> Better to word it as, each virtual port of a switchdev, there should be...

Hmm idk, I feel like "switchdev" has the connotation of "the software
 object inside the kernel representing the switch" rather than "the
 switch itself".

>> + - Other PFs on the local PCIe controller, and any VFs belonging to them.
> Local and/or external PCIe controllers.

That's literally the next bullet point.

>> + - PFs and VFs on other PCIe controllers on the device (e.g. for any embedded
>> +   System-on-Chip within the SmartNIC).

Do I need to use the word "external" to make it more obvious?

>> + - PFs and VFs with other personalities, including network block devices (such
>> +   as a vDPA virtio-blk PF backed by remote/distributed storage), if their
>> +   network access is implemented through a virtual switch port.
>> +   Note that such functions can require a representor despite the representee
>> +   not having a netdev.
> This looks a big undertaking to represent them via "netdevice".
> Mostly they cannot be well represented by the netdevice.

The netdevice isn't supposed to represent the vDPA block device.  Rather
 it represents the switch port that the block device is using.

> In some cases, such vDPA devices are affiliated to the switchdev, but they use one or multiple of its ports.

If the block device uses multiple switch ports, then it should have
 multiple representors, one for each port, so that each switch port can
 be configured in the standard way.

Configuration of the block device itself is of course through separate
 interfaces which are common to non-switchdev virtual block devices.
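
To make the "one representor per switch port" rule concrete, here is a
 minimal sketch of it as a toy model.  Everything in it is hypothetical
 illustration rather than any driver's actual structure: the Function
 class, the port numbers and the "_repN" naming are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Function:
    """A representee (PF, VF, SF, block device...) in a toy model.

    switch_ports lists the virtual switch ports the function uses; the
    class and its fields are hypothetical, not a kernel structure.
    """
    name: str
    switch_ports: List[int] = field(default_factory=list)


def representors_for(functions: List[Function]) -> List[Tuple[str, int]]:
    """Return one (representor name, switch port) pair per switch port.

    Representors follow switch ports, not functions: a function with two
    ports gets two representors, a function with none gets none.
    """
    reps = []
    for fn in functions:
        for idx, port in enumerate(fn.switch_ports):
            # The "<name>_rep<idx>" naming is an assumption for illustration.
            reps.append((f"{fn.name}_rep{idx}", port))
    return reps
```

With a VF on port 3, a vDPA block device on ports 7 and 8, and a function
 with no switch port at all, this yields three representors: one for the
 VF, two for the block device, and none for the last.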

>> + - Subfunctions (SFs) belonging to any of the above PFs or VFs, if they have
>> +   their own port on the switch (as opposed to using their parent PF's port).
> Not sure why the text has _if_ for SF and not for the VF.
> Do you see a SF device in the kernel that doesn't have their own port, due to which there is _if_ added?

This document is meant to cover situations that vendors are likely to
 find themselves in, not just those that have already been encountered.
It is plausible, at least to me, that a vendor might decide to implement
 subfunctions at a filtering rather than a switching level (i.e. it's
 just a bundle of queue pairs and you use something like ethtool NFC to
 direct traffic to it).  And if that happens, I don't want them to read
 my doc and (wrongly) think that they still need reprs for such SFs.
(The corresponding situation is far less likely to arise for VFs,
 because there's a clear understanding across the industry that VFs
 should look to their consumer like self-contained network devices,
 which implies transparent switching.)

>> +How are representors created?
>> +-----------------------------
>> +
>> +The driver instance attached to the master PF should enumerate the
>> +virtual ports on the switch, and for each representee, create a
>> +pure-software netdevice which has some form of in-kernel reference to
>> +the PF's own netdevice or driver private data (``netdev_priv()``).
> Today a user can create new virtual ports. Hence, these port represnetors and function representors are created dynamically without enumeration.
> Please add text describing both ways.

Again, this is addressed in the next sentence after you quoted:
>> +If switch ports can dynamically appear/disappear, the PF driver should
>> +create and destroy representors appropriately.

> For mlx5 case a representor netdevice has real queue from which tx/rx DMA happens from the device to/from network.
> It is not entirely pure software per say.
> Hence, "pure-software" is misleading. Please drop that word.

The rep dev doesn't own the BAR.  Everything it has it gets from
 the PF.  That's why it shouldn't call SET_NETDEV_DEV(); having no
 PCIe device of its own is what I mean by "pure-software".
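
From userspace, the visible effect of SET_NETDEV_DEV() is a "device"
 link under the netdev's sysfs directory.  A rough sketch of checking
 for it follows; the function names are made up for illustration, and
 the sysfs_net parameter exists only so the check can be pointed at a
 test directory.

```python
import os


def has_parent_device(ifname, sysfs_net="/sys/class/net"):
    """True if the netdev has a 'device' link, i.e. a parent device was set."""
    # lexists() so that a dangling link still counts as "link present".
    return os.path.lexists(os.path.join(sysfs_net, ifname, "device"))


def netdevs_without_parent(sysfs_net="/sys/class/net"):
    """List netdevs that expose no parent device link, sorted by name."""
    return sorted(
        name
        for name in os.listdir(sysfs_net)
        # Skip plain files such as bonding_masters.
        if os.path.isdir(os.path.join(sysfs_net, name))
        and not has_parent_device(name, sysfs_net)
    )
```

Note that purely virtual netdevs (loopback, veth and so on) also have no
 such link, so its absence does not by itself identify a representor.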

>> +The operations of the representor netdevice will generally involve
>> +acting through the master PF.  For example, ``ndo_start_xmit()`` might
>> +send the packet through a hardware TX queue attached to the master PF,
>> +with either packet metadata or queue configuration marking it for delivery
>> +to the representee.
> Sharing/not sharing TX and RX queue among representor netdevices is not yet well established.

But in either case the hw TXQ will have been created out of the
 PF's BAR(s) (there's no other PCIe function/aperture to poke at
 the hardware from); that's what I mean by "attached to".  If you
 have a clearer way to word that I'm all ears.

>> +
>> +How are representors identified?
>> +--------------------------------
>> +
>> +The representor netdevice should *not* directly refer to a PCIe device (e.g.
>> +through ``net_dev->dev.parent`` / ``SET_NETDEV_DEV()``), either of the
>> +representee or of the master PF.
> This isn't true.
> Representor netdevices are connected to the switchdev device PCI function.

In some but not all existing drivers.
Note that I said "should not", not "does not".

> Without linking to PCI device, udev scriptology needs to grep among thousands of netdevices and its very inefficient.

It's a control plane operation; is efficiency really a prime
 concern?  If so, surely the right thing is to give
 /sys/class/net/$REP_DEV/ a suitably-named symlink to
 /sys/class/net/$SWITCH_DEV, performing the same role as the
 phys_switch_id matching without the global search, rather than
 a semantically-invalid PCIe device link.
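
As a sketch of the difference from a script's point of view: the first
 lookup below does the global phys_switch_id search, the second follows
 a per-representor symlink.  The symlink does not exist today; its name
 ("switch_dev") and the function names are assumptions for illustration
 only.

```python
import os


def read_switch_id(ifname, sysfs_net):
    """Return the netdev's phys_switch_id, or None if it has none."""
    try:
        with open(os.path.join(sysfs_net, ifname, "phys_switch_id")) as f:
            return f.read().strip() or None
    except OSError:
        # Attribute missing, or the netdev does not support it.
        return None


def peers_by_switch_id(ifname, sysfs_net="/sys/class/net"):
    """Global search: all other netdevs sharing ifname's phys_switch_id."""
    wanted = read_switch_id(ifname, sysfs_net)
    if wanted is None:
        return []
    return sorted(
        name
        for name in os.listdir(sysfs_net)
        if name != ifname and read_switch_id(name, sysfs_net) == wanted
    )


def switch_dev_by_link(ifname, link_name="switch_dev",
                       sysfs_net="/sys/class/net"):
    """Direct lookup via a HYPOTHETICAL symlink to the switch's netdev."""
    link = os.path.join(sysfs_net, ifname, link_name)
    if not os.path.islink(link):
        return None
    return os.path.basename(os.path.realpath(link))
```

The first function has to read an attribute of every netdev on the
 system; the second touches only the representor's own directory.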

>> +There are as yet no established conventions for naming representors
>> +which do not correspond to PCIe functions (e.g. accelerators and plugins).
> Netdevice represents the networking port of the function.

No, it represents any networking port on the switch, whether or
 not that port has a PCIe function.
(The doc title, "Network Function Representors", is deliberately
 phrased to be interpretable as about "network functions" in the
 sense of NFV, rather than "networking PCIe functions".  An
 entire network function (in the NFV sense) could be implemented
 in hardware, in which case it would have a switch port and thus
 representor, but no PCIe function — it terminates traffic inside
 the device rather than sending it over the PCIe bus to a driver
 in the host or a VM.)

-ed


