On 18/08/2022 17:44, Parav Pandit wrote:
> A _whole_ network function is represented today using
> a. netdevice represents representee's network port
>
> b. devlink port function for function management

So, at the moment I'm just trying to document the current consensus,
but where I plan to go _after_ this doc is building the case that
devlink port as it exists today mixes in too much networking
configuration that really belongs in the representor. The example that
motivated this for me is that setting the MAC address of the
representee is currently a devlink port function operation, but this
has nothing to do with the PCIe function and everything to do with the
network port, so logically it should be an operation on the
representor. (I intend to develop a patch making it such, once we're
all on the same page.)

I think a general rule is: would this operation make sense on a
non-networking SR-IOV device? If not, then it shouldn't be in devlink
port. E.g. why is port splitting a devlink port operation and not an
operation on the port representor netdev?

> s/master PF/switchdev function

switchdev function might actually be the best name suggestion yet. I
like it.

> Please add text that,
> Packets transmitted by the representee and when they are not offloaded, such packets are delivered to the port representor netdevice.

That's exactly what
>> packets
>> + transmitted to the representee which fail to match any switching rule
>> should
>> + be received on the representor netdevice.
says. (Although my choice of preposition, 'to' rather than 'by', was
less than clear.)

>> +What functions should have a representor?
>> +-----------------------------------------
>> +
>> +Essentially, for each virtual port on the device's internal switch,
> ^^^^
> You probably wanted to say master PF internal switch here.
>
> Better to word it as, each virtual port of a switchdev, there should be...

Hmm idk, I feel like "switchdev" has the connotation of "the software
object inside the kernel representing the switch" rather than "the
switch itself".

>> + - Other PFs on the local PCIe controller, and any VFs belonging to them.
> Local and/or external PCIe controllers.

That's literally the next bullet point.

>> + - PFs and VFs on other PCIe controllers on the device (e.g. for any
>> embedded
>> + System-on-Chip within the SmartNIC).

Do I need to use the word "external" to make it more obvious?

>> + - PFs and VFs with other personalities, including network block devices
>> (such
>> + as a vDPA virtio-blk PF backed by remote/distributed storage), if their
>> + network access is implemented through a virtual switch port.
>> + Note that such functions can require a representor despite the
>> representee
>> + not having a netdev.
> This looks a big undertaking to represent them via "netdevice".
> Mostly they cannot be well represented by the netdevice.

The netdevice isn't supposed to represent the vDPA block device. Rather
it represents the switch port that the block device is using.

> In some cases, such vDPA devices are affiliated to the switchdev, but they use one or multiple of its ports.

If the block device uses multiple switch ports, then it should have
multiple representors, one for each port, so that each switch port can
be configured in the standard way. Configuration of the block device
itself is of course through separate interfaces which are common to
non-switchdev virtual block devices.

>> + - Subfunctions (SFs) belonging to any of the above PFs or VFs, if they have
>> + their own port on the switch (as opposed to using their parent PF's port).
> Not sure why the text has _if_ for SF and not for the VF.
> Do you see a SF device in the kernel that doesn't have their own port, due to which there is _if_ added?

This document is meant to cover situations that vendors are likely to
find themselves in, not just those that have already been encountered.
It is plausible, at least to me, that a vendor might decide to
implement subfunctions at a filtering rather than a switching level
(i.e. it's just a bundle of queue pairs and you use something like
ethtool NFC to direct traffic to it). And if that happens, I don't
want them to read my doc and (wrongly) think that they still need
reprs for such SFs.
(The corresponding situation is far less likely to arise for VFs,
because there's a clear understanding across the industry that VFs
should look to their consumer like self-contained network devices,
which implies transparent switching.)

>> +How are representors created?
>> +-----------------------------
>> +
>> +The driver instance attached to the master PF should enumerate the
>> +virtual ports on the switch, and for each representee, create a
>> +pure-software netdevice which has some form of in-kernel reference to
>> +the PF's own netdevice or driver private data (``netdev_priv()``).
> Today a user can create new virtual ports. Hence, these port represnetors and function representors are created dynamically without enumeration.
> Please add text describing both ways.

Again, this is addressed in the sentence right after the one you
quoted:
>> +If switch ports can dynamically appear/disappear, the PF driver should
>> +create and destroy representors appropriately.

> For mlx5 case a representor netdevice has real queue from which tx/rx DMA happens from the device to/from network.
> It is not entirely pure software per say.
> Hence, "pure-software" is misleading. Please drop that word.

The rep dev doesn't own the BAR. Everything it has it gets from the
PF. That's why it shouldn't call SET_NETDEV_DEV(), which is what I mean
by "pure-software".

>> +The operations of the representor netdevice will generally involve
>> +acting through the master PF.
>> +For example, ``ndo_start_xmit()`` might
>> +send the packet through a hardware TX queue attached to the master PF,
>> +with either packet metadata or queue configuration marking it for delivery
>> to the representee.
> Sharing/not sharing TX and RX queue among representor netdevices is not yet well established.

But in either case the hw TXQ will have been created out of the PF's
BAR(s) (there's no other PCIe function/aperture to poke at the
hardware from); that's what I mean by "attached to". If you have a
clearer way to word that, I'm all ears.

>> +
>> +How are representors identified?
>> +--------------------------------
>> +
>> +The representor netdevice should *not* directly refer to a PCIe device (e.g.
>> +through ``net_dev->dev.parent`` / ``SET_NETDEV_DEV()``), either of the
>> +representee or of the master PF.
> This isn't true.
> Representor netdevices are connected to the switchdev device PCI function.

In some but not all existing drivers. Note that I said "should not",
not "does not".

> Without linking to PCI device, udev scriptology needs to grep among thousands of netdevices and its very inefficient.

It's a control-plane operation; is efficiency really a prime concern?
If so, surely the right thing is to give /sys/class/net/$REP_DEV/ a
suitably-named symlink to /sys/class/net/$SWITCH_DEV, performing the
same role as the phys_switch_id matching without the global search,
rather than a semantically-invalid PCIe device link.

>> +There are as yet no established conventions for naming representors
>> +which do not correspond to PCIe functions (e.g. accelerators and plugins).
> Netdevice represents the networking port of the function.

No, it represents any networking port on the switch, whether that has
a PCIe function or not.
(The doc title, "Network Function Representors", is deliberately
phrased to be interpretable as about "network functions" in the sense
of NFV, rather than "networking PCIe functions".
An entire network function (in the NFV sense) could be implemented in
hardware, in which case it would have a switch port and thus a
representor, but no PCIe function, since it terminates traffic inside
the device rather than sending it over the PCIe bus to a driver in the
host or a VM.)

-ed