Hi Laine, Sorry for the mistake in the split of the commit, and thank for fix it 😊. I review you change and it is better. I tested it on Mellanox card with kernel 5.9 and it work good with SR-IOV legacy and SR-IOV switchdev. You have my LGTM for it. > -----Original Message----- > From: Laine Stump <laine@xxxxxxxxxx> > Sent: Monday, January 25, 2021 7:23 AM > To: libvir-list@xxxxxxxxxx > Cc: Moshe Levi <moshele@xxxxxxxxxx>; Adrian Chiris <adrianc@xxxxxxxxxx> > Subject: Re: [PATCH v2 2/2] util: Add phys_port_name support on > virPCIGetNetName > > External email: Use caution opening links or attachments > > > On 1/21/21 7:15 AM, Moshe Levi wrote: > > From: Dmytro Linkin <dlinkin@xxxxxxxxxx> > > > > Current virPCIGetNetName() logic is to get net device name by checking > > it's phys_port_id, if caller provide it, or by it's index (eg, by it's > > position at sysfs net directory). This approach worked fine up until > > linux kernel version 5.8, where NVIDIA Mellanox driver implemented > > linking of VFs' representors to PF PCI address [1] > > > > This mean > > that device's sysfs net directory will hold multiple net devices. Ex.: > > > > $ ls '/sys/bus/pci/devices/0000:82:00.0/net' > > ens1f0 eth0 eth1 > > > > In switchdev mode the PF and the VF represntors support > phys_port_name > > In that case there is only a single netdev on the PCI device which is > > the PF. The other net devices are the VF representors which we want to > > exclude. whereas in the case of using phys_port_id, there could be > > multiple PFs, and so we have to match the phys_port_id of the VF > > > > virPCIGetNetName() will try to get PF name by it's index - 0. The > > problem here is that the PF nedev entry may not be the first. > > > > To fix that, for switch devices, we introduce a new logic to select > > the PF uplink netdev according to the content of phys_port_name. > > Extend > > virPCIGetNetName() logic work in following sequence: > > - filter by phys_port_id, if it's provided, (multi PF) > > or > > - filter by phys_port_name if exist (one PF) > > - get net device by it's index (position) in sysfs net directory. > > > > [1] - https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/ > > commit/?id=123f0f53dd64b67e34142485fe866a8a581f12f1 > > > > Co-Authored-by: Moshe Levi <moshele@xxxxxxxxxx> > > Signed-off-by: Dmytro Linkin <dlinkin@xxxxxxxxxx> > > Reviewed-by: Adrian Chiris <adrianc@xxxxxxxxxx> > > --- > > src/util/virnetdev.c | 7 +++---- > > src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++++- > > src/util/virpci.h | 5 +++++ > > 3 files changed, 45 insertions(+), 5 deletions(-) > > > > diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index > > d41b967d6a..2485718b48 100644 > > --- a/src/util/virnetdev.c > > +++ b/src/util/virnetdev.c > > @@ -1263,7 +1263,7 @@ virNetDevGetVirtualFunctions(const char > *pfname, > > } > > > > if (virPCIGetNetName(pci_sysfs_device_link, 0, > > - pfPhysPortID, NULL, &((*vfname)[i])) < 0) { > > + pfPhysPortID, &((*vfname)[i])) < 0) { > > > This change disappears once the incorrect chunks are removed from patch 1. > > > > goto cleanup; > > } > > > > @@ -1358,8 +1358,7 @@ virNetDevGetPhysicalFunction(const char > *ifname, char **pfname) > > return -1; > > > > if (virPCIGetNetName(physfn_sysfs_path, 0, > > - vfPhysPortID, > > - VIR_PF_PHYS_PORT_NAME_REGEX, pfname) < 0) { > > + vfPhysPortID, pfname) < 0) { > > > Likewise. > > > > return -1; > > } > > > > @@ -1422,7 +1421,7 @@ virNetDevPFGetVF(const char *pfname, int vf, > char **vfname) > > * isn't bound to a netdev driver, it won't have a netdev name, > > * and vfname will be NULL). > > */ > > - return virPCIGetNetName(virtfnSysfsPath, 0, pfPhysPortID, NULL, > vfname); > > + return virPCIGetNetName(virtfnSysfsPath, 0, pfPhysPortID, > > + vfname); > > > Likewise > > > > } > > > > > > diff --git a/src/util/virpci.c b/src/util/virpci.c index > > 50fd5ef7ea..e20fb9e10b 100644 > > --- a/src/util/virpci.c > > +++ b/src/util/virpci.c > > @@ -2469,7 +2469,7 @@ > virPCIDeviceAddressGetSysfsFile(virPCIDeviceAddressPtr addr, > > * @idx: used to choose which netdev when there are several > > > (side note: I was spelunking the code trying to figure out when this idx is > actually set and used, and it seems like the answer might be "never". > I should probably look into cleaning it up) > > > > * (ignored if physPortID is set) > > * @physPortID: match this string in the netdev's phys_port_id > > - * (or NULL to ignore and use idx instead) > > + * (or NULL to ignore and use phys_port_name or idx instead) > > * @netname: used to return the name of the netdev > > * (set to NULL (but returns success) if there is no netdev) > > * > > @@ -2483,6 +2483,7 @@ virPCIGetNetName(const char > *device_link_sysfs_path, > > { > > g_autofree char *pcidev_sysfs_net_path = NULL; > > g_autofree char *firstEntryName = NULL; > > + g_autofree char *thisPhysPortName = NULL; > > g_autoptr(DIR) dir = NULL; > > struct dirent *entry = NULL; > > size_t i = 0; > > @@ -2522,7 +2523,42 @@ virPCIGetNetName(const char > > *device_link_sysfs_path, > > > > continue; > > } > > + > > } else { > > + /* Most switch devices use phys_port_name instead of > > + * phys_port_id. > > + * NOTE: VFs' representors net devices can be linked to PF's PCI > > + * device, which mean that there'll be multiple net devices > > + * instances and to get a proper net device need to match on > > + * specific regex. > > + * To get PF netdev, for ex., used following regex: > > + * "(p[0-9]+$)|(p[0-9]+s[0-9]+$)" > > + * or to get exact VF's netdev next regex is used: > > + * "pf0vf1$" > > + */ > > + if (virNetDevGetPhysPortName(entry->d_name, > &thisPhysPortName) < 0) > > + return -1; > > + > > + if (thisPhysPortName) { > > + /* if this one doesn't match, keep looking */ > > + if (!virStringMatch(thisPhysPortName, > VIR_PF_PHYS_PORT_NAME_REGEX)) { > > + VIR_FREE(thisPhysPortName); > > > When I started using g_autofree and explicitly called VIR_FREE() on a > g_autofree pointer so I could re-use it, I was told this is a bad practice and to > avoid it. We can easily avoid it in this case by declaring thisPhysPortName > inside the else clause where it is used (just before the line above us ^^^ that > calls virNetDevGetPhysPortName()) - that way it will be autofreed each time > through the loop. > > > > + /* Save the first entry we find to use as a failsafe > > + * in case we fail to match on regex. > > + */ > > + if (!firstEntryName) > > + firstEntryName = g_strdup(entry->d_name); > > > All this stuff about setting firstEntryName is pointless in the else clause of "if > (physPortID)": we will either get to the bottom of the > while() loop and return without using firstEntryName, or we will continue > back to the top of the loop until we run out of entries, drop out of the loop, > and get to: > > > if (!physPortID) > > return 0; > > Since we've already established that physPortID == NULL, that's going to take > us out of this function before we can use firstEntryName. > > > > + > > + continue; > > + } > > + } else { > > + /* Save the first entry we find to use as a failsafe in case > > + * phys_port_name is not supported. > > + */ > > + if (!firstEntryName) > > + firstEntryName = g_strdup(entry->d_name); > > + continue; > > > The else clause here is what will be taken for any sort of hardware that has > only a single netdev listed in the "net" directory of the PCI device (so pretty > much everything except a new generation Mellanox card, right?). I would > have thought that in this case we should treat it as below (just increment i > and continue) > > > (BTW, if one of the netdevs in a PCI device's "net" directory has a > phys_port_name, is it *always* the case that all the other netdevs listed in > that directory also have a phys_port_name? (ie if the PF has a > phys_port_name, will the VF representors always have a phys_port_name, > and vice versa?) I'm assuming the answer is "yes". > > > > + } > > if (i++ < idx) > > continue; > > > After thinking about this a bit, I've come up with the attached diff (based on > your patches - you can just add it to your local directory with "git am -3 0001- > blah.patch"). Does that look right to you? If it does (and it works!) then I can > squash it in and push. If you're not quite happy with it, then you can send > another iteration. I promise I'll be much faster about responding now!! (be > sure to Cc me so I notice it though, and maybe even ping me in IRC.) > > > > > > } > > diff --git a/src/util/virpci.h b/src/util/virpci.h index > > 43828b0a8a..9e89ede1d5 100644 > > --- a/src/util/virpci.h > > +++ b/src/util/virpci.h > > @@ -55,6 +55,11 @@ struct _virZPCIDeviceAddress { > > > > #define VIR_PCI_DEVICE_ADDRESS_FMT "%04x:%02x:%02x.%d" > > > > +/* Represents format of PF's phys_port_name in switchdev mode: > > + * 'p%u' or 'p%us%u'. New line checked since value is readed from sysfs > file. > > + */ > > +# define VIR_PF_PHYS_PORT_NAME_REGEX "(p[0-9]+$)|(p[0-9]+s[0- > 9]+$)" > > > There should be no space between # and define here. I'm trying to > remember if it's the cppi package that finds that, or something else. > (that's also in my diff) > > > > + > > struct _virPCIDeviceAddress { > > unsigned int domain; > > unsigned int bus; >