On 2025/1/8 03:32, Halil Pasic wrote: > On Tue, 7 Jan 2025 09:44:30 +0100 > Paolo Abeni <pabeni@xxxxxxxxxx> wrote: > >> On 12/27/24 5:04 AM, Guangguan Wang wrote: > > @Guangguan Wang: please use my linux.ibm.com address > in the future. Get it. > >>> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid> >>> to the pnetid table and will attach the <pnetid> to net device >>> whose name is <ethx>. But When do SMCR by <ethx>, in function >>> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's >>> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric >>> use of the pnetid seems weird. Sometimes it is difficult to know >>> the hierarchy of net device what may make it difficult to configure >>> the pnetid and to use the pnetid. Looking into the history of >>> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table") >>> that changes the ndev from the <ethx> to the <ethx>'s base ndev >>> when finding pnetid by pnetid table. It seems a mistake. >>> >>> This patch changes the ndev back to the <ethx> when finding pnetid >>> by pnetid table. >>> >>> Fixes: 890a2cb4a966 ("net/smc: rework pnet table") >>> Signed-off-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx> >> >> If I read correctly, this will break existing applications using the >> lookup schema introduced by the blamed commit - which is not very >> recent. > > Hi Paolo, > > sorry for chiming in late. Wenjia is on vacation and Jan is out sick! > After some reading and thinking I could not figure out how 890a2cb4a966 > ("net/smc: rework pnet table") is broken. Before commit 890a2cb4a966: smc_pnet_find_roce_resource smc_pnet_find_roce_by_pnetid(ndev, ...) /* lookup via hardware-defined pnetid */ smc_pnetid_by_dev_port(base_ndev, ...) smc_pnet_find_roce_by_table(ndev, ...) /* lookup via SMC PNET table */ { ... list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { if (ndev == pnetelem->ndev) { /* notice here, it was ndev to matching pnetid element in pnet table */ ... } After commit 890a2cb4a966: smc_pnet_find_roce_resource smc_pnet_find_roce_by_pnetid { ... base_ndev = pnet_find_base_ndev(ndev); /* rename the variable name to base_ndev for better understanding */ smc_pnetid_by_dev_port(base_ndev, ...) smc_pnet_find_ndev_pnetid_by_table(base_ndev, ...) { ... list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) { if (base_ndev == pnetelem->ndev) { /* notice here, it is base_ndev to matching pnetid element in pnet table */ ... } } The commit 890a2cb4a966 has changed ndev to base_ndev when matching pnetid element in pnet table. But in the function smc_pnet_add_eth, the pnetid is attached to the ndev itself, not the base_ndev. smc_pnet_add_eth(...) { ... ndev = dev_get_by_name(net, eth_name); ... if (new_netdev) { if (ndev) { new_pe->ndev = ndev; netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_ATOMIC); } list_add_tail(&new_pe->list, &pnettable->pnetlist); mutex_unlock(&pnettable->lock); } else { ... } > > Admittedly I'm not really a net guy,and I'm mostly guessing what that > lower and upper device stuff is, so please bear with me. All that said, I > do think that going to the lowest netdev in the hierarchy is a sane > thing to do here. I assume that lower and upper devices are applicable > to stuff like bonding. > > PNETID stands for "Physical Network Identifier" and the idea is that iff > two ports are connected to the same physical network then they should > have the same PNETID. And on s390 PNETID can come and often is comming > "from the hardware". Now for something like a bond of two OSA > interfaces, I would expect the two legs of the bond to probably have a > "HW PNETID", but the netdev representing the bond itself won't have one > unless the Linux admin defines a software PNETID, which is work, and > can't have a HW PNETID because it is a software construct within Linux. > Breaking for example an active-backup bond setup where the legs have > HW PNETIDs and the admin did not bother to specify a PNETID for the bond > is not acceptable. > > Let me also note that if ndev is a leaf (i.e. there is no lower device to > it) then ndev == base_ndev, and the whole discussion does not matter for > that case. > > Again I have to emphasize that my domain knowledge is very limited, but > I really don't feel comfortable going forward with this without Jan or > Wenjia weighing in on the matter. > > Paolo thanks for bringing this up! > >> >> Perhaps for a net patch would be better to support both lookup schemas >> i.e. >> >> (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) || >> smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) >> >> ? >> > > Hm, I guess the idea here is that if ndev has a PNETID then it should > take precedence, but if not we should try to obtain the PNETID of its > "base_ndev". I'm not sure this would make things better compared to the > original idea of caring about the leaf. Which makes me question my > understanding of the problem statement from the commit message. > > BTW to implement the logic proposed by you Paolo, as understood by me, > we would have to use "&&" instead of "||". The whole expression is > supposed > to evaluate to false if a pnetid is found and to true if no pnet_id is > found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid > is found. I.e. if not found we would just short circuit to true and not > call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I > believe you wanted to propose. Yes, it should be (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) && smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid)) if for the consideration of application's compatible usage of smc_pnet. Thanks, Guangguan Wang > > To sum it up, please let us wait until Wenjia or Jan chime in. Copying > Alexandra as well: she is more of a net person than I am, and maybe she > has a more informed opinion. > > Regards, > Halil > [...] >