On 3/4/25 1:43 PM, Guangguan Wang wrote:
> When using smc_pnet in SMC, it only searches for the pnetid in the
> base_ndev of the netdev hierarchy (both the HW PNETID and the
> user-defined sw pnetid). This may not work in some scenarios when using
> SMC in a container in a cloud environment.
>
> In a container, there are different choices of container network, such
> as directly using the host network, or virtual networks such as IPVLAN,
> veth, etc. Different choices of container network result in different
> netdev hierarchies. Examples are shown below (eth0 and eth1 in the host
> are the netdevs directly related to the physical devices).
>
> [Diagram: netdev hierarchy if directly using host network. The POD's
> eth0 is the host's eth1, which is the base_ndev; the host's eth0 is
> attached to the RDMA device.]
>
> [Diagram: netdev hierarchy if using IPVLAN. The POD's eth0 is an
> upper_ndev whose lower netdev is the host's eth1 (base_ndev); the
> host's eth0 is attached to the RDMA device.]
>
> [Diagram: netdev hierarchy if using veth. The POD's eth0 (veth) is a
> base_ndev paired with a veth base_ndev in the host; the host's eth1 is
> a separate base_ndev, and the host's eth0 is attached to the RDMA
> device.]
>
> For various reasons, eth1 in the host is not an RDMA-attached
> netdevice, so a pnetid is needed to map eth1 (in the host) to the RDMA
> device so that the POD can do SMC-R.
> eth1 (in the host) is managed by a CNI plugin (such as Terway, a network
> management plugin for container environments). In a cloud environment,
> the CNI can dynamically insert the eth (in the host) when a POD is
> created, and dynamically remove it when the POD is destroyed and no POD
> relates to that eth (in the host) anymore. This makes it hard to
> configure the pnetid on eth1 (in the host), whereas it is easy to
> configure the pnetid on the netdevice visible in the POD.
>
> When doing SMC-R, both the container directly using the host network
> and the container using the veth network can successfully match the
> RDMA device, because the netdev with the configured pnetid is a
> base_ndev. But the container using IPVLAN cannot match the RDMA device,
> and a 0x03030000 fallback happens, because the netdev with the
> configured pnetid is not a base_ndev. Additionally, configuring the
> pnetid on eth1 (in the host) does not work for matching the RDMA device
> either, when using the veth network and doing SMC-R in the POD.
>
> To resolve the problems listed above, this patch extends the search for
> the user-defined sw pnetid to the CLC handshake ndev when no pnetid can
> be found in the base_ndev; the base_ndev takes precedence over the ndev
> for backward compatibility. This patch also unifies the pnetid setup
> across the container network choices listed above (configure the
> user-defined sw pnetid on the netdevice visible in the POD).
>
> Signed-off-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx>
> ---
>  net/smc/smc_pnet.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
> index 716808f374a8..b391c2ef463f 100644
> --- a/net/smc/smc_pnet.c
> +++ b/net/smc/smc_pnet.c
> @@ -1079,14 +1079,16 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
>  					 struct smc_init_info *ini)
>  {
>  	u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
> +	struct net_device *base_ndev;
>  	struct net *net;
>
> -	ndev = pnet_find_base_ndev(ndev);
> +	base_ndev = pnet_find_base_ndev(ndev);
>  	net = dev_net(ndev);
> -	if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
> +	if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
>  				   ndev_pnetid) &&
> +	    smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid) &&
>  	    smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
> -		smc_pnet_find_rdma_dev(ndev, ini);
> +		smc_pnet_find_rdma_dev(base_ndev, ini);
>  		return; /* pnetid could not be determined */
>  	}
>  	_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);

I understand Wenjia opposed this solution, as it may create invalid
topologies?!?

https://lore.kernel.org/netdev/08cd6e15-3f8c-47a0-8490-103d59abf910@xxxxxxxxxxxxx/#t

Wenjia, could you please confirm?

Thanks,

Paolo