Re: [PATCH net] net/smc: use the correct ndev to find pnetid by pnetid table

On 2025/2/10 21:52, Halil Pasic wrote:
> On Fri, 10 Jan 2025 13:43:44 +0800
> Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx> wrote:
> 
>> We want to use SMC in containers in a cloud environment, and we encountered a problem
>> when using smc_pnet with commit 890a2cb4a966. Containers can use different
>> container networks, such as using the host network directly, or virtual
>> networks like IPVLAN, veth, etc. Different container networks have different
>> netdev hierarchies. Examples of netdev hierarchies are shown below. (eth0 and eth1 in the host
>> below are the netdevs directly related to the physical device.)
>>  _______________________________      ________________________________   
>> |   _________________           |     |   _________________           |  
>> |  |POD              |          |     |  |POD  __________  |          |  
>> |  |                 |          |     |  |    |upper_ndev| |          |  
>> |  | eth0_________   |          |     |  |eth0|__________| |          |  
>> |  |____|         |__|          |     |  |_______|_________|          |  
>> |       |         |             |     |          |lower netdev        |  
>> |       |         |             |     |        __|______              |  
>> |   eth1|base_ndev| eth0_______ |     |   eth1|         | eth0_______ |  
>> |       |         |    | RDMA  ||     |       |base_ndev|    | RDMA  ||  
>> | host  |_________|    |_______||     | host  |_________|    |_______||  
>> ———————————————————————————————-      ———————————————————————————————-    
>>  netdev hierarchy if directly          netdev hierarchy if using IPVLAN    
>>    using host network
>>  _______________________________
>> |   _____________________       |
>> |  |POD        _________ |      |
>> |  |          |base_ndev||      |
>> |  |eth0(veth)|_________||      |
>> |  |____________|________|      |
>> |               |pairs          |
>> |        _______|_              |
>> |       |         | eth0_______ |
>> |   veth|base_ndev|    | RDMA  ||
>> |       |_________|    |_______||
>> |        _________              |
>> |   eth1|base_ndev|             |
>> | host  |_________|             |
>>  ———————————————————————————————
>>   netdev hierarchy if using veth
>>
>> For some reasons, eth1 in the host is not an RDMA-attached netdevice, so a pnetid
>> is needed to map eth1 (in host) to the RDMA device so that the POD can do SMC-R.
>> The eth1 (in host) is managed by a CNI plugin (such as Terway, a network
>> management plugin for container environments), and in a cloud environment the
>> eth (in host) can be inserted dynamically by CNI when a POD is created and removed
>> dynamically by CNI when the POD is destroyed and no POD is related to the eth (in host)
>> anymore.
> 
> I'm pretty clueless when it comes to the details of CNI but I think
> I'm barely able to follow. Nevertheless if you have the feeling that
> my extrapolations are wrong, please do point it out.
> 
>> It is hard for us to configure the pnetid on eth1 (in host). So we
>> configure the pnetid on the netdevice that is visible in the POD.
> 
> Hm, this sounds like you could set PNETID on eth1 (in host) for each of
> the cases and everything would be cool (and would work), but because CNI
> and the environment do not support it, or supports it in a very
> inconvenient way, you are looking for a workaround where PNETID is set
> in the POD. Is that right? Or did I get something wrong?

Right.

> 
>> When doing SMC-R, both
>> the container using the host network directly and the container using the veth network
>> can successfully match the RDMA device, because the netdev with the configured pnetid is a
>> base_ndev. But the container using IPVLAN cannot match the RDMA
>> device, and a 0x03030000 fallback happens, because the netdev with the configured pnetid is
>> not a base_ndev. Additionally, configuring the pnetid on eth1 (in host) also does not
>> work for matching the RDMA device when using the veth network and doing SMC-R in the POD.
> 
> That I guess answers my question from the first paragraph. Setting
> PNETID on eth1 (host) would not be sufficient for veth. Right?

Right. It is also one of the reasons for setting PNETID in POD.
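
To make the failing case concrete, here is a toy Python model (not kernel code) of the
behaviour described above: the pnetid table is consulted only for the base_ndev of the
device in use. The class, function names, device labels and the "PNET1" value are
illustrative assumptions; the real lookup lives in the kernel's SMC pnet code and may
differ in detail.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Netdev:
    """Toy netdev: 'lower' is the device this one is stacked on, if any."""
    name: str
    lower: Optional["Netdev"] = None


def find_base_ndev(ndev: Netdev) -> Netdev:
    """Walk down the lower-device chain until the bottom of the hierarchy."""
    while ndev.lower is not None:
        ndev = ndev.lower
    return ndev


def pnetid_via_base_only(ndev: Netdev, pnet_table: Dict[str, str]) -> Optional[str]:
    """Look up the pnetid table using only the base_ndev (assumed behaviour)."""
    return pnet_table.get(find_base_ndev(ndev).name)


# IPVLAN case: eth0 in the POD is an upper device stacked on eth1 in the host.
host_eth1 = Netdev("eth1(host)")
pod_ipvlan = Netdev("eth0(pod,ipvlan)", lower=host_eth1)

# veth case: eth0 in the POD has no lower device, so it is its own base_ndev.
pod_veth = Netdev("eth0(pod,veth)")

# The pnetid is configured on the netdev that is visible in the POD.
pnet_table = {"eth0(pod,ipvlan)": "PNET1", "eth0(pod,veth)": "PNET1"}

veth_result = pnetid_via_base_only(pod_veth, pnet_table)      # entry is found
ipvlan_result = pnetid_via_base_only(pod_ipvlan, pnet_table)  # no entry for eth1(host)
```

In this model the veth POD finds its entry because the configured netdev is already the
base_ndev, while the IPVLAN POD looks up eth1(host), which has no entry in the table.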

> 
> Another silly question: is making the PNETID basically a part of the Pod
> definition shifting PNETID from the realm of infrastructure (i.e.
> configured by the cloud provider) to the realm of an application (i.e.
> configured by the tenant)?

No, applications do not need to know the PNETID configuration. We have a plugin in
Kubernetes. When a POD is deployed, the plugin automatically adds an initContainer
to the POD, and the initContainer configures the PNETID automatically.

> 
> AFAIU veth (host) is bridged (or similar) to eth1 (host) and that is in
> the host, and this is where we make sure that the requirements for SMC-R
> are satisfied.
> 
> But veth (host) could be attached to eth3 which is on a network not
> reachable via eth0 (host) or eth1 (host). In that case the pod could
> still set PNETID on veth (POD). Or?
> 

Sorry, I forgot to mention a precondition: this is a single-tenant scenario, and all of the
ethX devices in the host are in the same VPC (a cloud term that can be understood simply as a
private network domain). Being in the same VPC means they have the same network reachability.
Therefore, in this scenario, we will not encounter the situation you mentioned.

>>
>> My patch resolves the problem we encountered and also unifies the pnetid setup
>> across the network choices listed above, assuming pnetid configuration is not limited
>> to the base_ndev directly related to the physical device (indeed, the current
>> implementation does not limit it yet).
> 
> I see some problems here, but I'm afraid we see different problems. For
> me not being able to set eth0 (veth/POD)'s PNETID from the host is a
> problem. Please notice that with the current implementation users can
> only control the PNETID if infrastructure does not do so in the first
> place.
> 
> 
> Can you please help me reason about this? I'm unfortunately lacking
> Kubernetes skills here, and it is difficult for me to think along.

Yes, not being able to set eth0 (veth/POD)'s PNETID from the host is also a problem.
Even if eth1 (host) has a hardware PNETID, eth0 (veth/POD) cannot find the hardware
PNETID, because eth0 (veth/POD) and eth1 (host) are not in the same netdev hierarchy.
But the two netdev hierarchies are related. Searching for the PNETID in all related netdev
hierarchies might help resolve this. For example, when finding the base_ndev, if the base_ndev
is related to another netdev (veth, etc.), jump to the related netdev hierarchy through
that relationship and continue iterating to find the base_ndev.
This is just an idea for now. I have not done any research on it yet, and I am not sure whether it is feasible.
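
A rough sketch of that idea, as a toy Python model rather than kernel code. Everything
here is an assumption made for illustration: the `related` links (a veth peer, or a
bridge-like attachment from veth (host) to eth1 (host)), the `hw_pnetid` field and the
function names do not correspond to existing kernel structures or APIs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing so devices can go in a set
class Netdev:
    name: str
    lower: Optional["Netdev"] = None                        # device this one is stacked on
    related: List["Netdev"] = field(default_factory=list)   # links to other hierarchies
    hw_pnetid: Optional[str] = None                         # hardware PNETID, if any


def find_base_ndev(ndev: Netdev) -> Netdev:
    """Walk down the lower-device chain to the bottom of one hierarchy."""
    while ndev.lower is not None:
        ndev = ndev.lower
    return ndev


def find_pnetid_across_hierarchies(start: Netdev) -> Optional[str]:
    """Search the base_ndev of every hierarchy reachable through 'related' links."""
    seen = set()
    queue = deque([start])
    while queue:
        base = find_base_ndev(queue.popleft())
        if base in seen:
            continue  # already visited, e.g. veth peers pointing at each other
        seen.add(base)
        if base.hw_pnetid is not None:
            return base.hw_pnetid
        queue.extend(base.related)  # jump to the related hierarchies
    return None


# veth case: eth0 (veth/POD) <-> veth (host), and veth (host) attached to eth1 (host).
pod_veth = Netdev("eth0(veth/POD)")
host_veth = Netdev("veth(host)")
host_eth1 = Netdev("eth1(host)", hw_pnetid="PNET1")
pod_veth.related.append(host_veth)
host_veth.related.extend([pod_veth, host_eth1])

isolated = Netdev("eth9(isolated)")  # hypothetical device with no links at all

found = find_pnetid_across_hierarchies(pod_veth)
not_found = find_pnetid_across_hierarchies(isolated)
```

The visited set is what keeps the walk from looping between the two ends of a veth pair.
Whether the kernel can cheaply discover such relationships is exactly the open question.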

Thanks,
Guangguan Wang

> 
> Regards,
> Halil




