On 2024/1/24 22:29, Alexandra Winter wrote:
Hello Wen Gu,

our colleague Matthew reported that SMC-D is failing in certain scenarios on kernel v6.8 (thx Matt!). He bisected it to b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device"). I think the root cause could also be somewhere else in the SMC-Dv2.1 patchset.

I was able to reproduce the issue on a 6.8.0-rc1 kernel. I tested iperf over smc-d with:

    smc_run iperf3 -s
    smc_run iperf3 -c <IP@>

1) Doing an iperf in a single system using 127.0.0.1 as IP@ (System A=iperf client=iperf server)
2) Doing iperf to a remote system (System A=client; System B=iperf server)

The second iperf fails with an error message like:
"iperf3: error - unable to receive cookie at server: Bad file descriptor"
on the server.

If I do first 2) (iperf to remote) and then 1) (iperf to local), then the iperf to local fails.

I can do multiple iperf to the first server without problems.

I ran it on a debug server with KASAN, but got no reports in the Logfile.

I will try to debug further, but wanted to let you all know.

Kind regards
Alexandra

Reported-by: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>

Hi Alexandra and Matthew,

Thank you very much for the detailed description.

I tried to reproduce this with loopback-ism, cutting some checks so that the remote-system handshake can be done. After some debugging I found an elementary mistake of mine in b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device").

The operator order in smcd_lgr_match() is not as expected. It will always return 'true' in the remote-system case.

 static bool smcd_lgr_match(struct smc_link_group *lgr,
-                           struct smcd_dev *smcismdev, u64 peer_gid)
+                           struct smcd_dev *smcismdev,
+                           struct smcd_gid *peer_gid)
 {
-        return lgr->peer_gid == peer_gid && lgr->smcd == smcismdev;
+        return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
+                smc_ism_is_virtual(smcismdev) ?
+                (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
 }

Could you please try again with the patch below, to see if this is the root cause?

Really sorry for the inconvenience.

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index da6a8d9c81ea..c6a6ba56c9e3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1896,8 +1896,8 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
                            struct smcd_gid *peer_gid)
 {
         return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
-                smc_ism_is_virtual(smcismdev) ?
-                (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
+                (smc_ism_is_virtual(smcismdev) ?
+                 (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1);
 }

Thanks,
Wen Gu