On 17.10.23 05:49, Wen Gu wrote:
On 2023/10/8 15:19, Wen Gu wrote:
On 2023/10/5 16:21, Niklas Schnelle wrote:
Hi Wen Gu,
I've been trying out your series with iperf3, qperf, and uperf on
s390x. I'm using network namespaces with a ConnectX VF from the same
card in each namespace for the initial TCP/IP connection, i.e. initially
it goes out to a real NIC, even if that can switch internally. All of
these look great for streaming workloads both in terms of performance
and stability. With a Connect-Request-Response workload and uperf,
however, I've run into issues. The test configuration I use is as
follows:
Client Command:
# host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml
Server Command:
# ip netns exec server smc_run uperf -s &> /dev/null
Uperf tcp_crr.xml:
<?xml version="1.0"?>
<profile name="TCP_CRR">
<group nthreads="12">
<transaction duration="120">
<flowop type="connect"
options="remotehost=$host protocol=tcp" />
<flowop type="write" options="size=200"/>
<flowop type="read" options="size=1000"/>
<flowop type="disconnect" />
</transaction>
</group>
</profile>
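As a rough illustration of what each of the 12 threads in tcp_crr.xml repeats for 120 seconds, here is a hypothetical single-process sketch in C: connect, write 200 bytes, read 1000 bytes, disconnect, with the "server" side served by a listener in the same process over plain TCP loopback. The helper names (run_crr_transactions, read_full, write_full) are made up for illustration; this is not how uperf itself is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { REQ_SIZE = 200, RESP_SIZE = 1000 }; /* sizes from tcp_crr.xml */

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}

/* Runs n connect/request/response/disconnect cycles against a loopback
 * listener in the same process; returns how many completed, -1 on setup error. */
static int run_crr_transactions(int n)
{
    char req[REQ_SIZE], resp[RESP_SIZE];
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int done = 0;

    assert(n >= 0);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) {
        close(lfd);
        return -1;
    }
    memset(req, 'q', sizeof(req));
    memset(resp, 'r', sizeof(resp));

    for (int i = 0; i < n; i++) {
        /* "connect": completes via the listen backlog before accept() */
        int cfd = socket(AF_INET, SOCK_STREAM, 0);
        if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            if (cfd >= 0)
                close(cfd);
            break;
        }
        int sfd = accept(lfd, NULL, NULL);
        int ok = sfd >= 0 &&
                 write_full(cfd, req, REQ_SIZE) == 0 &&   /* client: write 200 */
                 read_full(sfd, req, REQ_SIZE) == 0 &&    /* server: read request */
                 write_full(sfd, resp, RESP_SIZE) == 0 && /* server: write 1000 */
                 read_full(cfd, resp, RESP_SIZE) == 0;    /* client: read 1000 */
        if (sfd >= 0)
            close(sfd);
        close(cfd); /* "disconnect" */
        if (!ok)
            break;
        done++;
    }
    close(lfd);
    return done;
}
```

Unlike the streaming tests, every iteration sets up and tears down a full connection, so any per-connection handshake state is exercised thousands of times per second.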
The workload first runs fine, but after about 4 GB of data have been
transferred it fails with "Connection refused" and "Connection reset by
peer" errors. The failure is not permanent, however: re-running the
streaming workloads works fine again (with both the uperf server and
client restarted). So I suspect something gets stuck in either the
client or the server sockets. The same workload runs fine with TCP/IP, of
course.
Thanks,
Niklas
Hi Niklas,
Thank you very much for the test. With the test example you provided,
I've reproduced the issue in my VM. Moreover, the test sometimes complains
with 'Error saying goodbye with <ip>'.
I'll figure out what's going on here.
Thanks!
Wen Gu
I think that there is a common issue for SMC-R and SMC-D. I can also reproduce
'connection reset by peer' and 'Error saying goodbye with <ip>' when using
SMC-R under the same test conditions. They occur at the end of the test.
When the uperf test time ends, some signals are sent. At this point there
are usually some SMC connections doing the CLC handshake. I caught some
-EINTR (-4) in the client and -ECONNRESET (-104) in the server returned
from smc_clc_wait_msg() (correspondingly, the handshake error counts also
increase), and TCP RST packets are sent to terminate the CLC TCP
connection (clcsock).
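The -EINTR on the client side is what a blocking wait typically returns when a signal arrives while it sleeps. A userspace analogue, not the in-kernel smc_clc_wait_msg() path itself: a read() on an empty pipe, with a SIGALRM handler installed without SA_RESTART, fails with errno EINTR once the alarm fires. Kernel functions report errors as negative errno values, so the -4 and -104 above are -EINTR and -ECONNRESET on Linux. The helper names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Does nothing; it only has to exist so that SIGALRM interrupts
 * the blocked read() instead of terminating the process. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Blocks in read() on an empty pipe until SIGALRM fires (after ~1 s).
 * Returns the errno of the failed read() (EINTR expected), or 0. */
static int blocking_read_interrupted(void)
{
    int fds[2];
    char c;
    struct sigaction sa;

    if (pipe(fds) < 0)
        return errno;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: read() is not restarted after the signal */
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    alarm(1);
    ssize_t n = read(fds[0], &c, 1); /* write end stays open, so this blocks */
    int err = (n < 0) ? errno : 0;
    alarm(0);
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Kernel-style return codes are -errno; map e.g. -4 or -104 to a message. */
static const char *describe_kernel_ret(int ret)
{
    assert(ret < 0);
    return strerror(-ret);
}
```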
I am not sure whether this should be considered by design or a bug in SMC.
From an application perspective, the connection reset behavior only happens
when using SMC.
@Wenjia, could you please take a look at this?
Thanks,
Wen Gu
Hi Wen,
Do you mean a bug in smc_clc_wait_msg()?
If so, I cannot see any problem in smc_clc_wait_msg(). From your
description, it looks to me like the server should get the CLC_PROPOSAL
message, but nothing arrives, while the client is waiting for the
CLC_ACCEPT message from the server until the wait loop is broken out of.
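To make the waiting positions concrete, here is a simplified, hypothetical C model of which CLC message each side blocks on: the client sends PROPOSAL and waits for ACCEPT, the server waits for PROPOSAL, sends ACCEPT and waits for CONFIRM. The enum and function names are made up (the kernel uses its own SMC_CLC_* constants), and the DECLINE/fallback path is deliberately left out.

```c
#include <assert.h>

/* Hypothetical names; DECLINE (fallback to TCP) is omitted. */
enum clc_msg { CLC_NONE, CLC_PROPOSAL, CLC_ACCEPT, CLC_CONFIRM };
enum clc_role { CLC_CLIENT, CLC_SERVER };

/* Returns the message a side is blocked waiting for, given how many
 * CLC messages it has sent so far; CLC_NONE means it is not waiting. */
static enum clc_msg clc_waiting_for(enum clc_role role, int sent)
{
    assert(sent >= 0);
    if (role == CLC_SERVER) {
        if (sent == 0)
            return CLC_PROPOSAL; /* nothing sent yet: wait for PROPOSAL */
        if (sent == 1)
            return CLC_CONFIRM;  /* ACCEPT sent: wait for CONFIRM */
        return CLC_NONE;         /* handshake finished */
    }
    if (sent == 1)
        return CLC_ACCEPT;       /* PROPOSAL sent: wait for ACCEPT */
    return CLC_NONE;             /* about to send PROPOSAL, or CONFIRM sent */
}
```

In the situation described above, the server is at clc_waiting_for(CLC_SERVER, 0) == CLC_PROPOSAL and the client at clc_waiting_for(CLC_CLIENT, 1) == CLC_ACCEPT. Since the client has already sent its PROPOSAL, only the server consuming it can move the handshake forward; if a signal ends one of the waits first, the handshake fails instead.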
Thanks,
Wenjia