Re: iWARP and soft-iWARP interop testing

Hi Doug,

Many thanks for taking the time and effort...!
I'd love to have all that HW around here.

Some comments inline.

Thanks very much!
Bernard.

-----"Doug Ledford" <dledford@xxxxxxxxxx> wrote: -----

>To: "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
>From: "Doug Ledford" <dledford@xxxxxxxxxx>
>Date: 05/06/2019 10:38PM
>Cc: "Gunthorpe, Jason" <jgg@xxxxxxxx>, "Bernard Metzler"
><BMT@xxxxxxxxxxxxxx>
>Subject: iWARP and soft-iWARP interop testing
>
>So, Jason and I were discussing the soft-iWARP driver submission, and he
>thought it would be good to know if it even works with the various iWARP
>hardware devices.  I happen to have most of them on hand in one form or
>another, so I sat down to test it.  In the process, I ran across some
>issues just with the hardware versions themselves, let alone with
>soft-iWARP.  So, here are the results of my matrix of tests.  These aren't
>performance tests, just basic "does it work" smoke tests...
>
>Hardware:
>i40iw = Intel x722
>qed1 = QLogic FastLinQ QL45000
>qed2 = QLogic FastLinQ QL41000
>cxgb4 = Chelsio T520-CR
>
>
>
>Test 1:
>rping -s -S 40 -C 20 -a $local
>rping -c -S 40 -C 20 -I $local -a $remote
>
>                    Server Side
>	i40iw		qed1		qed2		cxgb4		siw
>i40iw	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
>qed1	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
>qed2	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
>cxgb4	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
>siw	FAIL[2]		FAIL[1]		FAIL[1]		FAIL[1]		Untested
>
>Failure 1:
>Client side shows:
>client DISCONNECT EVENT...
>Server side shows:
>server DISCONNECT EVENT...
>wait for RDMA_READ_ADV state 10
>

I see the same behavior between two siw instances. In my tests, these events
are created only towards the end of the rping test. Using the -v flag on
client and server, I see the right amount of data being exchanged before
the disconnect ('-C 2' here to save space):

Server:
[bmt@rims ~]$ rping -s -S 40 -C 2 -a 10.0.0.1 -v
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10
[bmt@rims ~]$ 

Client:
[bmt@spoke ~]$ rping -c -S 40 -C 2 -I 10.0.0.2 -a 10.0.0.1 -v
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[
client DISCONNECT EVENT...
[bmt@spoke ~]$ 

I am not sure whether that DISCONNECT EVENT is really a failure,
though, or just the regular end of the test.
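
One quick way to tell the two apart (assuming rping reports errors
through its exit status; I have not verified that) would be to check
the return code of the run:

rping -c -S 40 -C 2 -I 10.0.0.2 -a 10.0.0.1 -v ; echo "rping exit status: $?"
# a non-zero status would point at a real failure rather than just
# noisy output at the regular end of the run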


>Failure 2:
>Client side shows:
>cma event RDMA_CM_EVENT_REJECTED, error -104
>wait for CONNECTED state 4
>connect error -1
>Server side shows:
>Nothing, server didn't indicate anything had happened
>

Looks like an MPA reject - maybe this is due to i40iw being
unwilling to switch off CRC if it is not turned on at the siw
side? Or any other MPA reject reason... For fixing it, we could
take that issue into a private conversation thread. Turning
on siw debugging will likely show the reason.
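
For reference, something like the following should surface the reject
reason in the kernel log, assuming the siw debug messages go through
the kernel's dynamic debug facility and debugfs is mounted at its
usual place (both assumptions on my side, not checked on your setup):

echo 'module siw +p' > /sys/kernel/debug/dynamic_debug/control  # needs root and CONFIG_DYNAMIC_DEBUG
dmesg | grep -i -e siw -e mpa  # after re-running the failing rping pair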


>Obviously, rping appears to be busted on iWARP (which surprises me, to be
>honest...it's part of the rdmacm-utils and should be using the rdmacm
>connection manager, which is what's required to work on iWARP, but maybe
>it just has some simple bug that needs to be fixed).
>

As said above, it's maybe just the awkward end of the test run,
which perhaps should not display the DISCONNECT EVENT message.

>Test 2:
>ib_read_bw -d $device -R
>ib_read_bw -d $device -R $remote
>
>                    Server Side
>	i40iw		qed1		qed2		cxgb4		siw
>i40iw	PASS		PASS		PASS		PASS		PASS
>qed1	PASS		PASS		PASS		PASS		PASS[1]
>qed2	PASS		PASS		PASS		PASS		PASS[1]
>cxgb4	PASS		PASS		PASS		PASS		PASS
>siw	FAIL[1]		PASS		PASS		PASS		untested
>
>Pass 1:
>These tests passed, but show pretty much worst-case performance
>behavior.  While I got 600MB/sec on one test, and 175MB/sec on another,
>the two that I marked were only at the 1 or 2MB/sec level.  I thought
>they had hung initially.

Compared to plain TCP, performance suffers from not using
segmentation offload.  Also, it is always good to set the MTU to the maximum.
With all that, on a 40Gb/s link, line speed should be possible.
But, yes, performance is not our main focus currently.
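
For example (eth0 is just a placeholder for the port siw is bound to,
and 9000 assumes the NIC, switch, and peer all support jumbo frames):

ip link set dev eth0 mtu 9000  # raise the MTU on the siw-bound interface
ip link show dev eth0          # verify the new MTU took effect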

>
>Test 3:
>qperf
>qperf -cm1 -v $remote rc_bw
>
>                    Server Side
>	i40iw		qed1		qed2		cxgb4		siw
>i40iw	PASS[1]		PASS		PASS[1]		PASS[1]		PASS
>qed1	PASS[1]		PASS		PASS[1]		PASS[1]		PASS
>qed2	PASS[1]		PASS		PASS[1]		PASS		PASS
>cxgb4	FAIL[2]		FAIL[2]		FAIL[2]		FAIL[2]		FAIL[2]
>siw	FAIL[3]		PASS		PASS		PASS		untested
>
>Pass 1:
>These passed, but only with some help.  After each client ran, the qperf
>server had to be restarted or else the client would show this error:
>rc_bw:
>failed to receive RDMA CM TCP IPv4 server port: timed out
>
>Fail 2:
>Whenever cxgb4 was the client side, the test would appear to run
>(including a delay appropriate for the test to run), but when
>it should have printed out results, it printed this instead:
>rc_bw:
>rdma_disconnect failed: Invalid argument
>
>Fail 3:
>Server side showed no output
>Client side showed:
>rc_bw:
>rdma_connect failed
>
>So, there you go, it looks like siw actually does a reasonable job of
>working, at least at the functionality level, and in some cases even has
>modestly decent performance.  I'm impressed.  Well done, Bernard.
>
>-- 
>Doug Ledford <dledford@xxxxxxxxxx>
>    GPG KeyID: B826A3330E572FDD
>    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
>



