Re: iWARP and soft-iWARP interop testing


 



On Tue, 2019-05-07 at 13:37 +0000, Bernard Metzler wrote:
> Hi Doug,
> 
> many thanks for taking the time and effort...! 
> I'd love to have all that HW around here.
> 
> Some comments inline
> 
> Thanks very much!
> Bernard.
> 
> -----"Doug Ledford" <dledford@xxxxxxxxxx> wrote: -----
> 
> > To: "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
> > From: "Doug Ledford" <dledford@xxxxxxxxxx>
> > Date: 05/06/2019 10:38PM
> > Cc: "Gunthorpe, Jason" <jgg@xxxxxxxx>, "Bernard Metzler"
> > <BMT@xxxxxxxxxxxxxx>
> > Subject: iWARP and soft-iWARP interop testing
> > 
> > So, Jason and I were discussing the soft-iWARP driver submission, and
> > he thought it would be good to know whether it even works with the
> > various iWARP hardware devices.  I happen to have most of them on hand
> > in one form or another, so I sat down to test it.  In the process, I
> > ran across some issues just with the hardware versions themselves, let
> > alone with soft-iWARP.  So, here are the results of my matrix of tests.
> > These aren't performance tests, just basic "does it work" smoke
> > tests...
> > 
> > Hardware:
> > i40iw = Intel x722
> > qed1 = QLogic FastLinQ QL45000
> > qed2 = QLogic FastLinQ QL41000
> > cxgb4 = Chelsio T520-CR
> > 
> > 
> > 
> > Test 1:
> > rping -s -S 40 -C 20 -a $local
> > rping -c -S 40 -C 20 -I $local -a $remote
> > 
> >                     Server Side
> > Client	i40iw		qed1		qed2		cxgb4		siw
> > i40iw	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
> > qed1	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
> > qed2	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
> > cxgb4	FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]		FAIL[1]
> > siw	FAIL[2]		FAIL[1]		FAIL[1]		FAIL[1]		Untested
> > 
> > Failure 1:
> > Client side shows:
> > client DISCONNECT EVENT...
> > Server side shows:
> > server DISCONNECT EVENT...
> > wait for RDMA_READ_ADV state 10
> > 
> 
> I see the same behavior between two siw instances. In my tests, these events
> are created only towards the end of the rping test. Using the -v flag on
> client and server, I see the right amount of data being exchanged before
> ('-C 2' to save space here):
> 
> Server:
> [bmt@rims ~]$ rping -s -S 40 -C 2 -a 10.0.0.1 -v
> server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ
> server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[
> server DISCONNECT EVENT...
> wait for RDMA_READ_ADV state 10
> [bmt@rims ~]$ 
> 
> Client:
> [bmt@spoke ~]$ rping -c -S 40 -C 2 -I 10.0.0.2 -a 10.0.0.1 -v
> ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ
> ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[
> client DISCONNECT EVENT...
> [bmt@spoke ~]$ 
> 
> I am not sure if that DISCONNECT EVENT thing is a failure
> though, or just the regular end of the test.

You and Steve are right.  I somehow dropped the -v from my test runs and
mistook this normal end-of-test disconnect for a failure.  It's probably
worth fixing that message in rping, as it really is confusing ;-)
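
For the next round of runs I'll probably script it so I'm not eyeballing
the log at all.  Something along these lines (just a sketch, and it
assumes rping exits non-zero on a real failure, which I haven't actually
verified):

# server side
rping -s -S 40 -C 20 -a $local -v
# client side: report the exit status instead of reading the messages
if rping -c -S 40 -C 20 -I $local -a $remote -v; then
	echo "rping client $local -> server $remote: PASS"
else
	echo "rping client $local -> server $remote: FAIL (exit $?)"
fi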

> 
> > Failure 2:
> > Client side shows:
> > cma event RDMA_CM_EVENT_REJECTED, error -104
> > wait for CONNECTED state 4
> > connect error -1
> > Server side shows:
> > Nothing; the server didn't indicate anything had happened
> > 
> 
> Looks like an MPA reject - maybe this is due to i40iw being
> unwilling to switch off CRC if it is not turned on at the siw
> side? Or any other MPA reject reason... For fixing it, we could
> take that issue into a private conversation thread. Turning
> on siw debugging will likely show the reason.

I'm not too worried about it.  If it's just a mismatch of options,
though, it would be worth figuring out which options the i40iw doesn't
like having turned on or off and documenting that.
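
If I do end up chasing it, my plan was roughly the following on the siw
client, to catch the MPA reject reason Bernard mentions (a sketch only;
it assumes siw's debug messages go through the kernel's dynamic debug
facility and that debugfs is mounted in the usual place):

# enable debug output from the siw module
echo 'module siw +p' > /sys/kernel/debug/dynamic_debug/control
# reproduce the rejected connection from the siw side
rping -c -S 40 -C 20 -I $local -a $remote -v
# look for the MPA negotiation / reject messages
dmesg | grep -i -e siw -e mpa
# turn the extra logging back off
echo 'module siw -p' > /sys/kernel/debug/dynamic_debug/control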

> 
> > Test 2:
> > ib_read_bw -d $device -R
> > ib_read_bw -d $device -R $remote
> > 
> >                     Server Side
> > Client	i40iw		qed1		qed2		cxgb4		siw
> > i40iw	PASS		PASS		PASS		PASS		PASS
> > qed1	PASS		PASS		PASS		PASS		PASS[1]
> > qed2	PASS		PASS		PASS		PASS		PASS[1]
> > cxgb4	PASS		PASS		PASS		PASS		PASS
> > siw	FAIL[1]		PASS		PASS		PASS		Untested
> > 
> > Pass 1:
> > These tests passed, but showed pretty much worst-case performance
> > behavior.  While I got 600MB/sec on one test and 175MB/sec on
> > another, the two that I marked were only at the 1 or 2MB/sec level.
> > I initially thought they had hung.
> 
> Compared to plain TCP, performance suffers from not using
> segmentation offloading.  Also, it is always good to set the MTU
> to the maximum.  With all that, line speed should be possible on a
> 40Gb/s link.  But, yes, performance is not our main focus currently.

I wasn't really testing for performance here, and I couldn't really
anyway.  I had a mix of 10Gig, 25Gig, and 40Gig hardware in this matrix,
so it wouldn't be even close to an apples-to-apples comparison.  But
those two tests, which only affect the qed driver, showed clearly
pathological behavior.  It would be worth debugging in the future.
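
For whoever picks that up, the first thing I'd compare between the slow
qed<->siw runs and one of the good ones is the MTU and the offload
settings Bernard mentions above (again just a sketch; ethX is a
placeholder for the actual interface, and jumbo frames obviously have to
be enabled on both ends of the link):

# check the current MTU and the segmentation/checksum offloads
ip link show dev ethX
ethtool -k ethX | grep -E 'segmentation|checksum'
# bump the MTU toward the max the link supports, e.g. 9000 byte jumbo frames
ip link set dev ethX mtu 9000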

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
