On 10/06/2015 04:54 PM, Or Gerlitz wrote:
> On Tue, Oct 6, 2015 at 7:05 PM, Doug Ledford <dledford@xxxxxxxxxx> wrote:
>> I'll have some sort of answer for that soon. I spent the better part of
>> last week, and what time I worked on the weekend, plus all day yesterday
>> on the internal infrastructure here at Red Hat. We're experiencing some
>> growing pains in our cluster and some downtime as a result that keeps me
>> from being able to test code effectively. I wouldn't be surprised if it
>> takes another day or two to get it completely sorted out (or sorted as
>> best I can, some things are out of my control). Then I have to see if
>> any of the currently posted fixes for 4.3rc that I haven't grabbed yet
>> resolve the iSER issue I'm seeing, then I'll move on to for-next processing.
>
> Doug,
>
> From my experience with VPI (IB/RoCE) clusters, librdmacm/rping is the
> answer... namely -- if you have rping up and running over kernel X
> for both IB and RoCE, things aren't in such a bad state. If you want
> to go deeper, have it working over IB non-default partition and
> Ethernet VLAN.

Nothing so simple unfortunately. And it isn't an IB/RoCE cluster, it's
an IB/IB/OPA/RoCE/iWARP cluster. Regardless though, that's not my problem,
and it's not what I'm chasing.

> Also, for IB multicast, mckey with IPoIB port space, iperf multicast
> over IPoIB would tell you how things are.
>
> So all to all, sans SRIOV, it should take you whole 20m to figure out
> if something is really DOA over IB/RoCE HW and I believe iWARP too
> (rping) - makes sense?

Yes, I know how to do DOA testing.

> What we do know that needs fixing for 4.3-rc
>
> --> RoCE, you need the patch re-posted by Haggai few hours ago
> "IB/cma: Accept connection without a valid netdev on RoCE" -- without
> it, RoCE isn't working.

I have that already. It's available on both github and k.o and just
waiting for a pull request.

> --> mlx5 devices and no-default IB pkeys, Haggai and Co are
> working on a fix since this isn't working since 4.3-rc1. I told them
> we need it till rc5.5 (i.e few days before rc6 and if not, will have
> to revert some 4.3-rc1 bits.

I already have one patch related to this in my repo as well. The 0day
testing just came back and it's all good.

--
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD