On Wed, Oct 7, 2015 at 12:26 AM, Doug Ledford <dledford@xxxxxxxxxx> wrote:

> Nothing so simple unfortunately. And it isn't an IB/RoCE cluster, it's
> IB/IB/OPA/RoCE/IWARP cluster. Regardless though, that's not my problem
> and what I'm chasing.

To be precise, no two transports out of IB/RoCE/iWARP/OPA are interoperable, so these are "just" different cards/transports under the same IB core on this cluster.

> Yes, I know how to do DOA testing.

So what's dead in your env after (say) 59m of examination?

>> What we do know that needs fixing for 4.3-rc
>> --> RoCE, you need the patch re-posted by Haggai few hours ago
>> "IB/cma: Accept connection without a valid netdev on RoCE" -- without
>> it, RoCE isn't working.

> I have that already. It's available on both github and k.o and just
> waiting for a pull request.

Maybe wait to get the fixes for the non-default pkey on mlx5 (see more below)?

Did you actually note that before Haggai posted the patch?! Once I realized how deep the breakage was, I became quite worried that your testing env did not shout loudly at us that something was broken even before 4.3-rc1.

>> --> mlx5 devices and no-default IB pkeys, Haggai and Co are
>> working on a fix since this isn't working since 4.3-rc1. I told them
>> we need it till rc5.5 (i.e few days before rc6 and if not, will have
>> to revert some 4.3-rc1 bits.

> I already have on patch related to this in my repo as well. The 0day
> testing just came back and it's all good.

I suspect that you don't... do you have rping up and running between mlx4 and mlx5 on a non-default pkey? The breakage is a bit tricky and you might not see it if you run mlx5 against mlx5. BTW, which patch is that?