On 02/01/2018 09:56 AM, Logan Gunthorpe wrote:
> Hello,
>
> We've experienced a regression using nvme-of with two ConnectX-5s. With
> v4.15 and v4.14.16 we see the following dmesg output when trying to
> connect to the target:
>
>> [   43.732539] nvme nvme2: creating 16 I/O queues.
>> [   44.072427] nvmet: adding queue 1 to ctrl 1.
>> [   44.072553] nvmet: adding queue 2 to ctrl 1.
>> [   44.072597] nvme nvme2: Connect command failed, error wo/DNR bit: -16402
>> [   44.072609] nvme nvme2: failed to connect queue: 3 ret=-18
>> [   44.075421] nvmet_rdma: freeing queue 2
>> [   44.075792] nvmet_rdma: freeing queue 1
>> [   44.264293] nvmet_rdma: freeing queue 3
>> *snip*
>
> (On v4.15 there are additional panics, likely due to some other nvme-of
> error-handling bugs.)
>
> And nvme connect returns:
>
>> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>
> The two adapters are identical, with the latest available firmware:
>
>> transport:        InfiniBand (0)
>> fw_ver:           16.21.2010
>> vendor_id:        0x02c9
>> vendor_part_id:   4119
>> hw_ver:           0x0
>> board_id:         MT_0000000010
>
> We bisected and found that the commit that broke our setup is:
>
> 05e0cc84e00c net/mlx5: Fix get vector affinity helper function

I doubt the issue is in this fix itself; rather, with this fix the
automatic affinity settings for nvme over rdma are enabled. Maybe a bug
was hiding there and we just stepped on it.

Added Sagi; maybe he can help us spot the issue here.

Thanks,
Saeed
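For context on what "automatic affinity settings for nvme over rdma" means
here: nvme-rdma maps its blk-mq hardware queues to CPUs according to the IRQ
affinity of the device's completion vectors, as reported by the driver's
get-vector-affinity helper (the one the bisected commit fixes for mlx5).
Below is a rough sketch of that mapping path as it looked around
v4.14/v4.15, paraphrased from memory of block/blk-mq-rdma.c; treat it as
illustrative, not the exact upstream code.

/*
 * Sketch of the affinity-based queue mapping that the mlx5 fix enables.
 * Paraphrased from memory of the v4.14/v4.15-era helpers; names and
 * details may differ slightly from the real block/blk-mq-rdma.c.
 */
#include <linux/blk-mq.h>
#include <rdma/ib_verbs.h>

static int sketch_rdma_map_queues(struct blk_mq_tag_set *set,
		struct ib_device *dev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		/*
		 * For mlx5 this ends up in the driver's vector affinity
		 * helper.  Per the report above, before 05e0cc84e00c it
		 * returned NULL, so the fallback below was always taken
		 * and the affinity-based mapping never kicked in.
		 */
		mask = ib_get_vector_affinity(dev, first_vec + queue);
		if (!mask)
			goto fallback;

		/* Route every CPU in the vector's IRQ mask to this queue. */
		for_each_cpu(cpu, mask)
			set->mq_map[cpu] = queue;
	}
	return 0;

fallback:
	/* No affinity info: spread CPUs over queues the generic way. */
	return blk_mq_map_queues(set);
}

The idea is that each hardware queue serves exactly the CPUs whose
completion-vector interrupt is steered to it, instead of the generic
round-robin spread used by the fallback path.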