Hi Robert,
I ran into this with 4.9.32 when I rebooted the target. I tested
4.12-rc6 and this particular error seems to have been resolved, but I
now get a new one on the initiator. This one doesn't seem as
impactful.
[Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
Max, Leon,
Care to parse this syndrome for us? ;)
Here is the parsed output. It says the error was an access to an mkey
that is free.
======== cqe_with_error ========
wqe_id : 0x0
srqn_usr_index : 0x0
byte_cnt : 0x0
hw_error_syndrome : 0x93
hw_syndrome_type : 0x0
vendor_error_syndrome : 0x52
Can you share the check that correlates to the vendor+hw syndrome?
syndrome : LOCAL_PROTECTION_ERROR (0x4)
s_wqe_opcode : SEND (0xa)
That's interesting, the opcode is a send operation. I'm assuming
that this is an immediate-data write? Robert, did this happen when
you issued >4k writes to the target?
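
For anyone who wants to line the raw dump up with the parsed fields, here is
a rough sketch in Python. The byte positions are an assumption, inferred only
by matching the last row of the dump against the parsed output quoted above.
It is not the driver's or the firmware tool's decoder, the function name is
made up for illustration, and the width of hw_syndrome_type is a guess (it is
zero here either way).

```python
# Last row of the error-CQE dump quoted in this thread.
LAST_ROW = "00000000 93005204 0a0001bd 45c8e0d2"


def decode_error_cqe_row(row):
    """Split the final dump row into the fields shown in the parsed output.

    Assumed layout (inferred from this thread, not from a spec):
      word 1: hw_error_syndrome | hw_syndrome_type | vendor_error_syndrome | syndrome
      word 2: s_wqe_opcode in the top byte
    """
    words = [int(w, 16) for w in row.split()]
    synd_word, opcode_word = words[1], words[2]
    return {
        "hw_error_syndrome": (synd_word >> 24) & 0xFF,
        "hw_syndrome_type": (synd_word >> 16) & 0xFF,
        "vendor_error_syndrome": (synd_word >> 8) & 0xFF,
        "syndrome": synd_word & 0xFF,
        "s_wqe_opcode": (opcode_word >> 24) & 0xFF,
    }


fields = decode_error_cqe_row(LAST_ROW)
for name, value in fields.items():
    print(f"{name} : {hex(value)}")
```

With the row above this gives 0x93, 0x0, 0x52, 0x4 and 0xa, which agrees with
the parsed output (LOCAL_PROTECTION_ERROR is 0x4, SEND is 0xa). It says
nothing about what the vendor syndrome 0x52 actually checks.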