On Tue, Jun 20, 2017 at 1:58 AM, Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
>>> Hi Robert,
>>>
>>>> I ran into this with 4.9.32 when I rebooted the target. I tested
>>>> 4.12-rc6 and this particular error seems to have been resolved, but I
>>>> now get a new one on the initiator. This one doesn't seem as
>>>> impactful.
>>>>
>>>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
>>>
>>>
>>> Max, Leon,
>>>
>>> Care to parse this syndrome for us? ;)
>>
>>
>> Here the parsed output, it says that it was access to mkey which is
>> free.
>>
>> ======== cqe_with_error ========
>> wqe_id                : 0x0
>> srqn_usr_index        : 0x0
>> byte_cnt              : 0x0
>> hw_error_syndrome     : 0x93
>> hw_syndrome_type      : 0x0
>> vendor_error_syndrome : 0x52
>
>
> Can you share the check that correlates to the vendor+hw syndrome?
>
>> syndrome              : LOCAL_PROTECTION_ERROR (0x4)
>> s_wqe_opcode          : SEND (0xa)
>
>
> That's interesting, the opcode is a send operation. I'm assuming
> that this is immediate-data write? Robert, did this happen when
> you issued >4k writes to the target?

I was running dd with oflag=direct, so yes.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD  A904 C70E E654 3BB2 FA62 B9F1
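
For reference, the parsed fields quoted above can be matched by hand against the last row of the raw dump. Below is a minimal Python sketch of that mapping. It assumes the kernel prints the 64-byte CQE as sixteen big-endian 32-bit words, and that the trailing bytes follow struct mlx5_err_cqe in include/linux/mlx5/device.h. The offsets for hw_error_syndrome and hw_syndrome_type are inferred from the parsed output in this thread, not taken from documentation. decode_err_cqe is a hypothetical helper written for illustration, not an existing tool.

```python
import struct

# The four rows of the dmesg dump quoted above, as sixteen 32-bit words.
DUMP_WORDS = [
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x93005204, 0x0A0001BD, 0x45C8E0D2,
]

# Only the values named in this thread; other codes are left unnamed.
SYNDROME_NAMES = {0x04: "LOCAL_PROTECTION_ERROR"}
OPCODE_NAMES = {0x0A: "SEND"}


def decode_err_cqe(words):
    """Decode the trailing fields of a 64-byte error CQE (hypothetical helper)."""
    if len(words) != 16:
        raise ValueError("expected sixteen 32-bit words (64 bytes)")

    # Assumption: each printed word is the big-endian view of 4 CQE bytes.
    raw = b"".join(struct.pack(">I", w) for w in words)

    # Assumed layout: bytes 52..55 hold hw syndrome, hw syndrome type,
    # vendor syndrome, and syndrome, in that order.
    hw_err, hw_type, vendor, syndrome = struct.unpack_from(">BBBB", raw, 52)

    # Bytes 56..63: opcode (top 8 bits) plus QP number (low 24 bits),
    # then wqe_counter, signature, op_own.
    opcode_qpn, wqe_counter, signature, op_own = struct.unpack_from(">IHBB", raw, 56)
    opcode = opcode_qpn >> 24

    return {
        "hw_error_syndrome": hw_err,
        "hw_syndrome_type": hw_type,
        "vendor_error_syndrome": vendor,
        "syndrome": syndrome,
        "syndrome_name": SYNDROME_NAMES.get(syndrome, "unknown"),
        "s_wqe_opcode": opcode,
        "s_wqe_opcode_name": OPCODE_NAMES.get(opcode, "unknown"),
        "qpn": opcode_qpn & 0xFFFFFF,
        "wqe_counter": wqe_counter,
        "signature": signature,
        "op_own": op_own,
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe(DUMP_WORDS).items():
        shown = value if isinstance(value, str) else hex(value)
        print(f"{name:22}: {shown}")
```

Under those assumptions the decoded values agree with the parsed output quoted above (0x93, 0x0, 0x52, syndrome 0x4, opcode 0xa). The sketch does not interpret what the vendor and hw syndromes mean; that still needs the vendor's own tooling.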