Here is the parsed output; it says there was an access to an mkey which
is free.
Missed that :)
======== cqe_with_error ========
wqe_id : 0x0
srqn_usr_index : 0x0
byte_cnt : 0x0
hw_error_syndrome : 0x93
hw_syndrome_type : 0x0
vendor_error_syndrome : 0x52
Can you share the check that correlates to the vendor+hw syndrome?
mkey.free == 1
Hmm, the way I understand it is that the HW is trying to access
(locally via send) an MR which was already invalidated.
Thinking of this further, this can happen in a case where the target
already completed the transaction, sent SEND_WITH_INVALIDATE but the
original send ack was lost somewhere causing the device to retransmit
from the MR (which was already invalidated). This is highly unlikely
though.
Shouldn't this be protected somehow by the device?
Can someone explain why the above cannot happen? Jason? Liran? Anyone?
Say the host registers MR (a) and sends (1) from that MR to a target,
the ack for send (1) gets lost, and the target issues SEND_WITH_INVALIDATE
on MR (a) and the host HCA processes it. Then the host HCA times out on
send (1), so it retries, but ehh, it's already invalidated.
Or, we can also have a race where we destroy all our MRs while I/O
is still running (but from the code we should be safe here).
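
To make the lost-ack ordering (the first scenario above) concrete, here is
a toy model of it. This is only a sketch of the sequence of events as
described in this mail, not a model of what the device really does: the
ToyMR/ToyHCA classes, the simulate() helper and the log strings are all
made up for illustration, and whether real hardware protects against
this ordering is exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMR:
    """Hypothetical stand-in for an MR; 'free' loosely mirrors mkey.free."""
    key: int
    free: bool = False


@dataclass
class ToyHCA:
    """Hypothetical host HCA that only tracks MR state and a log of events."""
    mrs: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def register_mr(self, key):
        self.mrs[key] = ToyMR(key)

    def transmit(self, key, label):
        # Sending (or retransmitting) reads the payload through the MR.
        if self.mrs[key].free:
            self.log.append(f"{label}: error, mkey {key:#x} is free")
            return False
        self.log.append(f"{label}: ok")
        return True

    def remote_invalidate(self, key):
        # The target's SEND_WITH_INVALIDATE has been processed locally.
        self.mrs[key].free = True
        self.log.append(f"invalidate: mkey {key:#x} marked free")


def simulate(ack_lost):
    """Replay the sequence; return (retry result or None, event log)."""
    hca = ToyHCA()
    hca.register_mr(0xA)               # host registers MR (a)
    hca.transmit(0xA, "send (1)")      # send (1) reaches the target
    hca.remote_invalidate(0xA)         # target invalidates MR (a)
    retry_ok = None
    if ack_lost:
        # Assumption: the ack never arrived, so the HCA times out and
        # retransmits send (1) from the now-invalidated MR.
        retry_ok = hca.transmit(0xA, "send (1) retry")
    return retry_ok, hca.log


if __name__ == "__main__":
    for line in simulate(ack_lost=True)[1]:
        print(line)
```

With ack_lost=False no retry is attempted, so the invalidated MR is never
touched again; the failure only shows up when the retransmit happens
after the invalidate has been processed.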
Robert, when you rebooted the target, I assume the iSCSI ping
timeout expired and the connection teardown started, correct?