Sagi,

Thanks for the update.

On Thu, Jun 29, 2017 at 8:32 AM, Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
> Hey Robert,
>
>> Could something like this be causing the D state problem I was seeing
>> in iSER almost a year ago?
>
>
> No, that is a bug in the mlx5 device as far as I'm concerned (although I
> couldn't prove it). I've tried to track it down but without access to
> the FW tools I can't understand what is going on. I've seen this same
> phenomenon with nvmet-rdma before as well.

Do you know who I could contact about it? I can reproduce the problem
pretty easily with two hosts back to back, so it should be easy for
someone with mlx5 Eth devices to replicate.

> It looks like when we perform QP draining in the presence of rdma
> operations it may not complete, meaning that the zero-length rdma write
> never generates a completion. Maybe it has something to do with the qp
> moving to error state when some rdma operations have not completed.
>
>> I tried writing a patch for iSER based on
>> this, but it didn't help. Either the bug is not being triggered in
>> device removal,
>
>
> It's 100% not related to device removal.
>
>> or I didn't line up the statuses correctly. But it
>> seems that things are getting stuck in the work queue and some sort of
>> deadlock is happening so I was hopeful that something similar may be
>> in iSER.
>
>
> The hang is the ULP code waiting for QP drain.

Yeah, the patches I wrote did nothing to help the problem. The only
thing that kind of worked was forcing the queue to drop (maybe I was
just ignoring the old queue, I can't remember exactly), but it was
leaving some stale iSCSI session info around. Now that I've read more
of the iSCSI code, I wonder if I should revisit that. I think Bart
said that the sledgehammer approach I took should not be necessary.
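To make the drain failure Sagi describes a bit more concrete, here is a toy user-space model in Python, not kernel code. The drain moves the QP to the error state, posts a zero-length RDMA write as a marker behind any in-flight operations, and waits for the marker's completion. `SimulatedQP`, `drain_qp`, and the `drop_marker_completion` flag are hypothetical names made up for this sketch, not the verbs API; the flag stands in for the suspected device bug, a background thread stands in for the device flushing work requests, and "FLUSH_ERR" is just a label. The timeout exists only so the demo terminates. As far as I understand, the kernel's generic drain helper waits without a timeout, which is how a lost completion turns into a task stuck in D state.

```python
import threading
from collections import deque


class SimulatedQP:
    """Toy model of a QP send queue (hypothetical, not a real verbs API)."""

    def __init__(self, drop_marker_completion=False):
        self.pending = deque()       # posted work requests not yet completed
        self.in_error = False
        self.completions = []        # completion log, in flush order
        self.drop_marker_completion = drop_marker_completion
        self._lock = threading.Lock()

    def post_send(self, wr_id, on_complete=None):
        with self._lock:
            self.pending.append((wr_id, on_complete))

    def move_to_error(self):
        with self._lock:
            self.in_error = True

    def flush(self):
        """Model the device flushing every outstanding WR once the QP is in error."""
        with self._lock:
            if not self.in_error:
                return
            while self.pending:
                wr_id, on_complete = self.pending.popleft()
                if wr_id == "drain" and self.drop_marker_completion:
                    continue         # modeled bug: the marker's completion never arrives
                self.completions.append((wr_id, "FLUSH_ERR"))
                if on_complete is not None:
                    on_complete()


def drain_qp(qp, timeout=1.0):
    """Return True if the drain marker completed, False if we gave up waiting."""
    done = threading.Event()
    qp.move_to_error()
    # Zero-length RDMA write used purely as a marker at the tail of the queue.
    qp.post_send("drain", on_complete=done.set)
    # A thread stands in for the device flushing asynchronously.
    threading.Thread(target=qp.flush).start()
    return done.wait(timeout)
```

With a healthy device, an RDMA read posted before the drain is flushed first and then the marker completes, so `drain_qp` returns `True`. With `drop_marker_completion=True` the wait only ends because of the timeout; without one it would block forever, which matches "the ULP code waiting for QP drain".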
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1