Hi Bart,

I've looked at this and didn't really like the unconditional hctx lock in
the blk-mq path, which might have nasty effects when just using a single
hctx. So I'm taking another step back and trying to understand what you're
doing here. Let me try to recreate the issue:

 - we get a ->host_reset call for the SRP initiator, which then calls
   srp_reconnect_rport, at which point we still have outstanding commands
   on the wire, and we still allow concurrent I/O submission
 - srp_reconnect_rport then blocks new I/O, and tries to drain the pending
   requests from ->queuecommand. It then calls into srp_rport_reconnect,
   which after some work also clears out all commands on the wire and
   then reconnects

Maybe it's time to move to what Hannes suggested in
events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf slides 56+,
at least for SRP as a start, that is:

 - once escalating to a LUN reset, fail all commands for the LUN, block
   the LUN for I/O, and send a TMF abort
 - once escalating to the host reset, fail all I/O for the host, block
   the host (all LUNs) for I/O, and only then call the host reset action
   (reconnect in the SRP case) (or rather replace the current SRP host
   reset with the I_T Nexus reset suggested by Hannes)

The advantage is that we can do the full drain much more easily than by
just waiting for commands to leave ->queuecommand. The other advantage is
that we can implement this with fairly small changes in the scsi_error.c
code, triggered off a host or transport template flag, without touching
code in the block layer, while at the same time significantly simplifying
the transport layer and drivers.
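
To make the proposed escalation order concrete, here is a rough userspace
model of it. This is only a sketch of the idea, not kernel code: every name
in it (model_host, model_cmd, model_submit, escalate_lun_reset,
escalate_host_reset) is made up for illustration and none of it corresponds
to existing scsi_error.c or SRP transport functions. It is single-threaded,
so it does not model the race between failing commands and blocking
submission; it only shows that once a level has escalated, nothing for that
LUN or host is outstanding before the reset action runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LUNS 4

/* Hypothetical model types; these do not exist in the kernel. */
struct model_cmd {
	unsigned int lun;
	bool outstanding;	/* still "on the wire" */
	bool failed;		/* completed with an error by the EH */
};

struct model_host {
	bool lun_blocked[MAX_LUNS];
	bool host_blocked;
	bool reconnected;	/* stands in for the host reset action */
	struct model_cmd *cmds;
	size_t nr_cmds;
};

/* Submission is refused while the host or the target LUN is blocked. */
static bool model_submit(struct model_host *h, struct model_cmd *c)
{
	if (c->lun >= MAX_LUNS || h->host_blocked || h->lun_blocked[c->lun])
		return false;
	c->outstanding = true;
	c->failed = false;
	return true;
}

static size_t nr_outstanding(const struct model_host *h)
{
	size_t n = 0;

	for (size_t i = 0; i < h->nr_cmds; i++)
		if (h->cmds[i].outstanding)
			n++;
	return n;
}

/* Fail outstanding commands for one LUN, or for all LUNs of the host. */
static size_t fail_cmds(struct model_host *h, bool all_luns, unsigned int lun)
{
	size_t n = 0;

	for (size_t i = 0; i < h->nr_cmds; i++) {
		struct model_cmd *c = &h->cmds[i];

		if (!c->outstanding || (!all_luns && c->lun != lun))
			continue;
		c->outstanding = false;
		c->failed = true;
		n++;
	}
	return n;
}

/* LUN reset level: fail the LUN's commands, block the LUN, then TMF. */
static size_t escalate_lun_reset(struct model_host *h, unsigned int lun)
{
	size_t n;

	assert(lun < MAX_LUNS);
	n = fail_cmds(h, false, lun);
	h->lun_blocked[lun] = true;
	/* A real implementation would send the TMF here. */
	return n;
}

/* Host reset level: fail everything, block the host, only then reset. */
static size_t escalate_host_reset(struct model_host *h)
{
	size_t n = fail_cmds(h, true, 0);

	h->host_blocked = true;
	/* The full drain is trivially complete before the reset action. */
	assert(nr_outstanding(h) == 0);
	h->reconnected = true;
	return n;
}
```

The point of the model is the assertion in escalate_host_reset: because all
I/O has been failed and the host blocked first, the reset action never has
to wait for commands that are still inside ->queuecommand.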