On 05/11/15 11:31, Christoph Hellwig wrote:
On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
There are multiple events that can cause the SRP initiator driver to
initiate a reconnect:
1. The SCSI core invoking eh_host_reset_handler().
2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
transmit timeout or a transport layer disconnect reported by the
IB/CM.
Right, I missed the srp_reconnect_work case. But even with that I
think what I wrote above still stands. In that case srp_reconnect_work
would just directly trigger the abort-all-commands and reconnect
operation.
The main point I was trying to make is that instead of having a sequence
of:
1) block new queuecommand instances
2) flush out pending queuecommand instances
3) do part of the disconnect
4) fail all in-flight commands
5) reconnect
we should aim for:
1) block new queuecommand instances
2) fail all in-flight commands
3) disconnect and reconnect
to avoid the need to keep track of pending queuecommand instances,
and instead re-use the existing infrastructure for failing all
in-flight commands, which we need to do anyway.
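The shorter three-step sequence can be sketched in plain userspace C. This is only a single-threaded model, not ib_srp code: struct model_host, model_queuecommand(), model_fail_in_flight() and model_reconnect() are hypothetical names made up for illustration, and the model does not attempt to show queuecommand calls that are already running on another CPU when the host gets blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 8

enum cmd_state { CMD_FREE, CMD_IN_FLIGHT, CMD_FAILED };

/* Hypothetical stand-in for the per-host state of an initiator driver. */
struct model_host {
	bool blocked;                   /* set while new commands are refused */
	bool connected;                 /* set while the transport is up */
	enum cmd_state cmds[MAX_CMDS];  /* one slot per outstanding command */
};

/* Model of queuecommand: returns the slot used, or -1 if refused. */
static int model_queuecommand(struct model_host *h)
{
	assert(h != NULL);
	if (h->blocked || !h->connected)
		return -1;
	for (size_t i = 0; i < MAX_CMDS; i++) {
		if (h->cmds[i] == CMD_FREE) {
			h->cmds[i] = CMD_IN_FLIGHT;
			return (int)i;
		}
	}
	return -1;
}

/* Fail every command that is still in flight; returns how many were failed. */
static int model_fail_in_flight(struct model_host *h)
{
	int failed = 0;

	for (size_t i = 0; i < MAX_CMDS; i++) {
		if (h->cmds[i] == CMD_IN_FLIGHT) {
			h->cmds[i] = CMD_FAILED;
			failed++;
		}
	}
	return failed;
}

/* The proposed ordering: block, fail in-flight commands, then reconnect. */
static int model_reconnect(struct model_host *h)
{
	int failed;

	h->blocked = true;                 /* 1) block new queuecommand instances */
	failed = model_fail_in_flight(h);  /* 2) fail all in-flight commands */
	h->connected = false;              /* 3) disconnect ... */
	h->connected = true;               /*    ... and reconnect */
	h->blocked = false;                /* accept new commands again */
	return failed;
}
```

In this model nothing has to count pending queuecommand instances: every command that made it into a slot is failed in step 2, and everything arriving later is refused by the blocked flag.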
Hello Christoph,
Your proposal absolutely makes sense to me, but unfortunately I do not
have the time available now to implement it. Would it be acceptable if I
rework scsi_wait_for_queuecommand() such that per-CPU counters are
introduced in blk-mq instead of one counter per hctx?
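To illustrate what is meant by per-CPU counters, here is a minimal userspace sketch using C11 atomics. It is not blk-mq code and it does not use the kernel's per-CPU primitives: struct inflight_counter, inflight_enter(), inflight_exit(), inflight_sum() and MODEL_NR_CPUS are hypothetical names, and the CPU index is passed in by the caller instead of being looked up.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS 4   /* assumption: fixed, small CPU count for the model */
#define MODEL_CACHE_LINE 64

/* One counter per CPU, each on its own cache line to avoid false sharing. */
struct inflight_slot {
	_Alignas(MODEL_CACHE_LINE) atomic_long count;
};

struct inflight_counter {
	struct inflight_slot slot[MODEL_NR_CPUS];
};

static void inflight_init(struct inflight_counter *c)
{
	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++)
		atomic_init(&c->slot[i].count, 0);
}

/* Called on entry to queuecommand; touches only the local CPU's slot. */
static void inflight_enter(struct inflight_counter *c, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	atomic_fetch_add_explicit(&c->slot[cpu].count, 1, memory_order_relaxed);
}

/*
 * Called on exit from queuecommand. The exit may run on a different CPU
 * than the entry, so a single slot can go negative; only the sum counts.
 */
static void inflight_exit(struct inflight_counter *c, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	atomic_fetch_sub_explicit(&c->slot[cpu].count, 1, memory_order_relaxed);
}

/* Slow path: add up all slots; zero means no queuecommand call is pending. */
static long inflight_sum(struct inflight_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++)
		sum += atomic_load_explicit(&c->slot[i].count,
					    memory_order_relaxed);
	return sum;
}
```

The hot path only ever modifies one cache line that is local to the calling CPU, and the cost of summing all slots is paid only by the waiter. A real implementation would additionally need memory ordering between the "blocked" check and the counter update, which the relaxed operations above do not provide.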
Thanks,
Bart.