Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()

Bart Van Assche <bart.vanassche@xxxxxxxxxxx> · Mon, 11 May 2015 11:58:59 +0200

On 05/11/15 11:31, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
>> There are multiple events that can cause the SRP initiator driver to
>> initiate a reconnect:
>> 1. The SCSI core invoking eh_host_reset_handler().
>> 2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
>>     transmit timeout or a transport layer disconnect reported by the
>>     IB/CM.
>
> Right, I missed the srp_reconnect_work case.  But even with that I
> think what I wrote above still stands.  srp_reconnect_work in that
> case would just directly trigger the abort all commands and
> reconnect operation.
>
> The main point I was trying to make is that instead of having a sequence
> of:
>
>   1) block new queuecommand instances
>   2) flush out pending queuecommand instances
>   3) do part of the disconnect
>   4) fail all in-flight commands
>   5) reconnect
>
> we should aim for:
>
>   1) block new queuecommand instances
>   2) fail all in-flight commands
>   3) disconnect and reconnect
>
> to avoid the need to keep track of pending queuecommand instances,
> and instead re-use the existing infrastructure to fail all in-flight
> commands, which we have the infrastructure for, and which we need
> to do anyway.
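
For illustration only, the shorter three-step ordering can be modelled
as a small single-threaded C program. This is a sketch under heavy
assumptions: toy_host, toy_queuecommand(), toy_fail_in_flight() and
toy_reset() are hypothetical names, not SCSI core or ib_srp functions,
and the model ignores the concurrency between queuecommand callers and
the reset path, which is the hard part in the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CMDS 8

enum toy_cmd_state { TOY_CMD_FREE, TOY_CMD_IN_FLIGHT, TOY_CMD_FAILED };

/* Hypothetical stand-in for a SCSI host; not a kernel structure. */
struct toy_host {
	bool blocked;    /* new queuecommand calls are rejected while set */
	bool connected;  /* models the state of the transport connection */
	enum toy_cmd_state cmds[TOY_MAX_CMDS];
};

/* Claim a free slot for a new command; return its index or -1. */
static int toy_queuecommand(struct toy_host *h)
{
	if (h->blocked || !h->connected)
		return -1;
	for (size_t i = 0; i < TOY_MAX_CMDS; i++) {
		if (h->cmds[i] == TOY_CMD_FREE) {
			h->cmds[i] = TOY_CMD_IN_FLIGHT;
			return (int)i;
		}
	}
	return -1;
}

/* Mark every in-flight command as failed; return how many there were. */
static size_t toy_fail_in_flight(struct toy_host *h)
{
	size_t failed = 0;

	for (size_t i = 0; i < TOY_MAX_CMDS; i++) {
		if (h->cmds[i] == TOY_CMD_IN_FLIGHT) {
			h->cmds[i] = TOY_CMD_FAILED;
			failed++;
		}
	}
	return failed;
}

/* The three-step ordering: block, fail in-flight, disconnect/reconnect. */
static size_t toy_reset(struct toy_host *h)
{
	size_t failed;

	assert(h != NULL);
	h->blocked = true;              /* 1) block new queuecommand instances */
	failed = toy_fail_in_flight(h); /* 2) fail all in-flight commands */
	h->connected = false;           /* 3) disconnect ... */
	h->connected = true;            /*    ... and reconnect */
	h->blocked = false;             /* accept new commands again */
	return failed;
}
```

Note that the model has no step that waits for queuecommand callers
that are still running; whether that step can really be dropped is
exactly what is under discussion here.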

Hello Christoph,

Your proposal absolutely makes sense to me, but unfortunately I do not
have time to implement it right now. Would it be acceptable if I
rework scsi_wait_for_queuecommand() such that per-CPU counters are
introduced in blk-mq instead of one counter per hctx?
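
As a rough user-space illustration of the per-CPU counter idea (not
blk-mq code; every name below is hypothetical, and the CPU count and
cache line size are assumed values), each CPU gets its own
cache-line-aligned slot, so callers on different CPUs do not contend on
one shared counter. Only the sum over all slots is meaningful:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS    4   /* assumed CPU count for the sketch */
#define TOY_CACHE_LINE 64  /* assumed cache line size in bytes */

/* One counter slot per CPU, aligned to avoid false sharing. */
struct toy_percpu_count {
	_Alignas(TOY_CACHE_LINE) atomic_long v;
};

struct toy_busy_counter {
	struct toy_percpu_count cpu[TOY_NR_CPUS];
};

static void toy_counter_init(struct toy_busy_counter *c)
{
	for (unsigned int i = 0; i < TOY_NR_CPUS; i++)
		atomic_init(&c->cpu[i].v, 0);
}

/* Called on entry to the tracked section, on the caller's CPU. */
static void toy_enter(struct toy_busy_counter *c, unsigned int cpu)
{
	atomic_fetch_add_explicit(&c->cpu[cpu % TOY_NR_CPUS].v, 1,
				  memory_order_relaxed);
}

/*
 * Called on exit. The caller may have migrated to another CPU, so a
 * single slot can go negative; that is fine because only the sum counts.
 */
static void toy_exit(struct toy_busy_counter *c, unsigned int cpu)
{
	atomic_fetch_sub_explicit(&c->cpu[cpu % TOY_NR_CPUS].v, 1,
				  memory_order_release);
}

/* Sum all slots. This is not an atomic snapshot of the whole array. */
static long toy_sum(struct toy_busy_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < TOY_NR_CPUS; i++)
		sum += atomic_load_explicit(&c->cpu[i].v,
					    memory_order_acquire);
	return sum;
}

/* True when no caller is inside the tracked section. */
static bool toy_idle(struct toy_busy_counter *c)
{
	return toy_sum(c) == 0;
}
```

Because the sum is not an atomic snapshot, a waiter can only trust
toy_idle() after new entries have been blocked. Kernel code would use
the kernel's own per-CPU facilities rather than C11 atomics.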

Thanks,

Bart.