Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()

On 05/11/15 11:31, Christoph Hellwig wrote:
On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
Hello Christoph,

There are multiple events that can cause the SRP initiator driver to
initiate a reconnect:
1. The SCSI core invoking eh_host_reset_handler().
2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
    transmit timeout or a transport layer disconnect reported by the
    IB/CM.

Right, I missed the srp_reconnect_work case.  But even with that I
think what I wrote above still stands.  In that case srp_reconnect_work
would simply trigger the "abort all commands and reconnect" operation
directly.

The main point I was trying to make is that instead of having a sequence
of:

  1) block new queuecommand instances
  2) flush out pending queuecommand instances
  3) do part of the disconnect
  4) fail all in-flight commands
  5) reconnect

we should aim for:

  1) block new queuecommand instances
  2) fail all in-flight commands
  3) disconnect and reconnect

to avoid the need to keep track of pending queuecommand instances,
and instead re-use the existing infrastructure for failing all
in-flight commands, which we need to do anyway.
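
For illustration, the shorter three-step sequence can be modelled in
plain userspace C. This is only a sketch under stated assumptions: it
is single-threaded, so it deliberately ignores the question of
queuecommand calls that are still executing, and every name in it
(model_host, model_queuecommand, model_fail_in_flight, model_reset) is
hypothetical and not part of ib_srp or the SCSI core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CMDS 8

/* Hypothetical command states; CMD_FREE is 0 so zero-init means "free". */
enum model_cmd_state { CMD_FREE, CMD_IN_FLIGHT, CMD_FAILED };

/* Hypothetical stand-in for a SCSI host with a fixed command table. */
struct model_host {
    bool blocked;    /* true while new queuecommand calls are rejected */
    bool connected;  /* true while the transport connection is up */
    enum model_cmd_state cmd[MODEL_MAX_CMDS];
};

/* Stand-in for queuecommand(): returns the slot used, or -1 on refusal. */
static int model_queuecommand(struct model_host *h)
{
    if (h->blocked || !h->connected)
        return -1;
    for (size_t i = 0; i < MODEL_MAX_CMDS; i++) {
        if (h->cmd[i] == CMD_FREE) {
            h->cmd[i] = CMD_IN_FLIGHT;
            return (int)i;
        }
    }
    return -1;
}

/* Fail every in-flight command; returns how many were failed. */
static int model_fail_in_flight(struct model_host *h)
{
    int failed = 0;

    for (size_t i = 0; i < MODEL_MAX_CMDS; i++) {
        if (h->cmd[i] == CMD_IN_FLIGHT) {
            h->cmd[i] = CMD_FAILED;
            failed++;
        }
    }
    return failed;
}

/* The three-step sequence; returns the number of commands failed. */
static int model_reset(struct model_host *h)
{
    int failed;

    h->blocked = true;                    /* 1) block new queuecommand instances */
    failed = model_fail_in_flight(h);     /* 2) fail all in-flight commands */
    assert(model_fail_in_flight(h) == 0); /* nothing is left in flight */
    h->connected = false;                 /* 3) disconnect ... */
    h->connected = true;                  /*    ... and reconnect */
    h->blocked = false;
    return failed;
}
```

In this model, two commands queued before model_reset() both end up in
CMD_FAILED, and a command queued afterwards is accepted again. No
per-call tracking of queuecommand is needed here only because nothing
runs concurrently.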

Hello Christoph,

What I'm wondering about is whether the above approach makes it
possible to trigger path failover before (2 * SCSI timeout) has
expired. Starting SCSI error handling immediately after the block
layer has reported the first SCSI timeout is only safe if all ongoing
SCSI commands are canceled in some way. Is this what the function
blk_abort_request() is intended for? As far as I can see, invoking
that function, or any function with a similar purpose, is only safe
after the queuecommand() callback function has finished. However,
blk_mq_run_hw_queue() invokes mq_ops->queue_rq() without holding any
lock, so it's not clear to me how to safely cancel ongoing blk-mq
requests without waiting until they have timed out. I hope this means
that I have overlooked something?

Bart.
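
The concern raised above can be restated as a small state model in
userspace C. It is a sketch of the claim as worded in this message,
not of actual blk-mq behavior: a request that is still inside its
queue_rq()/queuecommand() callback cannot be aborted safely, so the
abort path has to wait for the callback to return. All names
(model_rq, model_dispatch_begin, model_dispatch_end, model_try_abort)
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request life cycle used only for this illustration. */
enum model_rq_state {
    RQ_IDLE,
    RQ_DISPATCHING, /* inside the queue_rq()/queuecommand() callback */
    RQ_IN_FLIGHT,   /* callback has returned; command handed to the driver */
    RQ_ABORTED
};

struct model_rq {
    enum model_rq_state state;
};

/* The dispatch path enters the callback (assumed: no lock is held here). */
static void model_dispatch_begin(struct model_rq *rq)
{
    assert(rq->state == RQ_IDLE);
    rq->state = RQ_DISPATCHING;
}

/* The callback has returned. */
static void model_dispatch_end(struct model_rq *rq)
{
    assert(rq->state == RQ_DISPATCHING);
    rq->state = RQ_IN_FLIGHT;
}

/*
 * Abort only requests whose callback has finished. Returns false when
 * the request is not in flight, e.g. still being dispatched, in which
 * case the caller would have to wait or retry.
 */
static bool model_try_abort(struct model_rq *rq)
{
    if (rq->state != RQ_IN_FLIGHT)
        return false;
    rq->state = RQ_ABORTED;
    return true;
}
```

Here model_try_abort() refuses a request between
model_dispatch_begin() and model_dispatch_end(), and succeeds only
afterwards. The open question in the message is how a real abort path
could observe the equivalent of RQ_DISPATCHING when the dispatch side
takes no lock.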