Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()

Bart Van Assche <bart.vanassche@xxxxxxxxxxx> · Tue, 12 May 2015 10:49:47 +0200

On 05/11/15 13:50, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 12:58:03PM +0200, Bart Van Assche wrote:
> > What I'm wondering about is whether it will be possible with the above
> > approach to trigger path failover before (2 * SCSI timeout) has expired?
> > Starting SCSI error handling immediately after the block layer has reported
> > the first SCSI timeout is only safe if all ongoing SCSI commands are
> > canceled in some way. Is this what the function blk_abort_request() is
> > intended for? As far as I can see, invoking that function or any function
> > with a similar purpose is only safe after the queuecommand() callback
> > function has finished. However, blk_mq_run_hw_queue() invokes
> > mq_ops->queue_rq() without holding any lock. So it's not clear to me how to
> > safely cancel ongoing blk-mq requests without waiting until these have timed
> > out. I hope this means that I overlooked something?
>
> For the blk-mq case invoking it earlier should be fine: the
> REQ_ATOM_STARTED and REQ_ATOM_COMPLETE bit ops are specifically designed
> so that calling the timeout handler on any request is fine. I'm not
> sure about the !blk-mq case, though.
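To make the point about the bit operations concrete, here is a minimal user-space model, assuming C11 atomics stand in for the kernel's test_and_set_bit(); the struct and function names are hypothetical and this is not the block layer's code. Both the completion path and the timeout path must first win an atomic test-and-set of a COMPLETE bit, so whichever runs second backs off and a request is never finished twice. Note that nothing in the model records whether queue_rq() has returned.

```c
#include <assert.h>	/* used by the usage checks */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_ATOM_STARTED / REQ_ATOM_COMPLETE. */
enum {
	RQ_ATOM_STARTED  = 1u << 0,
	RQ_ATOM_COMPLETE = 1u << 1,
};

struct flag_rq {
	atomic_uint atomic_flags;
	int finished;		/* how many paths finished this request */
};

/* Models test_and_set_bit(): returns true if COMPLETE was already set,
 * i.e. another path has already claimed the request. */
bool rq_mark_complete(struct flag_rq *rq)
{
	unsigned int old = atomic_fetch_or(&rq->atomic_flags, RQ_ATOM_COMPLETE);

	return (old & RQ_ATOM_COMPLETE) != 0;
}

/* Marks the request as started and clears any previous COMPLETE bit. */
void rq_start(struct flag_rq *rq)
{
	atomic_store(&rq->atomic_flags, RQ_ATOM_STARTED);
	rq->finished = 0;
}

/* I/O completion path: finishes the request only if it wins the race. */
bool rq_complete(struct flag_rq *rq)
{
	if (rq_mark_complete(rq))
		return false;
	rq->finished++;
	return true;
}

/* Timeout path: ignores requests that were never started, and finishes
 * a started request only if it wins the race for the COMPLETE bit. */
bool rq_timeout(struct flag_rq *rq)
{
	if (!(atomic_load(&rq->atomic_flags) & RQ_ATOM_STARTED))
		return false;
	if (rq_mark_complete(rq))
		return false;
	rq->finished++;
	return true;
}
```

For example, after rq_start(), a call to rq_complete() followed by rq_timeout() finishes the request exactly once, and the same holds in the opposite order.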

Hello Christoph,

Thanks for the feedback. However, I'm still wondering what will happen
if blk_abort_request() causes e.g. blk_rq_unmap_user() or
blk_update_request() to be called while mq_ops->queue_rq() or
q->request_fn() is still in progress. More generally, I'm not sure it
is possible to prevent blk_abort_request() from racing with a request
queuing function merely by letting the block layer set an additional
request flag. Setting an additional flag just before queue_rq() or
request_fn() is called would not make it possible to detect when these
callback functions have finished. Setting a flag just after queue_rq()
or request_fn() has returned, without introducing additional locking or
atomic operations, would make it possible for blk_mark_rq_complete() to
be called from the I/O completion path before that new flag is set.
This means that the latter approach may make it necessary to increment
and decrement a request reference count around the queue_rq() or
request_fn() call plus the flag-set operation. Another possible approach
would be to replace the REQ_ATOM_STARTED and REQ_ATOM_COMPLETE flags
with a request state variable that is modified atomically. Further
feedback is welcome.
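A minimal user-space sketch of the state-variable idea, assuming C11 atomics stand in for the kernel's atomic primitives. The extra "dispatching" state and all names (struct state_rq, state_begin_dispatch(), state_abort(), etc.) are hypothetical and are not block-layer APIs. Every transition is a compare-and-exchange, so exactly one of the completion and abort paths can move a request to the complete state, and the abort path can tell whether the queuing callback has returned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states replacing the two flag bits. */
enum rq_state {
	RQ_STATE_IDLE,		/* not dispatched yet */
	RQ_STATE_DISPATCHING,	/* queue_rq()/request_fn() still running */
	RQ_STATE_IN_FLIGHT,	/* callback returned, awaiting completion */
	RQ_STATE_COMPLETE,	/* finished by exactly one path */
};

struct state_rq {
	_Atomic int state;
};

/* Atomically move from @from to @to; fails if the state is not @from. */
bool state_transition(struct state_rq *rq, int from, int to)
{
	return atomic_compare_exchange_strong(&rq->state, &from, to);
}

/* Called just before the queuing callback. */
void state_begin_dispatch(struct state_rq *rq)
{
	bool ok = state_transition(rq, RQ_STATE_IDLE, RQ_STATE_DISPATCHING);

	assert(ok);	/* a request is dispatched only once */
	(void)ok;
}

/* Called just after the queuing callback has returned. Fails harmlessly
 * if the completion path already finished the request meanwhile. */
void state_end_dispatch(struct state_rq *rq)
{
	(void)state_transition(rq, RQ_STATE_DISPATCHING, RQ_STATE_IN_FLIGHT);
}

/* I/O completion path: may finish the request from either active state. */
bool state_complete(struct state_rq *rq)
{
	return state_transition(rq, RQ_STATE_DISPATCHING, RQ_STATE_COMPLETE) ||
	       state_transition(rq, RQ_STATE_IN_FLIGHT, RQ_STATE_COMPLETE);
}

/* Abort/timeout path: refuses while the queuing callback may still be
 * running; the caller is expected to retry later. */
bool state_abort(struct state_rq *rq)
{
	return state_transition(rq, RQ_STATE_IN_FLIGHT, RQ_STATE_COMPLETE);
}
```

In this sketch an abort that arrives while the callback is still running simply fails and can be retried, while a completion that arrives during dispatch is honored and the later end-of-dispatch transition becomes a no-op. Whether such a retry is acceptable in the SCSI error handler is left open.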

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html