On 6/6/19 10:40 PM, Bart Van Assche wrote: > On 6/5/19 10:50 PM, Hannes Reinecke wrote: >> On 6/5/19 10:14 PM, Bart Van Assche wrote: >>> Several SCSI transport and LLD drivers surround code that does not >>> tolerate concurrent calls of .queuecommand() with scsi_target_block() / >>> scsi_target_unblock(). These last two functions use >>> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request >>> queues to prevent concurrent .queuecommand() calls. However, that is >>> not sufficient to prevent .queuecommand() calls from >>> scsi_send_eh_cmnd(). >>> Hence surround the .queuecommand() call from the SCSI error handler with >>> code that avoids that .queuecommand() gets called in the quiesced state. >>> >>> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into >>> code that calls blk_get_request() + blk_execute_rq() is not an option >>> since scsi_send_eh_cmnd() must be able to make forward progress even >>> if all requests have been allocated. >>> >> Hmm. Have you actually observed this? >> Typically, scsi_target_block()/scsi_target_unblock() is called prior to >> invoking EH, to allow the system to settle and to guarantee that it's >> fully quiesced. Only then EH is started. >> Consequently, scsi_target_block()/scsi_target_unblock() really shouldn't >> be called during EH; we're essentially single-threaded at this point, so >> nothing else will be submitting command. >> Can you explain why you need this? > > Hi Hannes, > > As one can see in the commit message of patch 3/3, I have observed a > .queuecommand() call by the SCSI EH causing a crash. > > The SCSI EH and blocking of SCSI devices have different triggers: > - As one can see in scsi_times_out(), if a SCSI command times out and an > abort has already been scheduled for that command then that command is > handed over to the SCSI error handler. After all commands that are in > progress have failed the error handler thread is woken up. > - The iSCSI and SRP transport drivers call scsi_target_block() if a > transport layer error has been observed. This can happen from another > thread than the SCSI error handler thread and these functions can be > called either before or after the SCSI error handler thread has been > woken up. > But then I'd rather not quiesce the queue in these circumstances, like in this (untested) patch: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 34eaef631064..e0bdde025d1a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2641,8 +2641,17 @@ static int scsi_internal_device_block(struct scsi_device *sdev) mutex_lock(&sdev->state_mutex); err = scsi_internal_device_block_nowait(sdev); - if (err == 0) - blk_mq_quiesce_queue(q); + if (err == 0) { + unsigned long flags; + + spin_lock_irqsave(sdev->host->host_lock, flags); + if (sdev->host->shost_state != SHOST_RECOVERY && + sdev->host->shost_state != SHOST_CANCEL_RECOVERY) { + spin_unlock_irqrestore(sdev->host->host_lock, flags); + blk_mq_quiesce_queue(q); + } else + spin_unlock_irqrestore(sdev->host->host_lock, flags); + } mutex_unlock(&sdev->state_mutex); return err; Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxxx +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)