Several SCSI transport and LLD drivers surround code that does not tolerate concurrent calls of .queuecommand() with scsi_target_block() / scsi_target_unblock(). These last two functions use blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request queues to prevent concurrent .queuecommand() calls. However, that is not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Hence surround the .queuecommand() call from the SCSI error handler with code that avoids that .queuecommand() gets called in the quiesced state. Notes: - Converting the .queuecommand() call in scsi_send_eh_cmnd() into code that calls blk_get_request() + blk_execute_rq() is not an option since scsi_send_eh_cmnd() must be able to make forward progress even if all requests are allocated. - Converting the .queuecommand() call in scsi_send_eh_cmnd() into a blk_execute_rq() or blk_mq_requeue_request() call is not an option either because that would require to change every individual function in the I/O path. Each function in the I/O path would have to be modified such that it handles requests received from the block layer core and request received from the SCSI EH differently. Since struct scsi_cmnd is not initialized by the block layer for filesystem requests, it is not possible to determine in scsi_queue_rq() whether or not a request has been submitted by the SCSI EH without modifying the block layer. Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxx> Cc: Hannes Reinecke <hare@xxxxxxx> Cc: Johannes Thumshirn <jthumshirn@xxxxxxx> --- Changes compared to v1: - As requested by James, removed the wait queue again that was added to the SCSI device structure. drivers/scsi/scsi_error.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 946039117bf4..71d7d2b893ab 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1039,7 +1039,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; DECLARE_COMPLETION_ONSTACK(done); - unsigned long timeleft = timeout; + unsigned long timeleft = timeout, delay; struct scsi_eh_save ses; const unsigned long stall_for = msecs_to_jiffies(100); int rtn; @@ -1050,7 +1050,22 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; - rtn = shost->hostt->queuecommand(shost, scmd); + mutex_lock(&sdev->state_mutex); + while (sdev->sdev_state == SDEV_QUIESCE && timeleft > 0) { + mutex_unlock(&sdev->state_mutex); + SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, sdev, + "%s: state %d <> %d\n", __func__, sdev->sdev_state, + SDEV_QUIESCE)); + delay = min(timeleft, stall_for); + timeleft -= delay; + msleep(jiffies_to_msecs(delay)); + mutex_lock(&sdev->state_mutex); + } + if (sdev->sdev_state != SDEV_QUIESCE) + rtn = shost->hostt->queuecommand(shost, scmd); + else + rtn = SCSI_MLQUEUE_DEVICE_BUSY; + mutex_unlock(&sdev->state_mutex); if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses); -- 2.16.2