On 6/7/19 5:49 AM, Hannes Reinecke wrote:
On 6/6/19 10:40 PM, Bart Van Assche wrote:
As one can see in the commit message of patch 3/3, I have observed a
.queuecommand() call by the SCSI EH causing a crash.
The SCSI EH and blocking of SCSI devices have different triggers:
- As one can see in scsi_times_out(), if a SCSI command times out and an
abort has already been scheduled for that command then that command is
handed over to the SCSI error handler. After all commands that are in
progress have failed the error handler thread is woken up.
- The iSCSI and SRP transport drivers call scsi_target_block() if a
transport layer error has been observed. This can happen from another
thread than the SCSI error handler thread and these functions can be
called either before or after the SCSI error handler thread has been
woken up.
But then I'd rather not quiesce the queue in these circumstances, like
in this (untested) patch:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 34eaef631064..e0bdde025d1a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2641,8 +2641,17 @@ static int scsi_internal_device_block(struct
scsi_device *sdev)
mutex_lock(&sdev->state_mutex);
err = scsi_internal_device_block_nowait(sdev);
- if (err == 0)
- blk_mq_quiesce_queue(q);
+ if (err == 0) {
+ unsigned long flags;
+
+ spin_lock_irqsave(sdev->host->host_lock, flags);
+ if (sdev->host->shost_state != SHOST_RECOVERY &&
+ sdev->host->shost_state != SHOST_CANCEL_RECOVERY) {
+ spin_unlock_irqrestore(sdev->host->host_lock,
flags);
+ blk_mq_quiesce_queue(q);
+ } else
+ spin_unlock_irqrestore(sdev->host->host_lock,
flags);
+ }
mutex_unlock(&sdev->state_mutex);
return err;
A significant disadvantage of the above patch is that it makes it less
likely that error handling will succeed. If the error handler is
activated before the transport recovery code is activated, I think that
transport recovery should really happen. The above patch makes it less
likely that the SCSI error handler will succeed.
Additionally, scsi_target_block() ignores the value returned by
scsi_internal_device_block() and it is nontrivial to make some of the
scsi_target_block() callers handle scsi_target_block() failures.
Bart.