On 8/09/21 1:36 am, Bart Van Assche wrote: > On 9/7/21 9:56 AM, Bart Van Assche wrote: >> On 9/7/21 8:43 AM, Adrian Hunter wrote: >>> No. Requests cannot make progress when ufshcd_state is >>> UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change that, >>> so if the error handler is waiting to enter the queue and blk_mq_freeze_queue() >>> is waiting for outstanding requests, they will deadlock. >> >> How about adding the above text as a comment above ufshcd_clear_ua_wluns() such >> that this information becomes available to those who have not followed this >> conversation? > > After having given patch 1/3 some further thought: an unfortunate > effect of this patch is that unit attention clearing is skipped for > the states UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_RESET. Only if the error handler is racing with blk_mq_freeze_queue(), but it is not ideal. > How about replacing patch 1/3 with the untested patch below since that > patch does not have the disadvantage of sometimes skipping clearing UA? I presume you mean without reverting "scsi: ufs: Synchronize SCSI and UFS error handling" but in that case the deadlock happens because: error handler is waiting on blk_queue_enter() blk_queue_enter() is waiting on blk_mq_freeze_queue() blk_mq_freeze_queue() is waiting on outstanding requests outstanding requests are blocked by the SCSI error handler shost_state == SHOST_RECOVERY set by scsi_schedule_eh() > > Thanks, > > Bart. > > [PATCH] scsi: ufs: Fix a recently introduced deadlock > > Completing pending commands with DID_IMM_RETRY triggers the following > code paths: > > scsi_complete() > -> scsi_queue_insert() > -> __scsi_queue_insert() > -> scsi_device_unbusy() > -> scsi_dec_host_busy() > -> scsi_eh_wakeup() > -> blk_mq_requeue_request() > > scsi_queue_rq() > -> scsi_host_queue_ready() > -> scsi_host_in_recovery() > > Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling") > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> > --- > drivers/scsi/ufs/ufshcd.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index c2c614da1fb8..9560f34f3d27 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) > } > fallthrough; > case UFSHCD_STATE_RESET: > + /* > + * The SCSI error handler only starts after all pending commands > + * have failed or timed out. Complete commands with > + * DID_IMM_RETRY to allow the error handler to start > + * if it has been scheduled. > + */ > + set_host_byte(cmd, DID_IMM_RETRY); > + cmd->scsi_done(cmd); Setting non-zero return value, in this case "err = SCSI_MLQUEUE_HOST_BUSY" will anyway cause scsi_dec_host_busy(), so does this make any difference? > err = SCSI_MLQUEUE_HOST_BUSY; > goto out; > case UFSHCD_STATE_ERROR: