Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Sat, 11 Sep 2021 19:47:36 +0300

On 8/09/21 1:36 am, Bart Van Assche wrote:
> On 9/7/21 9:56 AM, Bart Van Assche wrote:
>> On 9/7/21 8:43 AM, Adrian Hunter wrote:
>>> No.  Requests cannot make progress when ufshcd_state is
>>> UFSHCD_STATE_EH_SCHEDULED_FATAL, and only the error handler can change that,
>>> so if the error handler is waiting to enter the queue and blk_mq_freeze_queue()
>>> is waiting for outstanding requests, they will deadlock.
>>
>> How about adding the above text as a comment above ufshcd_clear_ua_wluns() such
>> that this information becomes available to those who have not followed this
>> conversation?
> 
> After having given patch 1/3 some further thought: an unfortunate
> effect of this patch is that unit attention clearing is skipped for
> the states UFSHCD_STATE_EH_SCHEDULED_FATAL and UFSHCD_STATE_RESET.

Only if the error handler is racing with blk_mq_freeze_queue(), but it
is not ideal.

> How about replacing patch 1/3 with the untested patch below since that
> patch does not have the disadvantage of sometimes skipping clearing UA?

I presume you mean without reverting "scsi: ufs: Synchronize SCSI
and UFS error handling" but in that case the deadlock happens because:

error handler is waiting on blk_queue_enter()
blk_queue_enter() is waiting on blk_mq_freeze_queue()
blk_mq_freeze_queue() is waiting on outstanding requests
outstanding requests are blocked by the SCSI error handler shost_state == SHOST_RECOVERY set by scsi_schedule_eh()

> 
> Thanks,
> 
> Bart.
> 
> [PATCH] scsi: ufs: Fix a recently introduced deadlock
> 
> Completing pending commands with DID_IMM_RETRY triggers the following
> code paths:
> 
>   scsi_complete()
>   -> scsi_queue_insert()
>     -> __scsi_queue_insert()
>       -> scsi_device_unbusy()
>         -> scsi_dec_host_busy()
>       -> scsi_eh_wakeup()
>       -> blk_mq_requeue_request()
> 
>   scsi_queue_rq()
>   -> scsi_host_queue_ready()
>     -> scsi_host_in_recovery()
> 
> Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling")
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  drivers/scsi/ufs/ufshcd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c2c614da1fb8..9560f34f3d27 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>          }
>          fallthrough;
>      case UFSHCD_STATE_RESET:
> +        /*
> +         * The SCSI error handler only starts after all pending commands
> +         * have failed or timed out. Complete commands with
> +         * DID_IMM_RETRY to allow the error handler to start
> +         * if it has been scheduled.
> +         */
> +        set_host_byte(cmd, DID_IMM_RETRY);
> +        cmd->scsi_done(cmd);

Setting non-zero return value, in this case "err = SCSI_MLQUEUE_HOST_BUSY"
will anyway cause scsi_dec_host_busy(), so does this make any difference?

>          err = SCSI_MLQUEUE_HOST_BUSY;
>          goto out;
>      case UFSHCD_STATE_ERROR: