Re: [PATCH V2 1/3] scsi: ufs: Fix error handler clear ua deadlock

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Tue, 7 Sep 2021 14:06:09 +0300

On 7/09/21 3:37 am, Bart Van Assche wrote:
> On 9/5/21 02:51, Adrian Hunter wrote:
>> On 3/09/21 11:29 pm, Bart Van Assche wrote:
>>> On 9/3/21 2:56 AM, Adrian Hunter wrote:
>>>> There is no guarantee to be able to enter the queue if requests
>>>> are blocked. That is because freezing the queue will block entry
>>>> to the queue, but freezing also waits for outstanding requests
>>>> which can make no progress while the queue is blocked.
>>>>
>>>> That situation can happen when the error handler issues requests
>>>> to clear unit attention condition. The deadlock is very unlikely,
>>>> so the error handler can be expected to clear ua at some point
>>>> anyway, so the simple solution is not to wait to enter the
>>>> queue.
>>>>
>>>> Additionally, note that the RPMB queue might be not be entered
>>>> because it is runtime suspended, but in that case ua will be
>>>> cleared at RPMB runtime resume.
>>>
>>> The only ufshcd_clear_ua_wluns() call that I am aware of and that
>>> is related to error handling is the call in
>>> ufshcd_err_handling_unprepare(). That call happens after
>>> ufshcd_scsi_unblock_requests() has been called so how can it be
>>> involved in a deadlock?
>>
>> That is a very good question.  I went back to reproduce the deadlock
>> again, and it is because, in addition, ufshcd_state is
>> UFSHCD_STATE_EH_SCHEDULED_FATAL.  So I have updated the commit
>> message accordingly in V3.
>>
>>> Additionally, the ufshcd_scsi_block_requests() and
>>> ufshcd_scsi_unblock_requests() calls can be removed from
>>> ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare().
>>> These calls are no longer necessary since patch "scsi: ufs:
>>> Synchronize SCSI and UFS error handling".
>>
>> As has been noted, that commit introduces several new deadlocks - and
>> will presumably cause the deadlock this patches addresses, even if
>> ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL.
>>
>> It is perhaps more appropriate to revert "scsi: ufs: Synchronize SCSI
>> and UFS error handling" for v5.15 and try to get things sorted out
>> for v5.16.  What do you think?
> 
> Reverting that patch would be a step backwards because it would make it again possible that the SCSI EH and UFS EH run concurrently and obstruct each other.

I wouldn't say it is a step backwards, just a step forwards the driver is not ready for.

For me, the change causes deadlocks so it is a regression.

I have never seen SCSI EH cause a problem, but AFAICT it is not needed because the UFS driver's error handler is always scheduled when needed.

As a temporary workaround until the driver is ready for SCSI EH, interference between SCSI EH and UFS EH could presumably be avoided by setting eh_strategy_handler to an empty function.

> 
> Does the above mean that "if (hba->pm_op_in_progress)" should be removed from the following code in ufshcd_queuecommand()?
> 
>     case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>         if (hba->pm_op_in_progress) {
>             hba->force_reset = true;
>             set_host_byte(cmd, DID_BAD_TARGET);
>             cmd->scsi_done(cmd);
>             goto out;
>         }

It seems to me that removing "if (hba->pm_op_in_progress)" would cause errors for requests that had not in fact even been issued.