Re: [PATCH V2 1/3] scsi: ufs: Fix error handler clear ua deadlock

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 6 Sep 2021 17:37:58 -0700

On 9/5/21 02:51, Adrian Hunter wrote:
On 3/09/21 11:29 pm, Bart Van Assche wrote:
On 9/3/21 2:56 AM, Adrian Hunter wrote:
There is no guarantee to be able to enter the queue if requests
are blocked. That is because freezing the queue will block entry
to the queue, but freezing also waits for outstanding requests
which can make no progress while the queue is blocked.

That situation can happen when the error handler issues requests
to clear unit attention condition. The deadlock is very unlikely,
so the error handler can be expected to clear ua at some point
anyway, so the simple solution is not to wait to enter the
queue.

Additionally, note that the RPMB queue might be not be entered
because it is runtime suspended, but in that case ua will be
cleared at RPMB runtime resume.

The only ufshcd_clear_ua_wluns() call that I am aware of and that
is related to error handling is the call in
ufshcd_err_handling_unprepare(). That call happens after
ufshcd_scsi_unblock_requests() has been called so how can it be
involved in a deadlock?

That is a very good question.  I went back to reproduce the deadlock
again, and it is because, in addition, ufshcd_state is
UFSHCD_STATE_EH_SCHEDULED_FATAL.  So I have updated the commit
message accordingly in V3.
>
Additionally, the ufshcd_scsi_block_requests() and
ufshcd_scsi_unblock_requests() calls can be removed from
ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare().
These calls are no longer necessary since patch "scsi: ufs:
Synchronize SCSI and UFS error handling".

As has been noted, that commit introduces several new deadlocks - and
will presumably cause the deadlock this patches addresses, even if
ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL.

It is perhaps more appropriate to revert "scsi: ufs: Synchronize SCSI
and UFS error handling" for v5.15 and try to get things sorted out
for v5.16.  What do you think?

Reverting that patch would be a step backwards because it would make it 
again possible that the SCSI EH and UFS EH run concurrently and obstruct 
each other.

Does the above mean that "if (hba->pm_op_in_progress)" should be removed 
from the following code in ufshcd_queuecommand()?

	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
		if (hba->pm_op_in_progress) {
			hba->force_reset = true;
			set_host_byte(cmd, DID_BAD_TARGET);
			cmd->scsi_done(cmd);
			goto out;
		}

Thanks,

Bart.