On 7/09/21 3:37 am, Bart Van Assche wrote: > On 9/5/21 02:51, Adrian Hunter wrote: >> On 3/09/21 11:29 pm, Bart Van Assche wrote: >>> On 9/3/21 2:56 AM, Adrian Hunter wrote: >>>> There is no guarantee to be able to enter the queue if requests >>>> are blocked. That is because freezing the queue will block entry >>>> to the queue, but freezing also waits for outstanding requests >>>> which can make no progress while the queue is blocked. >>>> >>>> That situation can happen when the error handler issues requests >>>> to clear unit attention condition. The deadlock is very unlikely, >>>> so the error handler can be expected to clear ua at some point >>>> anyway, so the simple solution is not to wait to enter the >>>> queue. >>>> >>>> Additionally, note that the RPMB queue might be not be entered >>>> because it is runtime suspended, but in that case ua will be >>>> cleared at RPMB runtime resume. >>> >>> The only ufshcd_clear_ua_wluns() call that I am aware of and that >>> is related to error handling is the call in >>> ufshcd_err_handling_unprepare(). That call happens after >>> ufshcd_scsi_unblock_requests() has been called so how can it be >>> involved in a deadlock? >> >> That is a very good question. I went back to reproduce the deadlock >> again, and it is because, in addition, ufshcd_state is >> UFSHCD_STATE_EH_SCHEDULED_FATAL. So I have updated the commit >> message accordingly in V3. >> >>> Additionally, the ufshcd_scsi_block_requests() and >>> ufshcd_scsi_unblock_requests() calls can be removed from >>> ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare(). >>> These calls are no longer necessary since patch "scsi: ufs: >>> Synchronize SCSI and UFS error handling". >> >> As has been noted, that commit introduces several new deadlocks - and >> will presumably cause the deadlock this patches addresses, even if >> ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL. >> >> It is perhaps more appropriate to revert "scsi: ufs: Synchronize SCSI >> and UFS error handling" for v5.15 and try to get things sorted out >> for v5.16. What do you think? > > Reverting that patch would be a step backwards because it would make it again possible that the SCSI EH and UFS EH run concurrently and obstruct each other. I wouldn't say it is a step backwards, just a step forwards the driver is not ready for. For me, the change causes deadlocks so it is a regression. I have never seen SCSI EH cause a problem, but AFAICT it is not needed because the UFS driver's error handler is always scheduled when needed. As a temporary workaround until the driver is ready for SCSI EH, interference between SCSI EH and UFS EH could presumably be avoided by setting eh_strategy_handler to an empty function. > > Does the above mean that "if (hba->pm_op_in_progress)" should be removed from the following code in ufshcd_queuecommand()? > > case UFSHCD_STATE_EH_SCHEDULED_FATAL: > if (hba->pm_op_in_progress) { > hba->force_reset = true; > set_host_byte(cmd, DID_BAD_TARGET); > cmd->scsi_done(cmd); > goto out; > } It seems to me that removing "if (hba->pm_op_in_progress)" would cause errors for requests that had not in fact even been issued.