On Mon, 2024-09-23 at 11:15 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/23/24 1:03 AM, peter.wang@xxxxxxxxxxxx wrote: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index b5c7bc50a27e..b42079c3d634 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba > *hba, struct ufshcd_lrb *lrbp, > > } > > break; > > case OCS_ABORTED: > > -result |= DID_ABORT << 16; > > +if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) > > +result |= DID_REQUEUE << 16; > > +else > > +result |= DID_ABORT << 16; > > dev_warn(hba->dev, > > "OCS aborted from controller = %x for tag %d\n", > > ocs, lrbp->task_tag); > > I think the approach of this patch is racy: the cmd->result > assignment > by ufshcd_transfer_rsp_status() races with the cmd->result assignment > by > ufshcd_abort_one(). How about addressing this race as follows? > * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is > encountered, > set a completion. > * In the code that aborts SCSI commands, for MediaTek controllers > only, > wait for that completion (based on a quirk). > * Instead of introducing an if-statement in > ufshcd_transfer_rsp_status(), rely on the cmd->result value > assigned > by ufshcd_abort_one(). > > Thanks, > > Bart. Hi Bart, Sorry, I might not have understood the potential racing issue here. The SCSI abort command doesn't need to wait for completion because SCSI doesn't care about cmd->result, right? The error handler abort also doesn't need to wait for completion, because it should have a guaranteed order? Firstly, ufshcd_abort_one is only likely to be filled back with OCS: ABORTED by MediaTek UFS controller after ufshcd_try_to_abort_task, and the sequence is as follows. ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // trigger mediatek controller fill OCS: ABORTED ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE In addition, the ISR will use the outstanding_lock with ufshcd_err_handler to provide protection, so there won't be any racing that causes the command to be released repeatedly. The only possible issue might be that after ufshcd_abort_one, the MediaTek UFS controller has not yet filled in OCS: ABORTED and has entered ufshcd_transfer_rsp_status for checking. But this doesn't matter, because it will just be treated as OCS_INVALID_COMMAND_STATUS. Is there any corner case that I might have overlooked? Thanks. Peter >