>
> Hi Avri,
>
> On Thu, 2020-07-09 at 08:31 +0000, Avri Altman wrote:
> > >
> > > If somehow no interrupt notification is raised for a completed request
> > > and its doorbell bit is cleared by host, UFS driver needs to cleanup
> > > its outstanding bit in ufshcd_abort().
> > Theoretically, this case is already accounted for -
> > See line 6407: a proper error is issued and eventually outstanding req is
> > cleared.
> >
> > Can you go over the scenario you are attending line by line,
> > And explain why ufshcd_abort does not account for it?
>
> Sure.
>
> If a request using tag N is completed by UFS device without interrupt
> notification till timeout happens, ufshcd_abort() will be invoked.
>
> Since request completion flow is not executed, current status may be
>
> - Tag N in hba->outstanding_reqs is set
> - Tag N in doorbell register is not set
>
> In this case, ufshcd_abort() flow would be
>
> - This log is printed: "ufshcd_abort: cmd was completed, but without a
>   notifying intr, tag = N"
> - This log is printed: "ufshcd_abort: Device abort task at tag N"
> - If hba->req_abort_skip is zero, QUERY_TASK command is sent
> - Device responds "UPIU_TASK_MANAGEMENT_FUNC_COMPL"
> - This log is printed: "ufshcd_abort: cmd at tag N not pending in the
>   device."
> - Doorbell tells that tag N is not set, so the driver goes to label
>   "out" with this log printed: "ufshcd_abort: cmd at tag %d successfully
>   cleared from DB."
> - In label "out" section, no cleanup will be made, and then ufshcd_abort
>   exits
> - This request will be re-queued to request queue by SCSI timeout
>   handler
>
> Now, Inconsistent state shows-up: A request is "re-queued" but its
> corresponding resource in UFS layer is not cleared, below flow will
> trigger bad things,
>
> - A new request with tag M is finished
> - Interrupt is raised and ufshcd_transfer_req_compl() found both tag N
>   and M can process the completion flow
> - The post-processing flow for tag N will be executed while its request
>   is still alive
>
> I am sorry that below messages are only for old kernel in non-blk-mq
> case. However above scenario will also trigger bad thing in blk-mq case.
Ok. Thanks.

>
> >
> > >
> > > Otherwise, system may crash by below abnormal flow:
> > >
> > > After this request is requeued by SCSI layer with its
> > > outstanding bit set, the next completed request will trigger
> > > ufshcd_transfer_req_compl() to handle all "completed outstanding
> > > bits". In this time, the "abnormal outstanding bit" will be detected
> > > and the "requeued request" will be chosen to execute request
> > > post-processing flow. This is wrong and blk_finish_request() will
> > > BUG_ON because this request is still "alive".
> > >
> > > It is worth mentioning that before ufshcd_abort() cleans the timed-out
> > > request, driver need to check again if this request is really not
> > > handled by __ufshcd_transfer_req_compl() yet because it may be
> > > possible that the interrupt comes very lately before the cleaning.
> > What do you mean? Why checking the outstanding reqs isn't enough?
> >
> > >
> > > Signed-off-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
> > > ---
> > >  drivers/scsi/ufs/ufshcd.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > index 8603b07045a6..f23fb14df9f6 100644
> > > --- a/drivers/scsi/ufs/ufshcd.c
> > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > @@ -6462,7 +6462,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > >              /* command completed already */
> > >              dev_err(hba->dev, "%s: cmd at tag %d successfully cleared from DB.\n",
> > >                  __func__, tag);
> > > -            goto out;
> > > +            goto cleanup;
> > But you've arrived here only if (!(test_bit(tag, &hba->outstanding_reqs))) -
> > See line 6400.
> >
> > >          } else {
> > >              dev_err(hba->dev,
> > >                  "%s: no response from device. tag = %d, err %d\n",
> > > @@ -6496,9 +6496,14 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > >          goto out;
> > >      }
> > >
> > > +cleanup:
> > > +    spin_lock_irqsave(host->host_lock, flags);
> > > +    if (!test_bit(tag, &hba->outstanding_reqs)) {
Is this needed? it was already checked in line 6439.

Thanks,
Avri

> > > +        spin_unlock_irqrestore(host->host_lock, flags);
> > > +        goto out;
> > > +    }
> > >      scsi_dma_unmap(cmd);
> > >
> > > -    spin_lock_irqsave(host->host_lock, flags);
> > >      ufshcd_outstanding_req_clear(hba, tag);
> > >      hba->lrb[tag].cmd = NULL;
> > >      spin_unlock_irqrestore(host->host_lock, flags);
> > > --
> > > 2.18.0
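
The stale-bit scenario described in this thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_hba and every toy_* function are hypothetical names that do not exist in ufshcd.c, the completion rule (outstanding bits XOR doorbell bits) is assumed from the description of ufshcd_transfer_req_compl() above, and locking, the QUERY_TASK exchange and the SCSI requeue are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not struct ufs_hba): only the two bitmaps discussed. */
struct toy_hba {
    unsigned long outstanding_reqs; /* driver view: issued, not yet completed */
    unsigned long doorbell;         /* device view: still being processed */
};

/* Issue a request: the tag is set in both bitmaps. */
static void toy_issue(struct toy_hba *hba, int tag)
{
    assert(tag >= 0 && tag < 32);
    hba->outstanding_reqs |= 1UL << tag;
    hba->doorbell |= 1UL << tag;
}

/* Device finishes a tag: only the doorbell bit is cleared. */
static void toy_device_done(struct toy_hba *hba, int tag)
{
    hba->doorbell &= ~(1UL << tag);
}

/*
 * Completion handling (assumed rule): every tag that is outstanding but
 * no longer set in the doorbell is treated as completed.
 * Returns the bitmap of tags that were post-processed.
 */
static unsigned long toy_transfer_req_compl(struct toy_hba *hba)
{
    unsigned long completed = hba->outstanding_reqs ^ hba->doorbell;

    hba->outstanding_reqs &= ~completed;
    return completed;
}

/*
 * Abort path for a tag whose doorbell bit is already clear.
 * with_cleanup == false models "goto out": nothing is cleared.
 * with_cleanup == true models the proposed "cleanup" label: re-check the
 * outstanding bit, then clear it.
 */
static void toy_abort(struct toy_hba *hba, int tag, bool with_cleanup)
{
    unsigned long bit = 1UL << tag;

    if (!with_cleanup)
        return;
    if (hba->outstanding_reqs & bit)
        hba->outstanding_reqs &= ~bit;
}

/*
 * Tag n completes with a lost interrupt and is aborted after a timeout;
 * tag m then completes normally. Returns the tags that the completion
 * handling for m ends up post-processing.
 */
static unsigned long toy_scenario(int n, int m, bool with_cleanup)
{
    struct toy_hba hba = { 0, 0 };

    toy_issue(&hba, n);
    toy_device_done(&hba, n);          /* interrupt lost: no completion run */
    toy_abort(&hba, n, with_cleanup);  /* timeout handler calls abort */
    toy_issue(&hba, m);
    toy_device_done(&hba, m);
    return toy_transfer_req_compl(&hba);
}
```

In this model, toy_scenario(3, 5, false) returns a bitmap with both bit 3 and bit 5 set, which corresponds to the requeued request being post-processed while it is still alive. toy_scenario(3, 5, true) returns only bit 5.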