On 10/7/24 12:20 AM, Peter Wang (王信友) wrote:
On Thu, 2024-10-03 at 13:02 -0700, Bart Van Assche wrote:
External email : Please do not click links or open attachments until
you have verified the sender or the content.
On 10/2/24 5:42 AM, Peter Wang (王信友) wrote:
This patch merely aligns with the approach of SDB mode
and does not involve the flow of scsi_done. Besides,
I don't see any issue with concurrency between
ufshcd_abort_one() calling ufshcd_try_to_abort_task()
and scsi_done(). Can you point out the specific flow where
the problem occurs? If there is one, shouldn't SDB mode
have the same issue?
Hi Peter,
Correct, my comment applies to both legacy mode and MCQ mode. From
the
section in the UFS standard about ABORT TASK: "A response of FUNCTION
COMPLETE shall indicate that the command was aborted or was not in
the
task set." In other words, if a command completes just before
ufshcd_try_to_abort_task() calls ufshcd_issue_tm_cmd(), then
ufshcd_try_to_abort_task() will call ufshcd_clear_cmd() for a command
that has already completed. In legacy mode, this call will succeed.
Hi Bart,
Yes, the legacy SDB mode is protected by the outstanding_lock.
Hence, both ufshcd_compl_one_cqe() and ufshcd_abort_all() will call
ufshcd_release(hba). This will cause hba->clk_gating.active_reqs to
be
decremented twice instead of only once. Do you agree that this can
happen and also that it should be prevented that this happens?
Thanks,
Bart.
Sorry, I still don't understand why both ufshcd_compl_one_cqe()
and ufshcd_abort_all() will call ufshcd_release(hba)?
Because I have already removed the ufshcd_release_scsi_cmd from
ufshcd_abort_one, so the command won't be released immediately
after ufshcd_try_to_abort_task succeeds. Instead, it will wait
until the CQ Entry comes in before releasing. And since it is
protected by the cq_lock, it should only release once, right?
Hi Peter,
I think what you wrote applies to MCQ mode only. In my previous email
I clearly referred to "legacy mode" (SDB mode). Summarizing my previous
email, I think that in legacy mode it is possible that ufshcd_release()
is called twice while it only should be called once. Here are the
possible solutions I see:
* Add a function to the SCSI core for setting SCMD_STATE_COMPLETE. This
may be controversial since no other SCSI LLD needs this functionality.
* Changing the error handling approach in the UFS driver to the same
approach other SCSI LLDs use: instead of using queue_work() to
activate the error handler, call scsi_schedule_eh(). This will cause
the error handler to be activated later, namely after all pending
commands have timed out instead of aborting any pending commands
first.
* Add a variant of scsi_schedule_eh() to the SCSI core that accelerates
error handling by calling scsi_timeout() on all pending commands.
Thanks,
Bart.