RE: [PATCH v1 0/2] scsi: ufs: Fix broken hba->outstanding_tasks

Stanley Chu <stanley.chu@xxxxxxxxxxxx> · Wed, 24 Jul 2019 13:08:24 +0800

Hi Avri,

On Mon, 2019-07-22 at 06:10 +0000, Avri Altman wrote:
> > 
> > >
> > > Hi,
> > >
> > > >
> > > > Currently bits in hba->outstanding_tasks are cleared only after their
> > > > corresponding task management commands are successfully done by
> > > > __ufshcd_issue_tm_cmd().
> > > >
> > > > If timeout happens in a task management command, its corresponding
> > > > bit in hba->outstanding_tasks will not be cleared until next task
> > > > management command with the same tag used successfully finishes.‧
> > > ufshcd_clear_tm_cmd is also called as part of ufshcd_err_handler.
> > > Does this change something in your assumptions?
> > And BTW there is a specific __clear_bit in __ufshcd_issue_tm_cmd() in case
> > of a TO.
> 
> Gave it another look - 
> If indeed this bit isn't cleared as part of the error flow that the timeout triggers,
> I think you should relate to ufshcd_clear_tm_cmd specifically in your commit log - 
> Because this is the obvious place where the bit cleanup should take place.
> 
> Also the fix should be much more intuitive IMO - 
> Today we do __clear_bit() on success, ufshcd_clear_tm_cmd() on error,
> And also ufshcd_put_tm_slot() either way?
> 
> Maybe you can choose a single place to clear it, without any additional code?

ufshcd_clear_tm_cmd() is similar to ufshcd_clear_cmd() which tries to
write timed-out bit in "clear register". They do not clean bits in both
outstanding masks.

Another reason to not to do outstanding bits cleanup in
ufshcd_clear_tm_cmd() is: if host is unable to clear TM command by
setting "clear register", the TM command may be still "outstanding" in
the device. In this case, it may be better to do cleanup after reset is
done. Cleanup includes bits in hba->outstanding_tasks and
hba->tm_slots_in_use which is possibly cleaned too early by
ufshcd_put_tm_slot() if command is timed-out now.

Referring to error handling flow of hba->outstanding_reqs, all timed-out
bits will be cleaned by ufshcd_reset_and_restore() =>
ufshcd_transfer_req_compl() after reset is done. Similar handling for
hba->outstanding_tasks could be applied, i.e., do cleanup by
ufshcd_reset_and_restore() => ufshcd_tmc_handler().

The next thing is what you suggested: How to make the fix more
intuitive. Patchset v2 will be uploaded for review: It tries to
"re-factor" cleanup jobs first, and then add fixed flow to make the
whole patch more readable.

One more thing, above description and flow is done through UFSHCD SCSI
error handling routines registered with SCSI Midlayer. For TM command
timeout happening in bsg path without error handling triggered by SCSI
layer, we may need to consider to handle those tasks in future patches.

> 
> Thanks,
> Avri

Thanks,
Stanley

> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-mediatek