Stanley, > > Hi Avri, > > On Mon, 2019-07-22 at 06:10 +0000, Avri Altman wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > Currently bits in hba->outstanding_tasks are cleared only after their > > > > > corresponding task management commands are successfully done > by > > > > > __ufshcd_issue_tm_cmd(). > > > > > > > > > > If timeout happens in a task management command, its > corresponding > > > > > bit in hba->outstanding_tasks will not be cleared until next task > > > > > management command with the same tag used successfully > finishes.‧ > > > > ufshcd_clear_tm_cmd is also called as part of ufshcd_err_handler. > > > > Does this change something in your assumptions? > > > And BTW there is a specific __clear_bit in __ufshcd_issue_tm_cmd() in > case > > > of a TO. > > > > Gave it another look - > > If indeed this bit isn't cleared as part of the error flow that the timeout > triggers, > > I think you should relate to ufshcd_clear_tm_cmd specifically in your > commit log - > > Because this is the obvious place where the bit cleanup should take > place. > > > > Also the fix should be much more intuitive IMO - > > Today we do __clear_bit() on success, ufshcd_clear_tm_cmd() on error, > > And also ufshcd_put_tm_slot() either way? > > > > Maybe you can choose a single place to clear it, without any additional > code? > > ufshcd_clear_tm_cmd() is similar to ufshcd_clear_cmd() which tries to > write timed-out bit in "clear register". They do not clean bits in both > outstanding masks. > > Another reason to not to do outstanding bits cleanup in > ufshcd_clear_tm_cmd() is: if host is unable to clear TM command by > setting "clear register", the TM command may be still "outstanding" in > the device. In this case, it may be better to do cleanup after reset is > done. Cleanup includes bits in hba->outstanding_tasks and > hba->tm_slots_in_use which is possibly cleaned too early by > ufshcd_put_tm_slot() if command is timed-out now. > > Referring to error handling flow of hba->outstanding_reqs, all timed-out > bits will be cleaned by ufshcd_reset_and_restore() => > ufshcd_transfer_req_compl() after reset is done. Similar handling for > hba->outstanding_tasks could be applied, i.e., do cleanup by > ufshcd_reset_and_restore() => ufshcd_tmc_handler(). Fair enough. Thank you for the detailed explanation. > > The next thing is what you suggested: How to make the fix more > intuitive. Patchset v2 will be uploaded for review: It tries to > "re-factor" cleanup jobs first, and then add fixed flow to make the > whole patch more readable. > > One more thing, above description and flow is done through UFSHCD SCSI > error handling routines registered with SCSI Midlayer. For TM command > timeout happening in bsg path without error handling triggered by SCSI > layer, we may need to consider to handle those tasks in future patches. Please do. Thanks, Avri