On Wed, 2024-09-18 at 11:29 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/18/24 6:29 AM, Peter Wang (王信友) wrote: > > Basically, this patch currently only needs to handle requeueing > > for the error handler abort. > > The approach for DBR mode and MCQ mode should be consistent. > > If receive an interrupt response (OCS:ABORTED or > INVALID_OCS_VALUE), > > then set DID_REQUEUE. If there is no interrupt, it will also set > > SCSI DID_REQUEUE in ufshcd_err_handler through > > ufshcd_complete_requests > > with force_compl = true. > > Reporting a completion for commands cleared by writing into the > legacy > UTRLCLR register is not compliant with any version of the UFSHCI > standard. Reporting a completion for commands cleared by writing into > that register is problematic because it causes > ufshcd_release_scsi_cmd() > to be called as follows: > > ufshcd_sl_intr() > ufshcd_transfer_req_compl() > ufshcd_poll() > __ufshcd_transfer_req_compl() > ufshcd_compl_one_cqe() > cmd->result = ... > ufshcd_release_scsi_cmd() > scsi_done() > > Calling ufshcd_release_scsi_cmd() if a command has been cleared is > problematic because the SCSI core does not expect this. If > ufshcd_try_to_abort_task() clears a SCSI command, > ufshcd_release_scsi_cmd() must not be called until the SCSI core > decides to release the command. This is why I wrote in a previous > mail > that I think that a quirk should be introduced to suppress the > completions generated by clearing a SCSI command. > Hi Bart, I'm not sure if I'm misunderstanding your point, but I feel that ufshcd_release_scsi_cmd should always be called. It's scsi_done that shouldn't be called, as it should be left to the SCSI layer to decide how to handle this command. Because ufshcd_release_scsi_cmd is just about releasing resources related to ufshcd_map_sg and the clock at the UFS driver level. scsi_done is what notifies the SCSI layer that the cmd has finished, asking it to look at the result to decide how to proceed. > > The more problematic part is with MCQ mode. To imitate the DBR > > approach, we just need to set DID_REQUEUE upon receiving an > interrupt. > > Everything else remains the same. This would make things simpler. > > > > Moving forward, if we want to simplify things and we have also > > taken stock of the two or three scenarios where OCS: ABORTED > occurs, > > do we even need a flag? Couldn't we just set DID_REQUEUE directly > > for OCS: ABORTED? > > What do you think? > > How about making ufshcd_compl_one_cqe() skip entries with status > OCS_ABORTED? That would make ufshcd_compl_one_cqe() behave as the > SCSI core expects, namely not freeing any command resources if a > SCSI command is aborted successfully. > > This approach may require further changes to ufshcd_abort_all(). > In that function there are separate code paths for legacy and MCQ > mode. This is less than ideal. Would it be possible to combine > these code paths by removing the ufshcd_complete_requests() call > from ufshcd_abort_all() and by handling completions from inside > ufshcd_abort_one()? > > Thanks, > > Bart. The four case flows for abort are as follows: ---------------------------------------------------------------- Case1: DBR ufshcd_abort In this case, you can see that ufshcd_release_scsi_cmd will definitely be called. ufshcd_abort() ufshcd_try_to_abort_task() // It should trigger an interrupt, but the tensor might not get outstanding_lock clear outstanding_reqs tag ufshcd_release_scsi_cmd() release outstanding_lock ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done(); In most cases, ufshcd_intr will not reach scsi_done because the outstanding_reqs tag is cleared by the original thread. Therefore, whether there is an interrupt or not doesn't affect the result because the ISR will do nothing in most cases. In a very low chance, the ISR will reach scsi_done and notify SCSI to requeue, and the original thread will not call ufshcd_release_scsi_cmd. MediaTek may need to change DID_ABORT to DID_REQUEUE in this situation, or perhaps not handle this ISR at all. ---------------------------------------------------------------- Case2: MCQ ufshcd_abort In the case of MCQ ufshcd_abort, you can also see that ufshcd_release_scsi_cmd will definitely be called too. However, there seems to be a problem here, as ufshcd_release_scsi_cmd might be called twice. This is because cmd is not null in ufshcd_release_scsi_cmd, which the previous version would set cmd to null. Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed can avoid this problem. This part needs further consideration on how to handle it. ufshcd_abort() ufshcd_mcq_abort() ufshcd_try_to_abort_task() // will trigger ISR ufshcd_release_scsi_cmd() ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT ufshcd_release_scsi_cmd() // will release twice scsi_done() ---------------------------------------------------------------- Case3: DBR ufshcd_err_handler In the case of the DBR mode error handler, it's the same; ufshcd_release_scsi_cmd will also be executed, and scsi_done will definitely be used to notify SCSI to requeue. ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // It should trigger an interrupt, but the tensor might not ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done() ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done(); At this time, the same actions are taken regardless of whether there is an ISR, and with the protection of outstanding_lock, only one thread will execute ufshcd_release_scsi_cmd and scsi_done. ---------------------------------------------------------------- Case4: MCQ ufshcd_err_handler It's the same with MCQ mode; there is protection from the cqe lock, so only one thread will execute. What my patch 2 aims to do is to change DID_ABORT to DID_REQUEUE in this situation. ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // will trigger irq thread ufshcd_complete_requests() ufshcd_mcq_compl_pending_transfer() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done() ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done() Thanks Peter