On 2020-05-28 02:47, Can Guo wrote: > Hi Bart, > > On 2019-12-25 06:02, Bart Van Assche wrote: >> The UFS SCSI timeout handler was needed to compensate that >> ufshcd_queuecommand() could return SCSI_MLQUEUE_HOST_BUSY for a long >> time. Commit a276c19e3e98 ("scsi: ufs: Avoid busy-waiting by eliminating >> tag conflicts") fixed this so the timeout handler is no longer necessary. >> >> See also commit f550c65b543b ("scsi: ufs: implement scsi host timeout >> handler"). >> > > Sorry for bugging you on this old change. I am afraid we may need to add > this timeout handler back. Because there is till chances that a request > gets stuck somewhere in ufshcd_queuecommand() path before > ufshcd_send_command() gets called. e.g. > > ufshcd_queuecommand() > ->ufshcd_map_sg() > -->scsi_dma_map() > --->dma_map_sg() > ---->dev->ops->map_sg() > > map_sg() ops may get stuck. map_sg() method can vary on different platforms > based on actual IOMMU engines. We cannot gaurantee map_sg() ops must return > immediately as we don't know what is actually inside map_sg() ops. > > And if it gets stuck there for a long time till the request times out, > without > the UFS timeout handler, scsi layer will try to abort this request from UFS > driver by calling ufshcd_abort() eventually. ufshcd_abort() will think this > request has been completed due to its tag is not in hba->outstanding_reqs > or UFS host's door bell reg. However, actually, this request is still in > ufshcd_queuecommand() path. I don't need to continue on the subsequent > impact > to UFS driver if ufshcd_abort() happens in this case. This is a corner > case, > but it is still possible (I did see map_sg() ops hangs on real devices). > > Having the UFS timeout handler back will prevent this situation as UFS > timeout > handler checks if the tag is in hba->outstanding_reqs (for our case, it > is not > in there), if no, it returns BLK_EH_RESET_TIMER so that scsi/block layer > will > keep waiting. > > What do you think? Please let me know your ideas on this, thanks! Hi Can, I see the following issues with the above proposal: - Although I haven't been able to find explicit documentation of this, I think that dma_map_sg() must not sleep. If it would sleep that would break most block and SCSI drivers because many of these drivers call dma_map_sg() from their .queue_rq() or .queuecommand() implementation and if BLK_MQ_F_BLOCKING has not been set these functions must not sleep. - A timeout handler must not be invoked while .queuecommand() is still in progress. The SCSI core calls blk_mq_start_request() before it calls ufshcd_queuecommand(). The blk_mq_start_request() activates the block layer timeout mechanism. ufshcd_queuecommand() must have finished before the block layer timeout handler is activated. Please fix the root cause, namely the map_sg implementation that may get stuck. Thanks, Bart.