Hi Can, On Sat, 2020-08-01 at 07:17 +0800, Can Guo wrote: > Hi Bart, > > On 2020-08-01 00:51, Bart Van Assche wrote: > > On 2020-07-31 01:00, Can Guo wrote: > >> AFAIK, sychronization of scsi_done is not a problem here, because scsi > >> layer > >> use the atomic state, namely SCMD_STATE_COMPLETE, of a scsi cmd to > >> prevent > >> the concurrency of abort and real completion of it. > >> > >> Check func scsi_times_out(), hope it helps. > >> > >> enum blk_eh_timer_return scsi_times_out(struct request *req) > >> { > >> ... > >> if (rtn == BLK_EH_DONE) { > >> /* > >> * Set the command to complete first in order to > >> prevent > >> a real > >> * completion from releasing the command while error > >> handling > >> * is using it. If the command was already completed, > >> then the > >> * lower level driver beat the timeout handler, and it > >> is safe > >> * to return without escalating error recovery. > >> * > >> * If timeout handling lost the race to a real > >> completion, the > >> * block layer may ignore that due to a fake timeout > >> injection, > >> * so return RESET_TIMER to allow error handling > >> another > >> shot > >> * at this command. > >> */ > >> if (test_and_set_bit(SCMD_STATE_COMPLETE, > >> &scmd->state)) > >> return BLK_EH_RESET_TIMER; > >> if (scsi_abort_command(scmd) != SUCCESS) { > >> set_host_byte(scmd, DID_TIME_OUT); > >> scsi_eh_scmd_add(scmd); > >> } > >> } > >> } > > > > I am familiar with this mechanism. My concern is that both the regular > > completion path and the abort handler must call scsi_dma_unmap() before > > calling cmd->scsi_done(cmd). I don't see how > > test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state) could prevent that > > the regular completion path and the abort handler call scsi_dma_unmap() > > concurrently since both calls happen before the SCMD_STATE_COMPLETE bit > > is set? > > > > Thanks, > > > > Bart. > > For scsi_dma_unmap() part, that is true - we should make it serialized > with > any other completion paths. I've found it during my fault injection > test, so > I've made a patch to fix it, but it only comes in my next error recovery > enhancement patch series. Please check the attachment. > Your patch looks good to me. I have the same idea before but I found that calling scsi_done() (by __ufshcd_transfer_req_compl()) in ufshcd_abort() in old kernel (e.g., 4.14) will cause issues but it has been resolved by introduced SCMD_STATE_COMPLETE flag in newer kernel. So your patch makes sense. Would you mind sending out this draft patch as a formal patch together with my patch to fix issues in ufshcd_abort()? Our patches are aimed to fix cases that host/device reset eventually not being triggered by the result of ufshcd_abort(), for example, command is aborted successfully or command is not pending in device with its doorbell also cleared. Thanks, Stanley Chu > Thanks, > > Can Guo. >