Hi Bart, Avri, On Tue, 2020-07-14 at 21:00 -0700, Bart Van Assche wrote: > On 2020-07-13 01:10, Avri Altman wrote: > > Artificially injecting errors is a very common validation mechanism, > > Provided that you are not breaking anything of the upper-layers, > > Which I don't think you are doing. > As the concerns of below questions, "scsi timeout is 30sec - do you expect an interrupt to arrive after that?" Actually in my test scenario, the flow works well without re-checking "outstanding_reqs" in "cleanup" section in ufshcd_abort(), so I would remove this checking first and resend this fix (with refined commit message according to blk-mq, not legacy blk). Please let me know if you have any suggestions. > Hi Avri, > > My concern is that the code that is being added in the abort handler > sooner or later will evolve into a duplicate of the regular completion > path. Wouldn't it be better to poll for completions from the timeout > handler by calling ufshcd_transfer_req_compl() instead of duplicating > that function? > The duplicated calls of cleanup job would be as below, scsi_dma_unmap(cmd); hba->lrb[tag].cmd = NULL; ufshcd_outstanding_req_clear(hba, tag); As your suggestions, above calls could be re-factored but the third call in __ufshcd_transfer_req_compl() would be more efficient by hba->outstanding_reqs ^= completed_reqs; for all handled requests in interrupt handler. Here we could not directly use "ufshcd_transfer_req_compl()" or its inner function "__ufshcd_transfer_req_compl()" since at least scsi_done() is not required in ufshcd_abort() because the completion flow will be handled by SCSI error handler, not ufshcd_abort() itself. > >>> In section 7.2.3 of the UFS specification I found the following about how > >>> to process request completions: "Software determines if new TRs have > >>> completed since step #2, by repeating one of the two methods described in > >>> step #2. If new TRs have completed, software repeats the sequence from > >>> step #3." Is such a loop perhaps missing from the Linux UFS driver? > > > > Could not find that citation. > > What version of the spec are you using? > > That quote comes from the following document: "Universal Flash Storage > Host Controller Interface (UFSHCI); Version 2.1; JESD223C; (Revision of > JESD223B, September 2013); MARCH 2016". Above description has already be implemented in ufshcd_intr() and ufshcd_transfer_req_compl(). But this loop cannot save "missing interrupt" just like this injected error case. Thanks, Stanley Chu