On 16/11/2021 01:09, Bart Van Assche wrote: > On 11/12/21 2:56 AM, Adrian Hunter wrote: >> On 10/11/2021 20:56, Bart Van Assche wrote: >>> On 11/10/21 12:57 AM, Adrian Hunter wrote: >>>> Seems like something ufshcd_clear_cmd() should be doing instead? >>> >>> I'm concerned that would break ufshcd_eh_device_reset_handler() >>> since that reset handler retries SCSI commands by calling >>> __ufshcd_transfer_req_compl() after having called ufshcd_clear_cmd(). >> Whenever an outstanding_reqs bit transitions 1 -> 0, then >> __ufshcd_transfer_req_compl() must be called. > > I will look further into this. > >> As a separate issue, in ufshcd_abort() there is: >> >> /* If command is already aborted/completed, return FAILED. */ >> if (!(test_bit(tag, &hba->outstanding_reqs))) { >> dev_err(hba->dev, >> "%s: cmd at tag %d already completed, outstanding=0x%lx, doorbell=0x%x\n", >> __func__, tag, hba->outstanding_reqs, reg); >> goto release; >> } >> >> which seems wrong. FAILED should only be returned to escalate the >> error handling, so if the slot has already successfully been >> cleared, that is SUCCESS. scsi_times_out() has already blocked >> the scsi_done() path (by setting SCMD_STATE_COMPLETE), so any use >> after free must be being caused by SCSI not the ufs driver. > > scmd_eh_abort_handler() would trigger a use-after-free if ufshcd_abort() would return SUCCESS for completed commands. Hence the choice for the return value FAILED for completed commands. > > BTW, can this code path ever be reached since scsi_done() sets the SCMD_STATE_COMPLETE bit before it calls blk_mq_complete_request() and since scsi_times_out() tests that bit before it schedules a call of ufshcd_abort()? Racing with timing out, the UFS interrupt handler can clear hba->outstanding_reqs bit and call __ufshcd_transfer_req_compl(), but the request will not be freed if scsi_times_out() wins the race. So there should be no use-after-free unless SCSI error handling itself has a bug. One perhaps unrelated issue in scsi_times_out(): if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) return BLK_EH_RESET_TIMER; Shouldn't that return BLK_EH_DONE not BLK_EH_RESET_TIMER, since the request has been through blk_mq_complete_request() ?