Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Thu, 14 Oct 2021 09:02:15 +0300

On 14/10/2021 07:14, Bart Van Assche wrote:
> On 10/13/21 08:01, Adrian Hunter wrote:
>> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
>> which races with the completion function ufshcd_tmc_handler() which
>> expects req->end_io_data to have a value.
>>
>> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
>> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
>> host_lock spinlock.
>>
>> It is also not necessary (nor typical) to clear req->end_io_data because
>> the block layer does it before allocating out requests e.g. via
>> blk_get_request().
>>
>> So fix by not clearing it.
>>
>> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>> ---
>>   drivers/scsi/ufs/ufshcd.c | 5 -----
>>   1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 95be7ecdfe10..f34b3994d1aa 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>>       err = wait_for_completion_io_timeout(&wait,
>>               msecs_to_jiffies(TM_CMD_TIMEOUT));
>>       if (!err) {
>> -        /*
>> -         * Make sure that ufshcd_compl_tm() does not trigger a
>> -         * use-after-free.
>> -         */
>> -        req->end_io_data = NULL;
>>           ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>>           dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>>                   __func__, tm_function);
> 
> With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion.

AFAICT that would only happen because blk_mq_tagset_busy_iter() is not synchronized with respect to blk_put_request().
But ufshcd_tmc_handler() does not use blk_mq_tagset_busy_iter() anymore, so that can't happen.

> Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()?

If that were needed, it would imply the synchronization was broken, i.e. why would we be referencing a request that has already been through blk_put_request()?
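
To illustrate the argument, here is a rough userspace model of the scheme described above. It is only a sketch and not the driver code: the model_* names, NR_SLOTS and the doorbell parameter are made up for illustration, and a pthread mutex stands in for the host_lock spinlock. The point it tries to show is that once the timeout path has cleared the slot and the outstanding bit under the lock, the handler cannot reach the request any more, so whether end_io_data is cleared makes no difference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8 /* hypothetical number of task management slots */

/* Hypothetical stand-ins for struct completion and struct request. */
struct model_completion { bool done; };
struct model_request { void *end_io_data; };

struct model_hba {
	pthread_mutex_t host_lock;               /* models the host_lock spinlock */
	unsigned long outstanding_tasks;         /* models hba->outstanding_tasks */
	struct model_request *tmf_rqs[NR_SLOTS]; /* models hba->tmf_rqs */
};

void model_hba_init(struct model_hba *hba)
{
	pthread_mutex_init(&hba->host_lock, NULL);
	hba->outstanding_tasks = 0;
	for (int tag = 0; tag < NR_SLOTS; tag++)
		hba->tmf_rqs[tag] = NULL;
}

/* Issuer: attach the completion, then publish the request under the lock. */
void model_issue(struct model_hba *hba, int tag, struct model_request *req,
		 struct model_completion *wait)
{
	req->end_io_data = wait;
	pthread_mutex_lock(&hba->host_lock);
	hba->tmf_rqs[tag] = req;
	hba->outstanding_tasks |= 1UL << tag;
	pthread_mutex_unlock(&hba->host_lock);
}

/* Issuer, timeout path: unpublish under the lock, leave end_io_data alone. */
void model_timeout_cleanup(struct model_hba *hba, int tag)
{
	pthread_mutex_lock(&hba->host_lock);
	hba->tmf_rqs[tag] = NULL;
	hba->outstanding_tasks &= ~(1UL << tag);
	pthread_mutex_unlock(&hba->host_lock);
}

/*
 * Completion handler: looks only at slots that are still marked outstanding
 * and whose doorbell bit has been cleared. Returns the number of completions
 * it signalled.
 */
int model_tmc_handler(struct model_hba *hba, unsigned long doorbell)
{
	int completed = 0;

	pthread_mutex_lock(&hba->host_lock);
	unsigned long issued = hba->outstanding_tasks & ~doorbell;
	for (int tag = 0; tag < NR_SLOTS; tag++) {
		if (!(issued & (1UL << tag)))
			continue;
		struct model_request *req = hba->tmf_rqs[tag];
		/* A published request is expected to carry a completion. */
		assert(req != NULL && req->end_io_data != NULL);
		struct model_completion *c = req->end_io_data;
		c->done = true;
		completed++;
	}
	pthread_mutex_unlock(&hba->host_lock);
	return completed;
}
```

In this model, calling model_timeout_cleanup() before model_tmc_handler() leaves the handler with nothing to complete, even though end_io_data still points at the waiter's completion. An "if (c)" check in the handler would only matter if a request could still be found after it had been unpublished, which is the broken-synchronization case mentioned above.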