Re: [PATCH for-next v6 08/12] RDMA/efa: Implement functions that submit and complete admin commands

Gal Pressman <galpress@xxxxxxxxxx> · Wed, 8 May 2019 09:00:17 +0300

On 07-May-19 15:51, Jason Gunthorpe wrote:
> On Tue, May 07, 2019 at 03:38:21PM +0300, Gal Pressman wrote:
>> On 06-May-19 21:31, Jason Gunthorpe wrote:
>>> On Mon, May 06, 2019 at 04:51:00PM +0300, Gal Pressman wrote:
>>>>>>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev)
>>>>>>>>>> +{
>>>>>>>>>> +	struct efa_com_admin_queue *aq = &edev->aq;
>>>>>>>>>> +
>>>>>>>>>> +	clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
>>>>>>>>>
>>>>>>>>> This scheme looks use after free racey to me..
>>>>>>>>
>>>>>>>> The running bit stops new admin commands from being submitted, clearly the exact
>>>>>>>> moment in which the bit is cleared is "racy" to submission of admin commands but
>>>>>>>> that is taken care of.
>>>>>>>>
>>>>>>>> The submission of an admin command is atomic as it is guarded by an admin queue
>>>>>>>> lock.
>>>>>>>> The same lock is acquired by this flow as well when flushing the admin queue.
>>>>>>>> After all admin commands have been aborted and we know for sure that
>>>>>>>> no new
>>>>>>>
>>>>>>> The problem is the 'abort' does nothing to ensure parallel threads are
>>>>>>> no longer touching this memory, 
>>>>>>
>>>>>> Which memory? The user threads touch the user allocated buffers which are not
>>>>>> being freed on admin queue destroy.
>>>>>
>>>>> The memory the other thread is touching is freed a few lines below in
>>>>> a devm_kfree. The apparent purpose of this code is to make the other
>>>>> thread stop but does it wrong.
>>>>
>>>> Are we talking about the completion context/completion context pool?
>>>> The user thread does use this memory, but this is done while the avail_cmds
>>>> semaphore is down which means the wait_for_abort_completion function is still
>>>> waiting for this thread to finish.
>>>
>>> We are talking about
>>>
>>>      CPU 0                                          CPU 1
>>> efa_com_submit_admin_cmd()
>>>   	spin_lock(&aq->sq.lock);
>>>
>>>                                          efa_remove_device()
>>>                                              efa_com_admin_destroy()
>>>                                                efa_com_admin_flush()
>>>                                                [..]
>>>                                           kfree(aq)
>>>
>>>
>>
>> As long as efa_com_submit_admin_cmd() is running the semaphore is still "down"
>> which means the wait function will be blocked.
> 
> It is a race, order it a little differently:
> 
>       CPU 0                                          CPU 1
>  efa_com_submit_admin_cmd()
>                                           efa_remove_device()
>                                               efa_com_admin_destroy()
>                                                 efa_com_admin_flush()
>                                                 efa_com_wait_for_abort_completion()
>                                                 [..]
>    	spin_lock(&aq->sq.lock);
>  
>                                            kfree(aq)
> 
> Fundamentally you can't use locking *inside* the memory you are trying
> to free to exclude other threads from using that memory. That is
> always a user after free.
> 
> Which is why when I see someone write something like:
> 
> 	spin_lock(&aq->sq.lock);
> 	if (!test_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state)) {
> 		ibdev_err(aq->efa_dev, "Admin queue is closed\n");
> 		spin_unlock(&aq->sq.lock);
> 
> it is almost always a bug
> 
> And when you see matching things like:
> 
> [..]
> 	set_bit(EFA_AQ_STATE_POLLING_BIT, &edev->aq.state);
>         kfree(edev)
> 
> You know it is screwed up in some way.

Thanks for the detailed explanation, makes sense.

> 
>>> So, either there is no possible concurrency with the 'aq' users and
>>> device removal, in which case all the convoluted locking in
>>> efa_com_admin_flush() and related is unneeded
>>>
>>> Or there is concurrency and it isn't being torn down properly, so we
>>> get the above race.
>>>
>>> My read is that all the 'admin commands' are done off of verbs
>>> callbacks and ib_unregister_device is called before we get to
>>> efa_remove_device (guaranteeing there are no running verbs callbacks),
>>> so there is no possible concurrency and all this efa_com_admin_flush()
>>> and related is pointless obfuscation. Delete it.
>>
>> You're right, the "abort" flow was overcautious as there shouldn't be any
>> pending threads after ib_unregister_device.
>> I will remove this flow.
> 
> Send a follow up patch

Will do.