Re: [PATCH for-next v6 08/12] RDMA/efa: Implement functions that submit and complete admin commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06-May-19 21:31, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:51:00PM +0300, Gal Pressman wrote:
>>>>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev)
>>>>>>>> +{
>>>>>>>> +	struct efa_com_admin_queue *aq = &edev->aq;
>>>>>>>> +
>>>>>>>> +	clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
>>>>>>>
>>>>>>> This scheme looks use after free racey to me..
>>>>>>
>>>>>> The running bit stops new admin commands from being submitted, clearly the exact
>>>>>> moment in which the bit is cleared is "racy" to submission of admin commands but
>>>>>> that is taken care of.
>>>>>>
>>>>>> The submission of an admin command is atomic as it is guarded by an admin queue
>>>>>> lock.
>>>>>> The same lock is acquired by this flow as well when flushing the admin queue.
>>>>>> After all admin commands have been aborted and we know for sure that
>>>>>> no new
>>>>>
>>>>> The problem is the 'abort' does nothing to ensure parallel threads are
>>>>> no longer touching this memory, 
>>>>
>>>> Which memory? The user threads touch the user allocated buffers which are not
>>>> being freed on admin queue destroy.
>>>
>>> The memory the other thread is touching is freed a few lines below in
>>> a devm_kfree. The apparent purpose of this code is to make the other
>>> thread stop but does it wrong.
>>
>> Are we talking about the completion context/completion context pool?
>> The user thread does use this memory, but this is done while the avail_cmds
>> semaphore is down which means the wait_for_abort_completion function is still
>> waiting for this thread to finish.
> 
> We are talking about
> 
>      CPU 0                                          CPU 1
> efa_com_submit_admin_cmd()
>   	spin_lock(&aq->sq.lock);
> 
>                                          efa_remove_device()
>                                              efa_com_admin_destroy()
>                                                efa_com_admin_flush()
>                                                [..]
>                                           kfree(aq)
> 
> 

As long as efa_com_submit_admin_cmd() is running the semaphore is still "down"
which means the wait function will be blocked.

>  
> So, either there is no possible concurrency with the 'aq' users and
> device removal, in which case all the convoluted locking in
> efa_com_admin_flush() and related is unneeded
> 
> Or there is concurrency and it isn't being torn down properly, so we
> get the above race.
> 
> My read is that all the 'admin commands' are done off of verbs
> callbacks and ib_unregister_device is called before we get to
> efa_remove_device (guaranteeing there are no running verbs callbacks),
> so there is no possible concurrency and all this efa_com_admin_flush()
> and related is pointless obfuscation. Delete it.

You're right, the "abort" flow was overcautious as there shouldn't be any
pending threads after ib_unregister_device.
I will remove this flow.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux