Re: [PATCH for-next v6 08/12] RDMA/efa: Implement functions that submit and complete admin commands

Gal Pressman <galpress@xxxxxxxxxx> · Mon, 6 May 2019 16:51:00 +0300

On 05-May-19 15:37, Jason Gunthorpe wrote:
> On Sun, May 05, 2019 at 11:06:04AM +0300, Gal Pressman wrote:
>> On 03-May-19 15:20, Jason Gunthorpe wrote:
>>> On Fri, May 03, 2019 at 12:37:13PM +0300, Gal Pressman wrote:
>>>> On 02-May-19 16:55, Jason Gunthorpe wrote:
>>>>> On Wed, May 01, 2019 at 01:48:20PM +0300, Gal Pressman wrote:
>>>>>
>>>>>> +static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue *aq,
>>>>>> +						     struct efa_admin_aq_entry *cmd,
>>>>>> +						     size_t cmd_size_in_bytes,
>>>>>> +						     struct efa_admin_acq_entry *comp,
>>>>>> +						     size_t comp_size_in_bytes)
>>>>>> +{
>>>>>> +	struct efa_comp_ctx *comp_ctx;
>>>>>> +
>>>>>> +	spin_lock(&aq->sq.lock);
>>>>>> +	if (!test_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state)) {
>>>>>> +		ibdev_err(aq->efa_dev, "Admin queue is closed\n");
>>>>>> +		spin_unlock(&aq->sq.lock);
>>>>>> +		return ERR_PTR(-ENODEV);
>>>>>> +	}
>>>>>> +
>>>>>> +	comp_ctx = __efa_com_submit_admin_cmd(aq, cmd, cmd_size_in_bytes, comp,
>>>>>> +					      comp_size_in_bytes);
>>>>>> +	spin_unlock(&aq->sq.lock);
>>>>>> +	if (IS_ERR(comp_ctx))
>>>>>> +		clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
>>>>>> +
>>>>>> +	return comp_ctx;
>>>>>> +}
>>>>>
>>>>> [..]
>>>>>
>>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev)
>>>>>> +{
>>>>>> +	struct efa_com_admin_queue *aq = &edev->aq;
>>>>>> +
>>>>>> +	clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
>>>>>
>>>>> This scheme looks use after free racey to me..
>>>>
>>>> The running bit stops new admin commands from being submitted, clearly the exact
>>>> moment in which the bit is cleared is "racy" to submission of admin commands but
>>>> that is taken care of.
>>>>
>>>> The submission of an admin command is atomic as it is guarded by an admin queue
>>>> lock.
>>>> The same lock is acquired by this flow as well when flushing the admin queue.
>>>> After all admin commands have been aborted and we know for sure that
>>>> no new
>>>
>>> The problem is the 'abort' does nothing to ensure parallel threads are
>>> no longer touching this memory, 
>>
>> Which memory? The user threads touch the user allocated buffers which are not
>> being freed on admin queue destroy.
> 
> The memory the other thread is touching is freed a few lines below in
> a devm_kfree. The apparent purpose of this code is to make the other
> thread stop but does it wrong.

Are we talking about the completion context/completion context pool?
The user thread does use this memory, but this is done while the avail_cmds
semaphore is down which means the wait_for_abort_completion function is still
waiting for this thread to finish.