Re: [PATCH for-next v6 08/12] RDMA/efa: Implement functions that submit and complete admin commands

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Tue, 7 May 2019 12:51:51 +0000

On Tue, May 07, 2019 at 03:38:21PM +0300, Gal Pressman wrote:
> On 06-May-19 21:31, Jason Gunthorpe wrote:
> > On Mon, May 06, 2019 at 04:51:00PM +0300, Gal Pressman wrote:
> >>>>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev)
> >>>>>>>> +{
> >>>>>>>> +	struct efa_com_admin_queue *aq = &edev->aq;
> >>>>>>>> +
> >>>>>>>> +	clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
> >>>>>>>
> >>>>>>> This scheme looks use after free racey to me..
> >>>>>>
> >>>>>> The running bit stops new admin commands from being submitted, clearly the exact
> >>>>>> moment in which the bit is cleared is "racy" to submission of admin commands but
> >>>>>> that is taken care of.
> >>>>>>
> >>>>>> The submission of an admin command is atomic as it is guarded by an admin queue
> >>>>>> lock.
> >>>>>> The same lock is acquired by this flow as well when flushing the admin queue.
> >>>>>> After all admin commands have been aborted and we know for sure that
> >>>>>> no new
> >>>>>
> >>>>> The problem is the 'abort' does nothing to ensure parallel threads are
> >>>>> no longer touching this memory, 
> >>>>
> >>>> Which memory? The user threads touch the user allocated buffers which are not
> >>>> being freed on admin queue destroy.
> >>>
> >>> The memory the other thread is touching is freed a few lines below in
> >>> a devm_kfree. The apparent purpose of this code is to make the other
> >>> thread stop but does it wrong.
> >>
> >> Are we talking about the completion context/completion context pool?
> >> The user thread does use this memory, but this is done while the avail_cmds
> >> semaphore is down which means the wait_for_abort_completion function is still
> >> waiting for this thread to finish.
> > 
> > We are talking about
> > 
> >      CPU 0                                          CPU 1
> > efa_com_submit_admin_cmd()
> >   	spin_lock(&aq->sq.lock);
> > 
> >                                          efa_remove_device()
> >                                              efa_com_admin_destroy()
> >                                                efa_com_admin_flush()
> >                                                [..]
> >                                           kfree(aq)
> > 
> > 
> 
> As long as efa_com_submit_admin_cmd() is running the semaphore is still "down"
> which means the wait function will be blocked.

It is a race, order it a little differently:

      CPU 0                                          CPU 1
 efa_com_submit_admin_cmd()
                                          efa_remove_device()
                                              efa_com_admin_destroy()
                                                efa_com_admin_flush()
                                                efa_com_wait_for_abort_completion()
                                                [..]
   	spin_lock(&aq->sq.lock);

                                           kfree(aq)

Fundamentally you can't use locking *inside* the memory you are trying
to free to exclude other threads from using that memory. That is
always a user after free.

Which is why when I see someone write something like:

	spin_lock(&aq->sq.lock);
	if (!test_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state)) {
		ibdev_err(aq->efa_dev, "Admin queue is closed\n");
		spin_unlock(&aq->sq.lock);

it is almost always a bug

And when you see matching things like:

[..]
	set_bit(EFA_AQ_STATE_POLLING_BIT, &edev->aq.state);
        kfree(edev)

You know it is screwed up in some way.

> > So, either there is no possible concurrency with the 'aq' users and
> > device removal, in which case all the convoluted locking in
> > efa_com_admin_flush() and related is unneeded
> > 
> > Or there is concurrency and it isn't being torn down properly, so we
> > get the above race.
> > 
> > My read is that all the 'admin commands' are done off of verbs
> > callbacks and ib_unregister_device is called before we get to
> > efa_remove_device (guaranteeing there are no running verbs callbacks),
> > so there is no possible concurrency and all this efa_com_admin_flush()
> > and related is pointless obfuscation. Delete it.
> 
> You're right, the "abort" flow was overcautious as there shouldn't be any
> pending threads after ib_unregister_device.
> I will remove this flow.

Send a follow up patch

Jason