Re: [PATCH for-next v6 08/12] RDMA/efa: Implement functions that submit and complete admin commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 06, 2019 at 04:51:00PM +0300, Gal Pressman wrote:
> >>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev)
> >>>>>> +{
> >>>>>> +	struct efa_com_admin_queue *aq = &edev->aq;
> >>>>>> +
> >>>>>> +	clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
> >>>>>
> >>>>> This scheme looks use after free racey to me..
> >>>>
> >>>> The running bit stops new admin commands from being submitted, clearly the exact
> >>>> moment in which the bit is cleared is "racy" to submission of admin commands but
> >>>> that is taken care of.
> >>>>
> >>>> The submission of an admin command is atomic as it is guarded by an admin queue
> >>>> lock.
> >>>> The same lock is acquired by this flow as well when flushing the admin queue.
> >>>> After all admin commands have been aborted and we know for sure that
> >>>> no new
> >>>
> >>> The problem is the 'abort' does nothing to ensure parallel threads are
> >>> no longer touching this memory, 
> >>
> >> Which memory? The user threads touch the user allocated buffers which are not
> >> being freed on admin queue destroy.
> > 
> > The memory the other thread is touching is freed a few lines below in
> > a devm_kfree. The apparent purpose of this code is to make the other
> > thread stop but does it wrong.
> 
> Are we talking about the completion context/completion context pool?
> The user thread does use this memory, but this is done while the avail_cmds
> semaphore is down which means the wait_for_abort_completion function is still
> waiting for this thread to finish.

We are talking about

     CPU 0                                          CPU 1
efa_com_submit_admin_cmd()
  	spin_lock(&aq->sq.lock);

                                         efa_remove_device()
                                             efa_com_admin_destroy()
                                               efa_com_admin_flush()
                                               [..]
                                          kfree(aq)


 
So, either there is no possible concurrency with the 'aq' users and
device removal, in which case all the convoluted locking in
efa_com_admin_flush() and related is unneeded

Or there is concurrency and it isn't being torn down properly, so we
get the above race.

My read is that all the 'admin commands' are done off of verbs
callbacks and ib_unregister_device is called before we get to
efa_remove_device (guaranteeing there are no running verbs callbacks),
so there is no possible concurrency and all this efa_com_admin_flush()
and related is pointless obfuscation. Delete it.

Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux