On 07-May-19 15:51, Jason Gunthorpe wrote: > On Tue, May 07, 2019 at 03:38:21PM +0300, Gal Pressman wrote: >> On 06-May-19 21:31, Jason Gunthorpe wrote: >>> On Mon, May 06, 2019 at 04:51:00PM +0300, Gal Pressman wrote: >>>>>>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev) >>>>>>>>>> +{ >>>>>>>>>> + struct efa_com_admin_queue *aq = &edev->aq; >>>>>>>>>> + >>>>>>>>>> + clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state); >>>>>>>>> >>>>>>>>> This scheme looks use after free racey to me.. >>>>>>>> >>>>>>>> The running bit stops new admin commands from being submitted, clearly the exact >>>>>>>> moment in which the bit is cleared is "racy" to submission of admin commands but >>>>>>>> that is taken care of. >>>>>>>> >>>>>>>> The submission of an admin command is atomic as it is guarded by an admin queue >>>>>>>> lock. >>>>>>>> The same lock is acquired by this flow as well when flushing the admin queue. >>>>>>>> After all admin commands have been aborted and we know for sure that >>>>>>>> no new >>>>>>> >>>>>>> The problem is the 'abort' does nothing to ensure parallel threads are >>>>>>> no longer touching this memory, >>>>>> >>>>>> Which memory? The user threads touch the user allocated buffers which are not >>>>>> being freed on admin queue destroy. >>>>> >>>>> The memory the other thread is touching is freed a few lines below in >>>>> a devm_kfree. The apparent purpose of this code is to make the other >>>>> thread stop but does it wrong. >>>> >>>> Are we talking about the completion context/completion context pool? >>>> The user thread does use this memory, but this is done while the avail_cmds >>>> semaphore is down which means the wait_for_abort_completion function is still >>>> waiting for this thread to finish. >>> >>> We are talking about >>> >>> CPU 0 CPU 1 >>> efa_com_submit_admin_cmd() >>> spin_lock(&aq->sq.lock); >>> >>> efa_remove_device() >>> efa_com_admin_destroy() >>> efa_com_admin_flush() >>> [..] >>> kfree(aq) >>> >>> >> >> As long as efa_com_submit_admin_cmd() is running the semaphore is still "down" >> which means the wait function will be blocked. > > It is a race, order it a little differently: > > CPU 0 CPU 1 > efa_com_submit_admin_cmd() > efa_remove_device() > efa_com_admin_destroy() > efa_com_admin_flush() > efa_com_wait_for_abort_completion() > [..] > spin_lock(&aq->sq.lock); > > kfree(aq) > > Fundamentally you can't use locking *inside* the memory you are trying > to free to exclude other threads from using that memory. That is > always a user after free. > > Which is why when I see someone write something like: > > spin_lock(&aq->sq.lock); > if (!test_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state)) { > ibdev_err(aq->efa_dev, "Admin queue is closed\n"); > spin_unlock(&aq->sq.lock); > > it is almost always a bug > > And when you see matching things like: > > [..] > set_bit(EFA_AQ_STATE_POLLING_BIT, &edev->aq.state); > kfree(edev) > > You know it is screwed up in some way. Thanks for the detailed explanation, makes sense. > >>> So, either there is no possible concurrency with the 'aq' users and >>> device removal, in which case all the convoluted locking in >>> efa_com_admin_flush() and related is unneeded >>> >>> Or there is concurrency and it isn't being torn down properly, so we >>> get the above race. >>> >>> My read is that all the 'admin commands' are done off of verbs >>> callbacks and ib_unregister_device is called before we get to >>> efa_remove_device (guaranteeing there are no running verbs callbacks), >>> so there is no possible concurrency and all this efa_com_admin_flush() >>> and related is pointless obfuscation. Delete it. >> >> You're right, the "abort" flow was overcautious as there shouldn't be any >> pending threads after ib_unregister_device. >> I will remove this flow. > > Send a follow up patch Will do.