On 05-May-19 15:37, Jason Gunthorpe wrote: > On Sun, May 05, 2019 at 11:06:04AM +0300, Gal Pressman wrote: >> On 03-May-19 15:20, Jason Gunthorpe wrote: >>> On Fri, May 03, 2019 at 12:37:13PM +0300, Gal Pressman wrote: >>>> On 02-May-19 16:55, Jason Gunthorpe wrote: >>>>> On Wed, May 01, 2019 at 01:48:20PM +0300, Gal Pressman wrote: >>>>> >>>>>> +static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue *aq, >>>>>> + struct efa_admin_aq_entry *cmd, >>>>>> + size_t cmd_size_in_bytes, >>>>>> + struct efa_admin_acq_entry *comp, >>>>>> + size_t comp_size_in_bytes) >>>>>> +{ >>>>>> + struct efa_comp_ctx *comp_ctx; >>>>>> + >>>>>> + spin_lock(&aq->sq.lock); >>>>>> + if (!test_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state)) { >>>>>> + ibdev_err(aq->efa_dev, "Admin queue is closed\n"); >>>>>> + spin_unlock(&aq->sq.lock); >>>>>> + return ERR_PTR(-ENODEV); >>>>>> + } >>>>>> + >>>>>> + comp_ctx = __efa_com_submit_admin_cmd(aq, cmd, cmd_size_in_bytes, comp, >>>>>> + comp_size_in_bytes); >>>>>> + spin_unlock(&aq->sq.lock); >>>>>> + if (IS_ERR(comp_ctx)) >>>>>> + clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state); >>>>>> + >>>>>> + return comp_ctx; >>>>>> +} >>>>> >>>>> [..] >>>>> >>>>>> +static void efa_com_admin_flush(struct efa_com_dev *edev) >>>>>> +{ >>>>>> + struct efa_com_admin_queue *aq = &edev->aq; >>>>>> + >>>>>> + clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state); >>>>> >>>>> This scheme looks use after free racey to me.. >>>> >>>> The running bit stops new admin commands from being submitted, clearly the exact >>>> moment in which the bit is cleared is "racy" to submission of admin commands but >>>> that is taken care of. >>>> >>>> The submission of an admin command is atomic as it is guarded by an admin queue >>>> lock. >>>> The same lock is acquired by this flow as well when flushing the admin queue. >>>> After all admin commands have been aborted and we know for sure that >>>> no new >>> >>> The problem is the 'abort' does nothing to ensure parallel threads are >>> no longer touching this memory, >> >> Which memory? The user threads touch the user allocated buffers which are not >> being freed on admin queue destroy. > > The memory the other thread is touching is freed a few lines below in > a devm_kfree. The apparent purpose of this code is to make the other > thread stop but does it wrong. Are we talking about the completion context/completion context pool? The user thread does use this memory, but this is done while the avail_cmds semaphore is down which means the wait_for_abort_completion function is still waiting for this thread to finish.