On Wed, Aug 22, 2018 at 02:03:47PM -0500, Steve Wise wrote: > > > On 8/22/2018 1:49 PM, Jason Gunthorpe wrote: > > On Tue, Aug 21, 2018 at 04:01:09PM -0500, Steve Wise wrote: > >> Hey RDMAers, > >> > >> I see that a uverbs completion event is added to the ev_queue event_list > >> as well as the associated uobject comp_list in ib_uverbs_comp_handler(): > >> > >> list_add_tail(&entry->list, &ev_queue->event_list); > >> list_add_tail(&entry->obj_list, &uobj->comp_list); > >> spin_unlock_irqrestore(&ev_queue->lock, flags); > >> > >> > >> But in ib_uverbs_comp_event_close(), it looks like the entry could be > >> left on the uobj comp_list and then the event structure is freed! > > > > I don't think we can get entries on the ev_queue for the > > uverbs_event_fops FD that do not have counter set. > > > > It should be changed, it is confusing, but the change should be to > > delete this: > > > > @@ -436,8 +436,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp) > > > > spin_lock_irq(&file->ev_queue.lock); > > list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list, list) { > > - if (entry->counter) > > - list_del(&entry->obj_list); > > + list_del(&entry->obj_list); > > kfree(entry); > > } > > > > The only events that should be pushed there are completion events, via > > ib_uverbs_comp_handler and they always have non-NULL counter. > > > > Ie this in ib_uverbs_comp_handler: > > > > entry->counter = &uobj->comp_events_reported; > > > > Is never NULL, it is the address of a structure member. > > > > Jason > > > > Thanks Jason, > > This then leaves me with the mystery of why I see "duplicate add" > warnings from list_add_tail() called from ib_uverbs_comp_handler(). I > wonder what other condition/state can cause this? Since ib_uverbs_comp_handler() uses fresh kmalloc memory there are these cases I can see: 1 - The ib_uverbs_event was freed, but left on a list, then recylced by kmalloc 2 - The memory backing the ev_queue->event_list was freed before/during ib_uverbs_comp_handler 3 - ev_queue->event_list is manipulated without locking 4 - The memory backing uobj->comp_list was freed before/during ib_uverbs_comp_handler 5 - uobj->comp_list is manipulated without locking #4 is prevented by ib_uverbs_cq_event_handler() not being called after ib_destroy_cq(). This is very important, there is a bad bug if this is not guaranteed #2 is prevented by #4, and holidng the uobject reference for the comp channel like this. ie we cannot enter ib_uverbs_cq_event_handler() without a guarentee that context_args is a held reference. static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file, [..] ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel, file); cq->event_handler = ib_uverbs_cq_event_handler; cq->cq_context = ev_file ? &ev_file->xv_queue : NULL; Which has a matching put here: void ib_uverbs_release_ucq(struct ib_uverbs_file *file, struct ib_uverbs_completion_event_file *ev_file, struct ib_ucq_object *uobj) [..] uverbs_uobject_put(&ev_file->uobj); #5 is easy to check, the locking looks fine. That leaves 1 and 3 to check.. I see there are calls to comp_handler in the drivers. My best guess is some driver has messed this up and is calling comp_handler after the cq is destroyed? Are you running with slab poisoning or kasan on? Jason