Hey Jason, On 8/22/2018 3:39 PM, Steve Wise wrote: > > > On 8/22/2018 2:47 PM, Jason Gunthorpe wrote: >> On Wed, Aug 22, 2018 at 02:03:47PM -0500, Steve Wise wrote: >>> >>> >>> On 8/22/2018 1:49 PM, Jason Gunthorpe wrote: >>>> On Tue, Aug 21, 2018 at 04:01:09PM -0500, Steve Wise wrote: >>>>> Hey RDMAers, >>>>> >>>>> I see that a uverbs completion event is added to the ev_queue event_list >>>>> as well as the associated uobject comp_list in ib_uverbs_comp_handler(): >>>>> >>>>> list_add_tail(&entry->list, &ev_queue->event_list); >>>>> list_add_tail(&entry->obj_list, &uobj->comp_list); >>>>> spin_unlock_irqrestore(&ev_queue->lock, flags); >>>>> >>>>> >>>>> But in ib_uverbs_comp_event_close(), it looks like the entry could be >>>>> left on the uobj comp_list and then the event structure is freed! >>>> >>>> I don't think we can get entries on the ev_queue for the >>>> uverbs_event_fops FD that do not have counter set. >>>> >>>> It should be changed, it is confusing, but the change should be to >>>> delete this: >>>> >>>> @@ -436,8 +436,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp) >>>> >>>> spin_lock_irq(&file->ev_queue.lock); >>>> list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list, list) { >>>> - if (entry->counter) >>>> - list_del(&entry->obj_list); >>>> + list_del(&entry->obj_list); >>>> kfree(entry); >>>> } >>>> >>>> The only events that should be pushed there are completion events, via >>>> ib_uverbs_comp_handler and they always have non-NULL counter. >>>> >>>> Ie this in ib_uverbs_comp_handler: >>>> >>>> entry->counter = &uobj->comp_events_reported; >>>> >>>> Is never NULL, it is the address of a structure member. >>>> >>>> Jason >>>> >>> >>> Thanks Jason, >>> >>> This then leaves me with the mystery of why I see "duplicate add" >>> warnings from list_add_tail() called from ib_uverbs_comp_handler(). I >>> wonder what other condition/state can cause this? >> >> Since ib_uverbs_comp_handler() uses fresh kmalloc memory there are >> these cases I can see: >> >> 1 - The ib_uverbs_event was freed, but left on a list, then recylced by >> kmalloc >> 2 - The memory backing the ev_queue->event_list was freed before/during >> ib_uverbs_comp_handler >> 3 - ev_queue->event_list is manipulated without locking >> 4 - The memory backing uobj->comp_list was freed before/during >> ib_uverbs_comp_handler >> 5 - uobj->comp_list is manipulated without locking >> >> #4 is prevented by ib_uverbs_cq_event_handler() not being called >> after ib_destroy_cq(). This is very important, there is a bad bug if >> this is not guaranteed >> >> #2 is prevented by #4, and holidng the uobject reference for the comp >> channel like this. ie we cannot enter ib_uverbs_cq_event_handler() >> without a guarentee that context_args is a held reference. >> >> static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file, >> [..] >> ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel, file); >> cq->event_handler = ib_uverbs_cq_event_handler; >> cq->cq_context = ev_file ? &ev_file->xv_queue : NULL; >> >> Which has a matching put here: >> >> void ib_uverbs_release_ucq(struct ib_uverbs_file *file, >> struct ib_uverbs_completion_event_file *ev_file, >> struct ib_ucq_object *uobj) >> [..] >> uverbs_uobject_put(&ev_file->uobj); >> >> #5 is easy to check, the locking looks fine. >> >> That leaves 1 and 3 to check.. >> >> I see there are calls to comp_handler in the drivers. My best guess is >> some driver has messed this up and is calling comp_handler after the >> cq is destroyed? > > Maybe. The path that tickles this [1] is in iw_cxgb4. But the QP is > being moved out of RTS, so the CQs should still be allocated. > >> >> Are you running with slab poisoning or kasan on? > > I'll enable these. Thank you for the detailed analysis! > I've found two separate bugs: 1) iw_cxgb4:flush_qp() must enforce that the c4iw_qp.wq is only flushed once, upon entering ERROR state. Currently it doesn't do this for user mode queues. This can cause a touch-after-free crash. 2) the "double add" list_add_tail() warning still remains with the fix for #1. And I found out why. I forced a BUG() when the double add condition happens. One CPU is here spinning trying to acquire the event queue spinlock: #5 [ffffaa0fe6aafd50] native_queued_spin_lock_slowpath at ffffffff8aeda512 #6 [ffffaa0fe6aafd50] queued_spin_lock_slowpath at ffffffff8aeda1a0 #7 [ffffaa0fe6aafd58] uverbs_hot_unplug_completion_event_file at ffffffffc0a5743b [ib_uverbs] #8 [ffffaa0fe6aafd78] remove_commit_fd_uobject at ffffffffc0a56597 [ib_uverbs] #9 [ffffaa0fe6aafd98] _rdma_remove_commit_uobject at ffffffffc0a567cf [ib_uverbs] #10 [ffffaa0fe6aafdc0] uverbs_close_fd at ffffffffc0a56e7f [ib_uverbs] #11 [ffffaa0fe6aafdf0] ib_uverbs_comp_event_close at ffffffffc0a4daef [ib_uverbs] #12 [ffffaa0fe6aafe20] __fput at ffffffff8b043f05 #13 [ffffaa0fe6aafe60] task_work_run at ffffffff8aea8bf2 #14 [ffffaa0fe6aafe90] do_exit at ffffffff8ae8f7cd #15 [ffffaa0fe6aaff00] do_group_exit at ffffffff8ae900b9 #16 [ffffaa0fe6aaff30] do_syscall_64 at ffffffff8ae0381e #17 [ffffaa0fe6aaff50] entry_SYSCALL_64_after_hwframe at ffffffff8b600081 In ib_uverbs_comp_event_close() however, all the ib_uverb_event entries were removed from the event queue event list here: spin_lock_irq(&file->ev_queue.lock); list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list, list) { if (entry->counter) list_del(&entry->obj_list); kfree(entry); } spin_unlock_irq(&file->ev_queue.lock); Notice that each entry is removed from the uobj_list but it is NOT removed from the event list, but rather it is just kfree()d. So immediately after this happens (when the spin lock is released), another CPU tries to insert a new event because the still allocated qp is moving to ERROR state as part of a connection abort by the peer: [exception RIP: __list_add_valid+75] RIP: ffffffff8b1983eb RSP: ffffaa0fe4fcfc50 RFLAGS: 00010046 RAX: 0000000000000058 RBX: ffff9c6148fef6d8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9c64572d69b8 RDI: ffff9c64572d69b8 RBP: ffff9c635a62bf80 R8: 0000000000000000 R9: 00000000000006ad R10: 00000000000003ff R11: 0000000000aaaaaa R12: ffff9c63ad578680 R13: 0000000000000086 R14: ffff9c635a62bf90 R15: ffff9c635a62bf90 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffaa0fe4fcfc50] ib_uverbs_comp_handler at ffffffffc0a4eb7e [ib_uverbs] #8 [ffffaa0fe4fcfc90] flush_qp at ffffffffc0aa4b76 [iw_cxgb4] #9 [ffffaa0fe4fcfcb8] c4iw_modify_rc_qp at ffffffffc0aacdf5 [iw_cxgb4] #10 [ffffaa0fe4fcfda8] peer_abort at ffffffffc0a93d67 [iw_cxgb4] #11 [ffffaa0fe4fcfe60] process_work at ffffffffc0a961d6 [iw_cxgb4] #12 [ffffaa0fe4fcfe78] process_one_work at ffffffff8aea3ecf #13 [ffffaa0fe4fcfeb8] worker_thread at ffffffff8aea4b87 #14 [ffffaa0fe4fcff10] kthread at ffffffff8aeaa2ec #15 [ffffaa0fe4fcff50] ret_from_fork at ffffffff8b600205 So ib_uverbs_comp_handler() kmalloc()s the ib_uverbs_event struct and gets the same memory just kfree()d by the other CPU. This causes the list_add_tail() warning because the event struct was never removed from the event list via a list_del() call. But looking at uverbs_hot_unplug_completion_event_file(), I see it acquires the event_queue spinlock just to set ev_queue->is_closed = 1. Perhaps setting is_closed to 1 should have been done in ib_uverbs_comp_event_close()? Maybe this is the fix? diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 0f70ff91276e..bd7c6bdcfee3 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -424,6 +424,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp) list_del(&entry->obj_list); kfree(entry); } + ev_queue->is_closed = 1; spin_unlock_irq(&file->ev_queue.lock); uverbs_close_fd(filp); diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c index 5f9321eda1b7..531e43123630 100644 --- a/drivers/infiniband/core/uverbs_std_types.c +++ b/drivers/infiniband/core/uverbs_std_types.c @@ -198,10 +198,6 @@ static int uverbs_hot_unplug_completion_event_file(struct ib_uobject_file *uobj_ uobj_file); struct ib_uverbs_event_queue *event_queue = &comp_event_file->ev_queue; - spin_lock_irq(&event_queue->lock); - event_queue->is_closed = 1; - spin_unlock_irq(&event_queue->lock); - if (why == RDMA_REMOVE_DRIVER_REMOVE) { wake_up_interruptible(&event_queue->poll_wait); kill_fasync(&event_queue->async_queue, SIGIO, POLL_IN);