Re: possible uverbs bug with comp events?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Jason,

On 8/22/2018 3:39 PM, Steve Wise wrote:
> 
> 
> On 8/22/2018 2:47 PM, Jason Gunthorpe wrote:
>> On Wed, Aug 22, 2018 at 02:03:47PM -0500, Steve Wise wrote:
>>>
>>>
>>> On 8/22/2018 1:49 PM, Jason Gunthorpe wrote:
>>>> On Tue, Aug 21, 2018 at 04:01:09PM -0500, Steve Wise wrote:
>>>>> Hey RDMAers,
>>>>>
>>>>> I see that a uverbs completion event is added to the ev_queue event_list
>>>>> as well as the associated uobject comp_list in ib_uverbs_comp_handler():
>>>>>
>>>>>         list_add_tail(&entry->list, &ev_queue->event_list);
>>>>>         list_add_tail(&entry->obj_list, &uobj->comp_list);
>>>>>         spin_unlock_irqrestore(&ev_queue->lock, flags);
>>>>>
>>>>>
>>>>> But in ib_uverbs_comp_event_close(), it looks like the entry could be
>>>>> left on the uobj comp_list and then the event structure is freed!
>>>>
>>>> I don't think we can get entries on the ev_queue for the
>>>> uverbs_event_fops FD that do not have counter set.
>>>>
>>>> It should be changed, it is confusing, but the change should be to
>>>> delete this:
>>>>
>>>> @@ -436,8 +436,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp)
>>>>  
>>>>         spin_lock_irq(&file->ev_queue.lock);
>>>>         list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list, list) {
>>>> -               if (entry->counter)
>>>> -                       list_del(&entry->obj_list);
>>>> +               list_del(&entry->obj_list);
>>>>                 kfree(entry);
>>>>         }
>>>>
>>>> The only events that should be pushed there are completion events, via
>>>> ib_uverbs_comp_handler and they always have non-NULL counter.
>>>>
>>>> Ie this in ib_uverbs_comp_handler:
>>>>
>>>> 	entry->counter		   = &uobj->comp_events_reported;
>>>>
>>>> Is never NULL, it is the address of a structure member.
>>>>
>>>> Jason
>>>>
>>>
>>> Thanks Jason,
>>>
>>> This then leaves me with the mystery of why I see "duplicate add"
>>> warnings from list_add_tail() called from ib_uverbs_comp_handler().  I
>>> wonder what other condition/state can cause this?
>>
>> Since ib_uverbs_comp_handler() uses fresh kmalloc memory there are
>> these cases I can see:
>>
>> 1 - The ib_uverbs_event was freed, but left on a list, then recylced by
>>     kmalloc
>> 2 - The memory backing the ev_queue->event_list was freed before/during
>>     ib_uverbs_comp_handler
>> 3 - ev_queue->event_list is manipulated without locking
>> 4 - The memory backing uobj->comp_list was freed before/during
>>     ib_uverbs_comp_handler
>> 5 - uobj->comp_list is manipulated without locking
>>
>> #4 is prevented by ib_uverbs_cq_event_handler() not being called
>> after ib_destroy_cq(). This is very important, there is a bad bug if
>> this is not guaranteed
>>
>> #2 is prevented by #4, and holidng the uobject reference for the comp
>> channel like this. ie we cannot enter ib_uverbs_cq_event_handler()
>> without a guarentee that context_args is a held reference.
>>
>> static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
>> [..]
>> 	ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel, file);
>> 	cq->event_handler = ib_uverbs_cq_event_handler;
>> 	cq->cq_context    = ev_file ? &ev_file->xv_queue : NULL;
>>
>> Which has a matching put here:
>>
>> void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
>> 			  struct ib_uverbs_completion_event_file *ev_file,
>> 			  struct ib_ucq_object *uobj)
>> [..]
>> 		uverbs_uobject_put(&ev_file->uobj);
>>
>> #5 is easy to check, the locking looks fine.
>>
>> That leaves 1 and 3 to check..
>>
>> I see there are calls to comp_handler in the drivers. My best guess is
>> some driver has messed this up and is calling comp_handler after the
>> cq is destroyed?
> 
> Maybe. The path that tickles this [1] is in iw_cxgb4.  But the QP is
> being moved out of RTS, so the CQs should still be allocated.
> 
>>
>> Are you running with slab poisoning or kasan on?
> 
> I'll enable these.  Thank you for the detailed analysis!
> 

I've found two separate bugs:

1) iw_cxgb4:flush_qp() must enforce that the c4iw_qp.wq is only flushed
once, upon entering ERROR state.  Currently it doesn't do this for user
mode queues.  This can cause a touch-after-free crash.

2) the "double add" list_add_tail() warning still remains with the fix
for #1.  And I found out why.  I forced a BUG() when the double add
condition happens.

One CPU is here spinning trying to acquire the event queue spinlock:

 #5 [ffffaa0fe6aafd50] native_queued_spin_lock_slowpath at ffffffff8aeda512
 #6 [ffffaa0fe6aafd50] queued_spin_lock_slowpath at ffffffff8aeda1a0
 #7 [ffffaa0fe6aafd58] uverbs_hot_unplug_completion_event_file at
ffffffffc0a5743b [ib_uverbs]
 #8 [ffffaa0fe6aafd78] remove_commit_fd_uobject at ffffffffc0a56597
[ib_uverbs]
 #9 [ffffaa0fe6aafd98] _rdma_remove_commit_uobject at ffffffffc0a567cf
[ib_uverbs]
#10 [ffffaa0fe6aafdc0] uverbs_close_fd at ffffffffc0a56e7f [ib_uverbs]
#11 [ffffaa0fe6aafdf0] ib_uverbs_comp_event_close at ffffffffc0a4daef
[ib_uverbs]
#12 [ffffaa0fe6aafe20] __fput at ffffffff8b043f05
#13 [ffffaa0fe6aafe60] task_work_run at ffffffff8aea8bf2
#14 [ffffaa0fe6aafe90] do_exit at ffffffff8ae8f7cd
#15 [ffffaa0fe6aaff00] do_group_exit at ffffffff8ae900b9
#16 [ffffaa0fe6aaff30] do_syscall_64 at ffffffff8ae0381e
#17 [ffffaa0fe6aaff50] entry_SYSCALL_64_after_hwframe at ffffffff8b600081

In ib_uverbs_comp_event_close() however, all the ib_uverb_event entries
were removed from the event queue event list here:

        spin_lock_irq(&file->ev_queue.lock);
        list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list,
list) {
                if (entry->counter)
                        list_del(&entry->obj_list);
                kfree(entry);
        }
        spin_unlock_irq(&file->ev_queue.lock);

Notice that each entry is removed from the uobj_list but it is NOT
removed from the event list, but rather it is just kfree()d.

So immediately after this happens (when the spin lock is released),
another CPU tries to insert a new event because the still allocated qp
is moving to ERROR state as part of a connection abort by the peer:

    [exception RIP: __list_add_valid+75]
    RIP: ffffffff8b1983eb  RSP: ffffaa0fe4fcfc50  RFLAGS: 00010046
    RAX: 0000000000000058  RBX: ffff9c6148fef6d8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9c64572d69b8  RDI: ffff9c64572d69b8
    RBP: ffff9c635a62bf80   R8: 0000000000000000   R9: 00000000000006ad
    R10: 00000000000003ff  R11: 0000000000aaaaaa  R12: ffff9c63ad578680
    R13: 0000000000000086  R14: ffff9c635a62bf90  R15: ffff9c635a62bf90
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffaa0fe4fcfc50] ib_uverbs_comp_handler at ffffffffc0a4eb7e
[ib_uverbs]
 #8 [ffffaa0fe4fcfc90] flush_qp at ffffffffc0aa4b76 [iw_cxgb4]
 #9 [ffffaa0fe4fcfcb8] c4iw_modify_rc_qp at ffffffffc0aacdf5 [iw_cxgb4]
#10 [ffffaa0fe4fcfda8] peer_abort at ffffffffc0a93d67 [iw_cxgb4]
#11 [ffffaa0fe4fcfe60] process_work at ffffffffc0a961d6 [iw_cxgb4]
#12 [ffffaa0fe4fcfe78] process_one_work at ffffffff8aea3ecf
#13 [ffffaa0fe4fcfeb8] worker_thread at ffffffff8aea4b87
#14 [ffffaa0fe4fcff10] kthread at ffffffff8aeaa2ec
#15 [ffffaa0fe4fcff50] ret_from_fork at ffffffff8b600205

So ib_uverbs_comp_handler() kmalloc()s the ib_uverbs_event struct and
gets the same memory just kfree()d by the other CPU.  This causes the
list_add_tail() warning because the event struct was never removed from
the event list via a list_del() call.

But looking at uverbs_hot_unplug_completion_event_file(), I see it
acquires the event_queue spinlock just to set ev_queue->is_closed = 1.

Perhaps setting is_closed to 1 should have been done in
ib_uverbs_comp_event_close()?

Maybe this is the fix?

diff --git a/drivers/infiniband/core/uverbs_main.c
b/drivers/infiniband/core/uverbs_main.c
index 0f70ff91276e..bd7c6bdcfee3 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -424,6 +424,7 @@ static int ib_uverbs_comp_event_close(struct inode
*inode, struct file *filp)
                        list_del(&entry->obj_list);
                kfree(entry);
        }
+       ev_queue->is_closed = 1;
        spin_unlock_irq(&file->ev_queue.lock);

        uverbs_close_fd(filp);
diff --git a/drivers/infiniband/core/uverbs_std_types.c
b/drivers/infiniband/core/uverbs_std_types.c
index 5f9321eda1b7..531e43123630 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -198,10 +198,6 @@ static int
uverbs_hot_unplug_completion_event_file(struct ib_uobject_file *uobj_
                             uobj_file);
        struct ib_uverbs_event_queue *event_queue =
&comp_event_file->ev_queue;

-       spin_lock_irq(&event_queue->lock);
-       event_queue->is_closed = 1;
-       spin_unlock_irq(&event_queue->lock);
-
        if (why == RDMA_REMOVE_DRIVER_REMOVE) {
                wake_up_interruptible(&event_queue->poll_wait);
                kill_fasync(&event_queue->async_queue, SIGIO, POLL_IN);







[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux