Re: possible uverbs bug with comp events?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 8/22/2018 2:47 PM, Jason Gunthorpe wrote:
> On Wed, Aug 22, 2018 at 02:03:47PM -0500, Steve Wise wrote:
>>
>>
>> On 8/22/2018 1:49 PM, Jason Gunthorpe wrote:
>>> On Tue, Aug 21, 2018 at 04:01:09PM -0500, Steve Wise wrote:
>>>> Hey RDMAers,
>>>>
>>>> I see that a uverbs completion event is added to the ev_queue event_list
>>>> as well as the associated uobject comp_list in ib_uverbs_comp_handler():
>>>>
>>>>         list_add_tail(&entry->list, &ev_queue->event_list);
>>>>         list_add_tail(&entry->obj_list, &uobj->comp_list);
>>>>         spin_unlock_irqrestore(&ev_queue->lock, flags);
>>>>
>>>>
>>>> But in ib_uverbs_comp_event_close(), it looks like the entry could be
>>>> left on the uobj comp_list and then the event structure is freed!
>>>
>>> I don't think we can get entries on the ev_queue for the
>>> uverbs_event_fops FD that do not have counter set.
>>>
>>> It should be changed, it is confusing, but the change should be to
>>> delete this:
>>>
>>> @@ -436,8 +436,7 @@ static int ib_uverbs_comp_event_close(struct inode *inode, struct file *filp)
>>>  
>>>         spin_lock_irq(&file->ev_queue.lock);
>>>         list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list, list) {
>>> -               if (entry->counter)
>>> -                       list_del(&entry->obj_list);
>>> +               list_del(&entry->obj_list);
>>>                 kfree(entry);
>>>         }
>>>
>>> The only events that should be pushed there are completion events, via
>>> ib_uverbs_comp_handler and they always have non-NULL counter.
>>>
>>> Ie this in ib_uverbs_comp_handler:
>>>
>>> 	entry->counter		   = &uobj->comp_events_reported;
>>>
>>> Is never NULL, it is the address of a structure member.
>>>
>>> Jason
>>>
>>
>> Thanks Jason,
>>
>> This then leaves me with the mystery of why I see "duplicate add"
>> warnings from list_add_tail() called from ib_uverbs_comp_handler().  I
>> wonder what other condition/state can cause this?
> 
> Since ib_uverbs_comp_handler() uses fresh kmalloc memory there are
> these cases I can see:
> 
> 1 - The ib_uverbs_event was freed, but left on a list, then recylced by
>     kmalloc
> 2 - The memory backing the ev_queue->event_list was freed before/during
>     ib_uverbs_comp_handler
> 3 - ev_queue->event_list is manipulated without locking
> 4 - The memory backing uobj->comp_list was freed before/during
>     ib_uverbs_comp_handler
> 5 - uobj->comp_list is manipulated without locking
> 
> #4 is prevented by ib_uverbs_cq_event_handler() not being called
> after ib_destroy_cq(). This is very important, there is a bad bug if
> this is not guaranteed
> 
> #2 is prevented by #4, and holidng the uobject reference for the comp
> channel like this. ie we cannot enter ib_uverbs_cq_event_handler()
> without a guarentee that context_args is a held reference.
> 
> static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
> [..]
> 	ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel, file);
> 	cq->event_handler = ib_uverbs_cq_event_handler;
> 	cq->cq_context    = ev_file ? &ev_file->xv_queue : NULL;
> 
> Which has a matching put here:
> 
> void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
> 			  struct ib_uverbs_completion_event_file *ev_file,
> 			  struct ib_ucq_object *uobj)
> [..]
> 		uverbs_uobject_put(&ev_file->uobj);
> 
> #5 is easy to check, the locking looks fine.
> 
> That leaves 1 and 3 to check..
> 
> I see there are calls to comp_handler in the drivers. My best guess is
> some driver has messed this up and is calling comp_handler after the
> cq is destroyed?

Maybe. The path that tickles this [1] is in iw_cxgb4.  But the QP is
being moved out of RTS, so the CQs should still be allocated.

> 
> Are you running with slab poisoning or kasan on?

I'll enable these.  Thank you for the detailed analysis!


Steve.

[1] - iw_cxgb4 received a peer close request which causes a flush
operation on the QP due to moving it from RTS->CLOSING.  Any WRs pending
in the QP are completed with "FLUSHED" status.  And the uvberbs
comp_handler is called which hits the "double add" warning.


Aug 21 09:45:02 nazca3 kernel: list_add double add:
new=ffff944ff02a2410, prev=ffff944ff02a2410, next=ffff9450495777c0.
Aug 21 09:45:02 nazca3 kernel: ------------[ cut here ]------------
Aug 21 09:45:02 nazca3 kernel: Call Trace:
Aug 21 09:45:02 nazca3 kernel: ib_uverbs_comp_handler+0x8e/0x150 [ib_uverbs]
Aug 21 09:45:02 nazca3 kernel: flush_qp+0x5a/0xb8 [iw_cxgb4]
Aug 21 09:45:02 nazca3 kernel: c4iw_modify_rc_qp+0x569/0xa1b [iw_cxgb4]
Aug 21 09:45:02 nazca3 kernel: peer_close+0x1f8/0x2fa [iw_cxgb4]
Aug 21 09:45:02 nazca3 kernel: ? ttwu_do_wakeup+0x19/0x140
Aug 21 09:45:02 nazca3 kernel: ? try_to_wake_up+0x54/0x450
Aug 21 09:45:02 nazca3 kernel: process_work+0x2d/0x51 [iw_cxgb4]
Aug 21 09:45:02 nazca3 kernel: process_one_work+0x14f/0x360
Aug 21 09:45:02 nazca3 kernel: worker_thread+0x47/0x3e0
Aug 21 09:45:02 nazca3 kernel: kthread+0xfc/0x130
Aug 21 09:45:02 nazca3 kernel: ? max_active_store+0x80/0x80
Aug 21 09:45:02 nazca3 kernel: ? __kthread_parkme+0x90/0x90
Aug 21 09:45:02 nazca3 kernel: ret_from_fork+0x35/0x40



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux