RE: possible uverbs bug with comp events?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Saturday, September 1, 2018 3:43 PM
> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> Cc: RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> Subject: Re: possible uverbs bug with comp events?
> 
> On Tue, Aug 28, 2018 at 02:52:41PM -0500, Steve Wise wrote:
> 
> > 1) iw_cxgb4:flush_qp() must enforce that the c4iw_qp.wq is only flushed
> > once, upon entering ERROR state.  Currently it doesn't do this for user
> > mode queues.  This can cause a touch-after-free crash.
> >
> > 2) the "double add" list_add_tail() warning still remains with the fix
> > for #1.  And I found out why.  I forced a BUG() when the double add
> > condition happens.
> >
> > One CPU is here spinning trying to acquire the event queue spinlock:
> >
> >  #5 [ffffaa0fe6aafd50] native_queued_spin_lock_slowpath at
> ffffffff8aeda512
> >  #6 [ffffaa0fe6aafd50] queued_spin_lock_slowpath at ffffffff8aeda1a0
> >  #7 [ffffaa0fe6aafd58] uverbs_hot_unplug_completion_event_file at
> > ffffffffc0a5743b [ib_uverbs]
> >  #8 [ffffaa0fe6aafd78] remove_commit_fd_uobject at ffffffffc0a56597
> > [ib_uverbs]
> >  #9 [ffffaa0fe6aafd98] _rdma_remove_commit_uobject at ffffffffc0a567cf
> > [ib_uverbs]
> > #10 [ffffaa0fe6aafdc0] uverbs_close_fd at ffffffffc0a56e7f [ib_uverbs]
> > #11 [ffffaa0fe6aafdf0] ib_uverbs_comp_event_close at ffffffffc0a4daef
> > [ib_uverbs]
> > #12 [ffffaa0fe6aafe20] __fput at ffffffff8b043f05
> > #13 [ffffaa0fe6aafe60] task_work_run at ffffffff8aea8bf2
> > #14 [ffffaa0fe6aafe90] do_exit at ffffffff8ae8f7cd
> > #15 [ffffaa0fe6aaff00] do_group_exit at ffffffff8ae900b9
> > #16 [ffffaa0fe6aaff30] do_syscall_64 at ffffffff8ae0381e
> > #17 [ffffaa0fe6aaff50] entry_SYSCALL_64_after_hwframe at
> ffffffff8b600081
> >
> > In ib_uverbs_comp_event_close() however, all the ib_uverb_event entries
> > were removed from the event queue event list here:
> >
> >         spin_lock_irq(&file->ev_queue.lock);
> >         list_for_each_entry_safe(entry, tmp, &file->ev_queue.event_list,
> > list) {
> >                 if (entry->counter)
> >                         list_del(&entry->obj_list);
> >                 kfree(entry);
> >         }
> >         spin_unlock_irq(&file->ev_queue.lock);
> >
> > Notice that each entry is removed from the uobj_list but it is NOT
> > removed from the event list, but rather it is just kfree()d.
> >
> > So immediately after this happens (when the spin lock is released),
> > another CPU tries to insert a new event because the still allocated qp
> > is moving to ERROR state as part of a connection abort by the peer:
> 
> This is not permitted.
> 
> When ib_uverbs_comp_event_close() is called there can be no other
> concurrent user of the uobject.
> 
> If there is, it signals that someone is chasing a stale pointer or is
> not doing uobject put/get reference counting properly.
> 
> > But looking at uverbs_hot_unplug_completion_event_file(), I see it
> > acquires the event_queue spinlock just to set ev_queue->is_closed = 1.
> 
> This is marking the comp channel as 'disassociated' but not closed, so
> it can't be moved..
> 
> > Perhaps setting is_closed to 1 should have been done in
> > ib_uverbs_comp_event_close()?
> 
> No, nothing can use the comp channel by the time we get to close, so
> any reliance on is_closed is wrong refcounting.
> 
> Sounds like it is still what I said before, the cxgb4 driver is
> continuing to refer to a ib_qp's internal data after destroy has
> happened, which is strictly not allowed.
> 
> Jason

Thanks Jason, I don't see that in the dump, but I'll keep looking.






[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux