Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry, I pressed send too early, just one more point I wanted to make,
listed below

On Tue, Mar 15, 2016 at 2:30 PM, Devesh Sharma
<devesh.sharma@xxxxxxxxxxxx> wrote:
> On Mon, Mar 14, 2016 at 11:18 PM, Jason Gunthorpe
> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Mon, Mar 14, 2016 at 11:10:33AM +0530, Devesh Sharma wrote:
>>> On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
>>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> >
>>> > On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
>>> > > CC: Yishai Hadas <yishaih@xxxxxxxxxxxx>
>>> >
>>> > I'm still pretty convinced this is wrong... But even still:
>>> >
>>> > > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>>> > >       struct ib_uverbs_file *file = filp->private_data;
>>> > >       struct ib_uverbs_device *dev = file->device;
>>> > >       struct ib_ucontext *ucontext = NULL;
>>> > > +     struct ib_device *ib_dev;
>>> > > +     int srcu_key;
>>> > > +
>>> > > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
>>> > > +     ib_dev = srcu_dereference(dev->ib_dev,
>>> > > +                               &dev->disassociate_srcu);
>>> > > +     if (!ib_dev) {
>>> > > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
>>> > > +             wait_for_completion(&file->fcomp);
>>> > > +             goto out;
>>> >
>>> > This jumps over this:
>>>
>>> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
>>> also be skipped.
>>
>> I am talking about ib_uverbs_release_event_file
>
> And I am adding ib_uverbs_release_file() as well. the other thread is already
> cleaning it up and if ib_uverbs_close() reads it NULL then why to put krefs?
>
>>
>>> Because, by putting an wait_for_completion(), this context is
>>> effectively waiting for ib_uverbs_cleanup_ucontext to finish
>>> cleaning up this file pointer. if the other thread is cleaning up,
>>> why do I need to put the kerf again?
>>
>> The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).
>
> Actually it does, in the very next while loop on event_file list.
>
>>
>>> > As I've said, I'm not sure how this is any different from using
>>> > lists_mutex. The wait_for_completion will still block and deadlock if
>>> > ib_uverbs_close is entered during the disassociate flow.
>>>
>>> Taking list-mutex is not stopping ib_dev pointer to become NULL, while
>>> if we split the mutex
>>
>> I don't think you understand the problem. ib_uverbs_device->ib_dev can
>> be NULL just fine. In fact, it is always NULL when
>> ib_uverbs_free_hw_resources calls ib_uverbs_cleanup_ucontext - that is
>> obviously fine (or if it isn't fine today, there is yet another bug).
>
> Okay, I meant it is being freed. before entring
> ib_uverb_free_hw_resource() it is obviously set to NULL
> which is well understood.
>
>>
>> The issue appears to be that ib_uverbs_free_hw_resources does not wait
>> for ib_uverbs_cleanup_ucontext to complete before it goes ahead and
>> wrecks the ib_dev, resulting in use-after-free/etc on copies of ib_dev
>> pointer that are used by the still running ib_uverbs_free_hw_resources.
>
> Exactly, that is the real problem. I have explained it in the description of V4.
>
>>
>>> and wait_for_completion(), then effectively we are trying to sync
>>> ib_uverb_close() and
>>> remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL
>>
>> No, that is the wrong problem statement and wrong solution.
>>
>> The solution is to strong fence ib_uverbs_cleanup_ucontext so that
>> ib_uverbs_free_hw_resources does not exit until it is completed,
>> either by its thread or in the close thread.
>>
>> I prefer a mutex, but perhaps there are other ways to build the
>> fence (eg uverbs_dev->refcount springs to mind)
>
> uverbs_dev->refcount is already there, we can choose to wait for
> ref_count to become zero
>
> static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
> .
> .
> .
> .
> .
>                 rcu_assign_pointer(uverbs_dev->ib_dev, NULL);
>                 ib_uverbs_free_hw_resources(uverbs_dev, device);
>                 wait_clients = 0;
>         }
>
> wait for zero here instead of dec_and_test, I think things will
> automatically fall in place after that

Off-course ib_uverb_close() will have to take the ref-count on entry and
drop on exit.

>
>         if (atomic_dec_and_test(&uverbs_dev->refcount))
>                 ib_uverbs_comp_dev(uverbs_dev);
>         if (wait_clients)
>                 wait_for_completion(&uverbs_dev->comp);
>         kobject_put(&uverbs_dev->kobj);
> }
>
>>
>>> Can you please explain how it can lead to a deadlock?
>>
>> Yishai's note outlines it:
>>
>>                 /* We must release the mutex before going ahead and calling
>>                  * disassociate_ucontext. disassociate_ucontext might end up
>>                  * indirectly calling uverbs_close, for example due to freeing
>>                  * the resources (e.g mmput).
>>
>> Meaning when we call ib_uverbs_cleanup_ucontext from within
>> ib_uverbs_free_hw_resources it may recurse down into ib_uverbs_close.
>>
>> If this happens, with your patch ib_uverbs_close will wait on the
>> completion in the same thread that is supposed to trigger it. That is
>> the same deadlock as would happen if the lists_mutex would be used.
>>
>> The last patch I sent is much closer to what is needed, it is
>> just completely wrong to try and use the srcu for this.
>>
>> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux