Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 14, 2016 at 11:18 PM, Jason Gunthorpe
<jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Mar 14, 2016 at 11:10:33AM +0530, Devesh Sharma wrote:
>> On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
>> > > CC: Yishai Hadas <yishaih@xxxxxxxxxxxx>
>> >
>> > I'm still pretty convinced this is wrong... But even still:
>> >
>> > > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>> > >       struct ib_uverbs_file *file = filp->private_data;
>> > >       struct ib_uverbs_device *dev = file->device;
>> > >       struct ib_ucontext *ucontext = NULL;
>> > > +     struct ib_device *ib_dev;
>> > > +     int srcu_key;
>> > > +
>> > > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
>> > > +     ib_dev = srcu_dereference(dev->ib_dev,
>> > > +                               &dev->disassociate_srcu);
>> > > +     if (!ib_dev) {
>> > > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
>> > > +             wait_for_completion(&file->fcomp);
>> > > +             goto out;
>> >
>> > This jumps over this:
>>
>> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
>> also be skipped.
>
> I am talking about ib_uverbs_release_event_file

And I am adding ib_uverbs_release_file() as well. the other thread is already
cleaning it up and if ib_uverbs_close() reads it NULL then why to put krefs?

>
>> Because, by putting an wait_for_completion(), this context is
>> effectively waiting for ib_uverbs_cleanup_ucontext to finish
>> cleaning up this file pointer. if the other thread is cleaning up,
>> why do I need to put the kerf again?
>
> The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).

Actually it does, in the very next while loop on event_file list.

>
>> > As I've said, I'm not sure how this is any different from using
>> > lists_mutex. The wait_for_completion will still block and deadlock if
>> > ib_uverbs_close is entered during the disassociate flow.
>>
>> Taking list-mutex is not stopping ib_dev pointer to become NULL, while
>> if we split the mutex
>
> I don't think you understand the problem. ib_uverbs_device->ib_dev can
> be NULL just fine. In fact, it is always NULL when
> ib_uverbs_free_hw_resources calls ib_uverbs_cleanup_ucontext - that is
> obviously fine (or if it isn't fine today, there is yet another bug).

Okay, I meant it is being freed. before entring
ib_uverb_free_hw_resource() it is obviously set to NULL
which is well understood.

>
> The issue appears to be that ib_uverbs_free_hw_resources does not wait
> for ib_uverbs_cleanup_ucontext to complete before it goes ahead and
> wrecks the ib_dev, resulting in use-after-free/etc on copies of ib_dev
> pointer that are used by the still running ib_uverbs_free_hw_resources.

Exactly, that is the real problem. I have explained it in the description of V4.

>
>> and wait_for_completion(), then effectively we are trying to sync
>> ib_uverb_close() and
>> remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL
>
> No, that is the wrong problem statement and wrong solution.
>
> The solution is to strong fence ib_uverbs_cleanup_ucontext so that
> ib_uverbs_free_hw_resources does not exit until it is completed,
> either by its thread or in the close thread.
>
> I prefer a mutex, but perhaps there are other ways to build the
> fence (eg uverbs_dev->refcount springs to mind)

uverbs_dev->refcount is already there, we can choose to wait for
ref_count to become zero

static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
.
.
.
.
.
                rcu_assign_pointer(uverbs_dev->ib_dev, NULL);
                ib_uverbs_free_hw_resources(uverbs_dev, device);
                wait_clients = 0;
        }

wait for zero here instead of dec_and_test, I think things will
automatically fall in place after that

        if (atomic_dec_and_test(&uverbs_dev->refcount))
                ib_uverbs_comp_dev(uverbs_dev);
        if (wait_clients)
                wait_for_completion(&uverbs_dev->comp);
        kobject_put(&uverbs_dev->kobj);
}

>
>> Can you please explain how it can lead to a deadlock?
>
> Yishai's note outlines it:
>
>                 /* We must release the mutex before going ahead and calling
>                  * disassociate_ucontext. disassociate_ucontext might end up
>                  * indirectly calling uverbs_close, for example due to freeing
>                  * the resources (e.g mmput).
>
> Meaning when we call ib_uverbs_cleanup_ucontext from within
> ib_uverbs_free_hw_resources it may recurse down into ib_uverbs_close.
>
> If this happens, with your patch ib_uverbs_close will wait on the
> completion in the same thread that is supposed to trigger it. That is
> the same deadlock as would happen if the lists_mutex would be used.
>
> The last patch I sent is much closer to what is needed, it is
> just completely wrong to try and use the srcu for this.
>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux