On Wed, Feb 01, 2017 at 02:39:04PM +0200, Matan Barak wrote:
> The completion channel we use in verbs infrastructure is FD based.
> Previously, we had a separate way to manage this object. Since we
> strive for a single way to manage any kind of object in this
> infrastructure, we conceptually treat all objects as subclasses
> of ib_uobject.
>
> This commit adds the necessary mechanism to support FD based objects
> like their IDR counterparts. FD objects release need to be synchronized
> with context release. We use the cleanup_mutex on the uverbs_file for
> that.
>
> Signed-off-by: Matan Barak <matanb@xxxxxxxxxxxx>
> Reviewed-by: Yishai Hadas <yishaih@xxxxxxxxxxxx>
>  drivers/infiniband/core/rdma_core.c   | 157 +++++++++++++++++++++++++++++++++-
>  drivers/infiniband/core/rdma_core.h   |   7 ++
>  drivers/infiniband/core/uverbs.h      |   1 +
>  drivers/infiniband/core/uverbs_main.c |   4 +-
>  include/rdma/ib_verbs.h               |   6 ++
>  include/rdma/uverbs_types.h           |  16 ++++
>  6 files changed, 189 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index 7ce4d67..1d24f26 100644
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -160,6 +160,73 @@ static void uverbs_uobject_add(struct ib_uobject *uobject)
>  	mutex_unlock(&uobject->context->lock);
>  }
>
> +static struct ib_uobject *alloc_begin_fd_uobject(const struct uverbs_obj_type *type,
> +						 struct ib_ucontext *ucontext)
> +{
> +	const struct uverbs_obj_fd_type *fd_type =
> +		container_of(type, struct uverbs_obj_fd_type, type);
> +	int new_fd;
> +	struct ib_uobject_file *uobj_file = NULL;
> +	struct file *filp;
> +
> +	new_fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (new_fd < 0)
> +		return ERR_PTR(new_fd);
> +
> +	uobj_file = kmalloc(fd_type->obj_size, GFP_KERNEL);
> +	if (!uobj_file) {
> +		put_unused_fd(new_fd);
> +		return ERR_PTR(-ENOMEM);
> +	}

Move to alloc_uobj, see prior patches.

> +static struct ib_uobject *lookup_get_fd_uobject(const struct uverbs_obj_type *type,
> +						struct ib_ucontext *ucontext,
> +						int id, bool write)
> +{
> +	struct file *f;
> +	struct ib_uobject *uobject;
> +	const struct uverbs_obj_fd_type *fd_type =
> +		container_of(type, struct uverbs_obj_fd_type, type);
> +
> +	if (write)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	f = fget(id);
> +	if (!f)
> +		return ERR_PTR(-EBADF);
> +
> +	uobject = f->private_data;
> +	if (f->f_op != fd_type->fops ||
> +	    !uobject->context) {
> +		fput(f);
> +		return ERR_PTR(-EBADF);
> +	}

The read of uobject->context needs locking.

> +static void alloc_commit_fd_uobject(struct ib_uobject *uobj)
> +{
> +	struct ib_uobject_file *uobj_file =
> +		container_of(uobj, struct ib_uobject_file, uobj);
> +
> +	kref_get(&uobj_file->ufile->ref);

Something has gone wrong with the krefs here.. kref_get needs to be
done for anon_inode_getfile. That kref_get pairs with the put in
uverbs_close_fd.

> +	uverbs_uobject_add(&uobj_file->uobj);
> +	fd_install(uobj_file->uobj.id, uobj->object);
> +	/* This shouldn't be used anymore. Use the file object instead */
> +	uobj_file->uobj.id = 0;

Needs a put to pair with the kref_get done by alloc_uobj.

> +static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
> +{
> +	struct ib_uobject_file *uobj_file =
> +		container_of(uobj, struct ib_uobject_file, uobj);
> +	struct file *filp = uobj->object;
> +
> +	/* Unsuccessful NEW */
> +	fput(filp);
> +	put_unused_fd(uobj_file->uobj.id);

And here is the bug that ordering creates: since fput(filp) will call
uverbs_close_fd, it just put the last kref and freed the struct.

> +static void destroy_commit_null_uobject(struct ib_uobject *uobj)
> +{
> +	WARN_ON(true);
> +}

So in the model I suggest this would be 'remove', and the purpose is to
disconnect the driver and leave the FD 'detached'.
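The use-after-free in the abort path can be illustrated with a minimal userspace sketch. This is not the kernel API: `mock_uobj`, `uobj_alloc`, `uobj_put`, `abort_buggy`, and `abort_fixed` are invented names, and the "free" merely poisons the fields (instead of calling `free()`) so the stale read is deterministic rather than undefined behavior. `uobj_put` stands in for the final `fput()` that ends up in `uverbs_close_fd`.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock uobject: a refcount plus the reserved fd number. */
struct mock_uobj {
	int refcount;
	int id;          /* the reserved fd, needed by put_unused_fd() */
};

static struct mock_uobj *uobj_alloc(int fd)
{
	struct mock_uobj *u = malloc(sizeof(*u));
	u->refcount = 1; /* reference owned by the struct file */
	u->id = fd;
	return u;
}

/* Stand-in for the final fput()->uverbs_close_fd(): drops the ref and
 * "frees" the object.  We poison id instead of freeing memory so the
 * stale read below is observable in a test. */
static void uobj_put(struct mock_uobj *u)
{
	if (--u->refcount == 0)
		u->id = -1;  /* object is gone from here on */
}

/* Buggy order, mirroring alloc_abort_fd_uobject() in the patch:
 * drop the reference first, then read id for put_unused_fd(). */
static int abort_buggy(struct mock_uobj *u)
{
	uobj_put(u);     /* may be the last ref -> object freed */
	return u->id;    /* too late: in the kernel this reads freed memory */
}

/* Fixed order: cache everything needed before the final put. */
static int abort_fixed(struct mock_uobj *u)
{
	int fd = u->id;  /* read first ... */
	uobj_put(u);     /* ... then drop the reference */
	return fd;       /* safe to hand to put_unused_fd() */
}
```

The fix is purely an ordering change: any field of the uobject that the abort path still needs must be read before the reference that keeps it alive is dropped.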
> +
>  const struct uverbs_obj_type_ops uverbs_idr_ops = {
>  	.alloc_begin = alloc_begin_idr_uobject,
>  	.lookup_get = lookup_get_idr_uobject,
> @@ -254,8 +363,15 @@ void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed)
>
>  		/*
>  		 * This shouldn't run while executing other commands on this
> -		 * context, thus no lock is required.
> +		 * context. Thus, the only thing we should take care of is
> +		 * releasing a FD while traversing this list. The FD could be
> +		 * closed and released from the _release fop of this FD.
> +		 * In order to mitigate this, we add a lock.
> +		 * We take and release the lock per order traversal in order
> +		 * to let other threads (which might still use the FDs) chance
> +		 * to run.
>  		 */
> +		mutex_lock(&ucontext->lock);

I think you may as well move this lock to patch 2, reads better that way.

>  		list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects,
>  					 list)
>  			if (obj->type->destroy_order == cur_order) {
> @@ -265,6 +381,7 @@ void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed)
>  				next_order = min(next_order,
>  						 obj->type->destroy_order);
>  			}
> +		mutex_unlock(&ucontext->lock);

We had lots of problems with this sort of scheme in the hotplug series.
Is this OK like this? Feels potentially dangerous..

I feel like we need a rwsem for cleanup. Write lock the rwsem above,
which will inhibit changes to the list, and have all the remove paths
try_read the rwsem; if that fails, skip removing from the list and
doing the destroy, because we know uverbs_cleanup_ucontext is running
and will do both.

That sounds like a much more self-contained and clearer mechanism
compared to where we are today.

> +void uverbs_close_fd(struct file *f)
> +{
> +	struct ib_uobject_file *uobj_file = f->private_data;
> +
> +	mutex_lock(&uobj_file->ufile->cleanup_mutex);

I don't get what cleanup_mutex is doing. Nothing in here seems like it
needs that lock...
> +	if (uobj_file->uobj.context) {
> +		mutex_lock(&uobj_file->uobj.context->lock);
> +		list_del(&uobj_file->uobj.list);
> +		mutex_unlock(&uobj_file->uobj.context->lock);

Again, I don't like this. If we remove something from the list then it
should go through the whole destruction cycle. Why do we skip
is_closed = 1 on this path, for instance? Seems wrong.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html